install.spark {SparkR} | R Documentation |
install.spark downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment and that directory exists, that directory is returned. The Spark version used is the same as the SparkR version. Users can specify the desired Hadoop version, the remote mirror site, and the local directory where the package is installed.
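The lookup order described above can be sketched in a few lines of base R. This is only an illustration: find_existing_spark is a hypothetical helper, not a SparkR function, and it assumes that "found" means the SPARK_HOME directory exists on disk.

```r
# Hypothetical helper (not SparkR's implementation): returns the directory
# named by SPARK_HOME when it is set and exists, otherwise NULL.
find_existing_spark <- function() {
  sparkHome <- Sys.getenv("SPARK_HOME")
  # Sys.getenv() returns "" when the variable is unset.
  if (nzchar(sparkHome) && dir.exists(sparkHome)) {
    # Returned invisibly, as the documented return value is.
    return(invisible(sparkHome))
  }
  # NULL signals that a download and install would be needed.
  NULL
}

# Assign the result to inspect it; an invisible value is not auto-printed.
existing <- find_existing_spark()
```

A caller would fall through to the download step only when the helper returns NULL.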
install.spark(
hadoopVersion = "2.7",
mirrorUrl = NULL,
localDir = NULL,
overwrite = FALSE
)
hadoopVersion |
Version of Hadoop to install. Default is "2.7". |
mirrorUrl |
base URL of the repositories to use. The directory layout should follow Apache mirrors. |
localDir |
a local directory where Spark is installed. The directory contains version-specific folders of Spark packages. Default is path to the cache directory:
|
overwrite |
If |
The full URL of the remote file is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (that corresponds to SparkR), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz.
For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from http://apache.osuosl.org is http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz.
For hadoopVersion = "without", [Hadoop version] in the filename is without-hadoop.
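The URL layout described above can be reproduced with a short sketch. build_spark_url is a hypothetical helper written for illustration, not part of SparkR, and it assumes mirrorUrl already points at the Spark folder of the mirror (for the example above, http://apache.osuosl.org/spark).

```r
# Hypothetical helper (not part of SparkR): composes the download URL
# from the mirror's Spark folder, the Spark version, and the Hadoop version.
build_spark_url <- function(mirrorUrl, sparkVersion, hadoopVersion = "2.7") {
  # "without" maps to the "without-hadoop" build; otherwise prefix "hadoop".
  hadoopPart <- if (hadoopVersion == "without") {
    "without-hadoop"
  } else {
    paste0("hadoop", hadoopVersion)
  }
  packageName <- paste0("spark-", sparkVersion)
  fileName <- paste0(packageName, "-bin-", hadoopPart, ".tgz")
  # Drop trailing slashes from the mirror so the path has single separators.
  paste(sub("/+$", "", mirrorUrl), packageName, fileName, sep = "/")
}

# "http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz"
build_spark_url("http://apache.osuosl.org/spark", "2.0.0")

# "http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-without-hadoop.tgz"
build_spark_url("http://apache.osuosl.org/spark", "2.0.0", hadoopVersion = "without")
```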
The (invisible) local directory where Spark is found or installed.
install.spark since 2.1.0
See available Hadoop versions: Apache Spark
## Not run:
##D install.spark()
## End(Not run)