Have the spark folder in a directory of your choice. For my master it was /home/lieu/dev/spark and for my slaves it was /home/pirate/spark.
Because the master and the slaves keep their spark folder in different directories, do the following export on the master (we'll make use of it later):
export SLAVE_SPARK_HOME=/home/pirate/spark/spark-2.4.4-bin-without-hadoop
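This variable has to be visible in whatever shell later runs the sbin/ scripts. One convenient option (a suggestion, not part of the stock scripts) is to also persist it in the master's conf/spark-env.sh, which the sbin/ scripts source via load-spark-env.sh:
echo 'export SLAVE_SPARK_HOME=/home/pirate/spark/spark-2.4.4-bin-without-hadoop' >> /home/lieu/dev/spark/spark-2.4.4-bin-without-hadoop/conf/spark-env.sh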
On the slaves, install openjdk-8-jre and unzip.
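Assuming the slaves run a Debian/Ubuntu derivative, that amounts to:
sudo apt-get update
sudo apt-get install -y openjdk-8-jre unzip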
The folder structure should look like this:
ll /home/lieu/dev/spark
drwxr-xr-x 5 lieu lieu 4096 Sep 5 14:38 ./
drwxr-xr-x 15 lieu lieu 4096 Sep 5 14:38 ../
drwxr-xr-x 2 lieu lieu 4096 Sep 5 12:57 bin/
-rw-r--r-- 1 lieu lieu 6148 Sep 5 13:00 .DS_Store
-rwxr-xr-x 1 lieu lieu 300 Sep 5 13:52 env.sh*
drwxr-xr-x 9 lieu lieu 4096 Jan 29 2019 hadoop-3.1.2/
drwxr-xr-x 15 lieu lieu 4096 Sep 5 14:36 spark-2.4.4-bin-without-hadoop/
Check that the Hadoop core-site.xml looks like this:
hadoop-3.1.2/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <description>AWS S3 endpoint to connect to. An up-to-date list is
      provided in the AWS Documentation: regions and endpoints. Without this
      property, the standard region (s3.amazonaws.com) is assumed.
    </description>
    <value>http://192.168.72.156:9000</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <description>AWS access key ID.</description>
    <value>N0262R8RT8...</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <description>AWS secret key.</description>
    <value>tOdQZa6tMCSGPE/1aVK8Sn6...</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
    <description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
      Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
    </description>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    <description>The implementation class of the S3A Filesystem</description>
  </property>
</configuration>
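At this point you can smoke-test the S3A wiring from the master, assuming hadoop is on your PATH and the hadoop-aws/AWS SDK jars are visible to Hadoop (e.g. via HADOOP_OPTIONAL_TOOLS=hadoop-aws in etc/hadoop/hadoop-env.sh); the bucket name "test" here is a placeholder for whatever bucket exists on your S3-compatible server:
hadoop fs -ls s3a://test/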
spark-2.4.4-bin-without-hadoop/run.sh
#!/bin/bash
export DD_HOME=/home/lieu/dev/spark
export SPARK_HOME=$DD_HOME/spark-2.4.4-bin-without-hadoop
export PATH=$PATH:$SPARK_HOME/bin
export HADOOP_HOME=$DD_HOME/hadoop-3.1.2
export PATH=$PATH:$HADOOP_HOME/bin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export TERM=xterm-color
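# The extra jars (slf4j, AWS SDK, hadoop-aws, httpclient, joda-time) live in
# $DD_HOME/bin, one directory above $SPARK_HOME, hence the ../bin/ paths below.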
./bin/spark-shell --master spark://192.168.10.66:7077 --jars ../bin/slf4j-api-1.7.25.jar,../bin/slf4j-log4j12-1.7.25.jar,../bin/aws-java-sdk-1.11.624.jar,../bin/aws-java-sdk-core-1.11.624.jar,../bin/aws-java-sdk-dynamodb-1.11.624.jar,../bin/aws-java-sdk-kms-1.11.624.jar,../bin/aws-java-sdk-s3-1.11.624.jar,../bin/hadoop-aws-3.1.2.jar,../bin/httpclient-4.5.9.jar,../bin/joda-time-2.10.3.jar
#./bin/spark-shell --master local[4] --jars $(echo ../bin/*.jar | tr ' ' ',')
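Note that run.sh has to be launched from inside the spark-2.4.4-bin-without-hadoop directory so the relative ./bin and ../bin paths resolve:
cd /home/lieu/dev/spark/spark-2.4.4-bin-without-hadoop
./run.sh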
conf/slaves
pirate@192.168.10.83
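The sbin/ scripts reach every machine listed in conf/slaves over SSH, so the master needs passwordless SSH to each entry. Assuming you already have a key pair on the master, something like this sets it up:
ssh-copy-id pirate@192.168.10.83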
On the master, conf/spark-env.sh
SPARK_MASTER_HOST=192.168.10.66
SPARK_LOCAL_IP=192.168.10.66
Additionally, we need to edit the following scripts in sbin/, because they assume that the slaves' $SPARK_HOME is the same as the master's.
sbin/start-slaves.sh
Replace the last line with the following:
# Launch the slaves
"${SPARK_HOME}/sbin/slaves.sh" cd "${SLAVE_SPARK_HOME}" \; "${SLAVE_SPARK_HOME}/sbin/start-slave.sh" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
sbin/stop-slaves.sh
Replace the last line with the following:
"${SPARK_HOME}/sbin/slaves.sh" cd "${SLAVE_SPARK_HOME}" \; "${SLAVE_SPARK_HOME}/sbin"/stop-slave.sh
On the slaves, conf/spark-env.sh
export DD_HOME=/home/pirate/spark
export HADOOP_HOME=$DD_HOME/hadoop-3.1.2
export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
SPARK_MASTER_HOST=192.168.10.66
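With all of that in place, you can bring the cluster up from the master using the standard standalone scripts (remember that SLAVE_SPARK_HOME must be set in this shell, as above):
cd /home/lieu/dev/spark/spark-2.4.4-bin-without-hadoop
sbin/start-master.sh
sbin/start-slaves.sh
# The worker should now appear in the master's web UI,
# http://192.168.10.66:8080 (the default standalone master UI port).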