Setup Spark + Hadoop cluster
# Download: prebuilt Spark + Hadoop
From: https://spark.apache.org/downloads.html
Prefer the prebuilt version; it is easier to deploy to all workers.
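A minimal sketch of grabbing and unpacking a prebuilt release (the version, mirror, and install path below are assumptions; pick the build matching your Hadoop version on the downloads page):
```
# Hypothetical release; substitute the one you selected on the downloads page
wget https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
tar -xzf spark-1.4.1-bin-hadoop2.6.tgz
mv spark-1.4.1-bin-hadoop2.6 ~/spark
export SPARK_HOME=~/spark
```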
# Configuration
1. Slaves configuration (a consolidated sketch follows these steps)
* Client:
sudo adduser worker
sudo /usr/sbin/visudo (add sudo privileges for worker)
ssh worker@192.168.1.15
mkdir .ssh
nano .ssh/authorized_keys (copy the master's id_rsa.pub into this file)
copy the Spark directory to /home/worker/spark
* Server:
cd $SPARK_HOME/conf
cp slaves.template slaves
add one line per worker host, e.g. worker@192.168.1.15
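A consolidated sketch of the steps above, run mostly from the master (it assumes the master already has a key pair in ~/.ssh and that 192.168.1.15 stands in for each worker's address):
```
# On each worker ("client"): create the account, then grant sudo via visudo
sudo adduser worker

# On the master ("server"): push the public key for passwordless SSH
ssh-copy-id worker@192.168.1.15   # appends id_rsa.pub to the worker's authorized_keys

# Deploy the same Spark build to the worker
rsync -a $SPARK_HOME/ worker@192.168.1.15:/home/worker/spark/

# Register the worker in conf/slaves (one line per host)
echo "worker@192.168.1.15" >> $SPARK_HOME/conf/slaves
```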
2. Spark environment
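This step is left empty above; a minimal conf/spark-env.sh sketch for the master follows (all values are assumptions, adjust them to your master address and hardware):
```
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
# Assumed settings; 192.168.1.1 stands in for the master's address.
# Copy the same file to each worker's Spark install as well.
cat >> spark-env.sh <<'EOF'
export SPARK_MASTER_IP=192.168.1.1
export SPARK_WORKER_CORES=4       # cores each worker offers to Spark
export SPARK_WORKER_MEMORY=4g     # memory each worker offers to Spark
EOF
```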
# Start Cluster
cd $SPARK_HOME/sbin
./start-all.sh
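To verify the cluster came up (port 8080 is the standalone master's default web UI; 192.168.1.1 is an assumed master address):
```
# The master web UI lists every registered worker
curl -s http://192.168.1.1:8080 | grep -i worker

# If something failed to start, the daemon logs are under $SPARK_HOME/logs
tail $SPARK_HOME/logs/*.out

# Shut the cluster down with the matching script
# ./stop-all.sh
```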
# Setup pyspark + ipython
Follow these instructions: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/
However, replace the corresponding export line with the following, or you will never get it working:
```
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
```
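For a quick sanity check without the full IPython profile from the post, the same variables can be exported by hand before launching (the paths and driver settings here are assumptions for this sketch, not the post's exact setup):
```
export SPARK_HOME=~/spark                 # assumed install path
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
export PYSPARK_DRIVER_PYTHON=ipython      # run the pyspark driver inside IPython
$SPARK_HOME/bin/pyspark                   # SparkContext is available as sc
```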