Setup Spark + Hadoop cluster
# Download: prebuilt Spark + Hadoop
From: https://spark.apache.org/downloads.html
Prefer the prebuilt version; it is easier to deploy to all workers.
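For example, a minimal fetch-and-unpack sketch; the version numbers below are placeholders, so pick the package from the downloads page that matches your Hadoop installation:
```
# Example version only -- substitute the release you actually chose.
wget https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
tar -xzf spark-1.4.1-bin-hadoop2.6.tgz
mv spark-1.4.1-bin-hadoop2.6 ~/spark
export SPARK_HOME=~/spark
```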
# Configuration
1. Slaves configuration (a condensed sketch follows this list)
  * Client (on each worker machine):
    sudo adduser worker
    sudo /usr/sbin/visudo (grant sudo privileges to worker)
    ssh worker@192.168.1.15
    mkdir .ssh
    nano .ssh/authorized_keys (paste the master's id_rsa.pub into this file)
    copy the Spark directory to /home/worker/spark
  * Server (on the master):
    cd $SPARK_HOME/conf
    cp slaves.template slaves
    add one line per worker host, e.g. worker@192.168.1.15
2. Spark environment: in $SPARK_HOME/conf, copy spark-env.sh.template to spark-env.sh and set per-machine options there (see the second sketch below)
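For the client steps in item 1, a condensed sketch run from the master; the `worker` account and `192.168.1.15` are the example values from the list above, and `ssh-copy-id` is one way to install the public key (it assumes the master's key pair already exists):
```
ssh-copy-id worker@192.168.1.15      # install ~/.ssh/id_rsa.pub into the worker's authorized_keys
rsync -az "$SPARK_HOME/" worker@192.168.1.15:/home/worker/spark/   # push Spark to the worker
echo "worker@192.168.1.15" >> "$SPARK_HOME/conf/slaves"            # register the host in slaves
```
For item 2, a minimal spark-env.sh; the values shown are assumptions for this example, not required settings:
```
cd "$SPARK_HOME/conf"
cp spark-env.sh.template spark-env.sh
# Typical per-machine settings (example values, adjust to your cluster):
#   export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
#   export SPARK_MASTER_IP=192.168.1.10      # address the workers connect to
#   export SPARK_WORKER_MEMORY=4g            # memory each worker may offer to executors
```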
# Start Cluster
cd $SPARK_HOME/sbin
./start-all.sh
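If everything came up, a quick check against the standalone master's web UI (port 8080 by default) should list each registered worker, and stop-all.sh shuts the cluster down again:
```
curl -s http://localhost:8080 | grep -i worker   # registered workers appear on the master UI
"$SPARK_HOME/sbin/stop-all.sh"                   # stop the master and all workers
```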
# Setup pyspark + ipython
Follow these instructions: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/
However, replace the following line from that guide, or the notebook will never start:
```export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"```
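For reference, a sketch of how that line fits into a shell profile; the SPARK_HOME path is an assumption, and the trailing pyspark-shell token is what the guide's original line is missing (without it, recent 1.x releases fail to launch the Py4J gateway):
```
# ~/.bashrc -- paths here are example assumptions
export SPARK_HOME=/home/worker/spark
export PATH="$SPARK_HOME/bin:$PATH"
# Note the trailing 'pyspark-shell'; the guide's version omits it.
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
```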