Spark Standalone Cluster
--------Spark Standalone---------
Prerequisites:
* Set the JAVA_HOME env variable
* Configure ssh so the master and workers can talk without a password
  - $ ssh-keygen   (press Enter at each prompt)
    This creates .ssh/id_rsa (private key) and .ssh/id_rsa.pub (public key)
  - Copy the SSH public key (id_rsa.pub) to the root account on your target hosts.
  - Add the SSH public key to the authorized_keys file on your target hosts:
    $ cat id_rsa.pub >> authorized_keys
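The key setup above can be sketched as a short script; the worker hostnames in the commented loop are hypothetical placeholders for your own hosts:

```shell
# Generate a passwordless RSA key pair if one does not already exist.
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q

# Push the public key to each target host (hostnames are placeholders);
# ssh-copy-id appends it to the remote authorized_keys for you.
# for host in worker1 worker2; do ssh-copy-id root@"$host"; done
```

Afterwards, `ssh root@worker1` from the master should log in without prompting for a password.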
* Disable iptables
  - run $ /etc/init.d/iptables stop
  - run $ chkconfig iptables off
* Disable SELinux
  - run $ setenforce 0
  - run $ vim /etc/sysconfig/selinux
    and change the setting to SELINUX=disabled
  - run $ sestatus to check the SELinux status
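As an alternative to editing the file in vim, the SELINUX line can be flipped non-interactively with sed. The sketch below runs against a scratch copy so it is safe to try anywhere; on a real host you would point sed at /etc/sysconfig/selinux instead:

```shell
# Demo on a scratch file; target /etc/sysconfig/selinux on a real host.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

# Rewrite the SELINUX= line in place so the setting survives a reboot.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

grep '^SELINUX=' "$cfg"   # -> SELINUX=disabled
```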
* Add the hostnames to /etc/hosts
  - run $ vim /etc/hosts
    and add a "<ip> <hostname>" line for each host
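For example, /etc/hosts entries for a three-node cluster might look like this (addresses and hostnames are made-up placeholders):

```
192.168.1.10   sparkmaster
192.168.1.11   sparkworker1
192.168.1.12   sparkworker2
```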
Installation:
* Download Spark into /opt
  - $ cd /opt
  - $ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz
    or http://apache.mirrors.ovh.net/ftp.apache.org/dist/spark/spark-1.3.0/spark-1.3.0-bin-hadoop2.4.tgz
* Untar the archive
  - $ tar -xzvf spark-1.3.0-bin-hadoop2.4.tgz
* Create a symbolic link to the extracted directory (not the tarball)
  - $ ln -s spark-1.3.0-bin-hadoop2.4 spark
* Edit the conf/slaves file on your master and fill in the workers' hostnames.
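conf/slaves is simply one worker hostname per line, e.g. (hostnames are placeholders):

```
sparkworker1
sparkworker2
```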
* Run sbin/start-all.sh on your master (it is important to run it there rather than on a worker).
  If everything started, you should get no password prompts, and the cluster manager's web UI
  should appear at http://masternode:8080 and show all your workers.
* Run sbin/stop-all.sh to stop the cluster.
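Optionally, per-node resources can be set in conf/spark-env.sh before starting the cluster. The variables below are standard Spark 1.x settings; the values and the hostname are illustrative:

```shell
# conf/spark-env.sh -- sourced by the start scripts on each node
SPARK_MASTER_IP=sparkmaster     # hostname/IP the master binds to (placeholder)
SPARK_WORKER_CORES=4            # cores each worker offers
SPARK_WORKER_MEMORY=4g          # memory each worker offers
```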
OR
* Deploy the master
  - run $ bin/spark-class org.apache.spark.deploy.master.Master
* Deploy the workers
  - run $ bin/spark-class org.apache.spark.deploy.worker.Worker spark://<master-FQDN>:7077
If all is good to go:
Run sample jobs
  - run $ ./bin/spark-shell --master spark://<master-FQDN>:7077
    You can also pass --total-executor-cores <numCores> to control the number of cores
    spark-shell uses on the cluster.
Submitting Applications
* Run on a Spark Standalone cluster in client deploy mode
  * Running the Pi program
    - run $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-FQDN>:7077 --executor-memory 2G --total-executor-cores 10 /opt/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar 1000
* Run on a Spark Standalone cluster in cluster deploy mode
  * Running the Pi program
    - run $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-FQDN>:7077 --deploy-mode cluster --executor-memory 2G --total-executor-cores 10 /opt/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar 1000
Check the master's web UI
  http://<spark-master-IP>:8080 --- check for Running Applications and Completed Applications