Spark Standalone Cluster
--------Spark Standalone---------
Prerequisites:
* Set the JAVA_HOME env variable
* Configure SSH so the master and workers can talk without a password (see the example session after this list)
  - $ ssh-keygen -- press Enter to accept the defaults; this creates .ssh/id_rsa and .ssh/id_rsa.pub
  - Copy the SSH public key (id_rsa.pub) to the root account on each target host
  - Append the SSH public key to the authorized_keys file on each target host:
    $ cat id_rsa.pub >> ~/.ssh/authorized_keys
* Disable iptables
  - run $ /etc/init.d/iptables stop
  - run $ chkconfig iptables off
* Disable SELinux
  - run $ setenforce 0
  - run $ vim /etc/sysconfig/selinux and set SELINUX=disabled
  - run $ sestatus to check the SELinux status
* Add every node to /etc/hosts so the hosts can resolve one another by name
  - run $ vim /etc/hosts
    add one line per node in the form <ip> <hostname>, e.g. 192.168.1.10 master
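For reference, a minimal passwordless-SSH session could look like the following (master, worker1 and worker2 are placeholder hostnames; adjust them to your cluster):

  master$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # generate a key pair with no passphrase
  master$ ssh-copy-id root@worker1                    # appends id_rsa.pub to worker1's authorized_keys
  master$ ssh-copy-id root@worker2
  master$ ssh root@worker1 hostname                   # should print "worker1" without asking for a password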
Installation:
* Download Spark into /opt
  - $ cd /opt
  - $ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz
    or http://apache.mirrors.ovh.net/ftp.apache.org/dist/spark/spark-1.3.0/spark-1.3.0-bin-hadoop2.4.tgz
* Untar the archive
  - $ tar -xzvf spark-1.3.0-bin-hadoop2.4.tgz
* Create a symbolic link to the extracted directory (not the tarball)
  - $ ln -s spark-1.3.0-bin-hadoop2.4 spark
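A quick way to check the layout before going further (paths as assumed above):

  $ ls /opt/spark/bin     # should list spark-shell, spark-submit, spark-class, ...
  $ ls /opt/spark/sbin    # should list start-all.sh, stop-all.sh, ...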
* Edit the conf/slaves file on your master and list the workers' hostnames, one per line (see the example after this list).
* Run sbin/start-all.sh on your master (it is important to run it there rather than on a worker). If everything started, you should get no password prompts, and the cluster manager's web UI should appear at http://masternode:8080, showing all your workers.
* Run sbin/stop-all.sh to stop the cluster.
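A minimal conf/slaves file could look like this (worker1 and worker2 are placeholder hostnames and must match the names used in /etc/hosts):

  # /opt/spark/conf/slaves -- one worker hostname per line
  worker1
  worker2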
OR
* Start the master by hand
  bin/spark-class org.apache.spark.deploy.master.Master
* Start a worker on each worker node, pointing it at the master
  bin/spark-class org.apache.spark.deploy.worker.Worker spark://<master-FQDN>:7077
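For example, with a hypothetical master reachable as master.example.com, that would be:

  master$  ./bin/spark-class org.apache.spark.deploy.master.Master
  worker1$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.example.com:7077
  worker2$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.example.com:7077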
If everything is up, run some sample jobs:
  - run $ ./bin/spark-shell --master spark://<master-FQDN>:7077
You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
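Once the shell is attached to the cluster, a quick sanity check (a minimal sketch; the numbers are arbitrary) confirms that work is really distributed:

  scala> val rdd = sc.parallelize(1 to 1000)   // spread 1..1000 across the executors
  scala> rdd.map(_ * 2).reduce(_ + _)          // doubles each element and sums: res0: Int = 1001000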
Submitting Applications
* Run on the Spark Standalone cluster in client deploy mode (the driver runs on the machine you submit from), e.g. the Pi example:
  - run $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-FQDN>:7077 --executor-memory 2G --total-executor-cores 10 /opt/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar 1000
* Run on the Spark Standalone cluster in cluster deploy mode (the driver runs on one of the worker nodes), again with the Pi example:
  - run $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master-FQDN>:7077 --deploy-mode cluster --executor-memory 2G --total-executor-cores 10 /opt/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar 1000
Check the master web UI
  http://<spark-master-IP>:8080 --- check the Running Applications and Completed Applications sections