Getting Spark up and running on RPi

Before starting

  • Download Spark 1.4 to your local machine (laptop or PC); see the command below
  • Go to your router's admin page at 192.168.1.1 to get the local IPs of the newly connected RPis
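
If you prefer to grab Spark from the command line, something like the following should fetch the same package from the Apache archive (URL assumed from the standard archive layout; substitute a mirror if you have one closer):

    # download Spark 1.4.0 pre-built for Hadoop 2.6 to the local machine
    wget https://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz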

Configure each Raspberry Pi

Log into the new Raspberry Pi from your machine

  • ssh pi@192.168.1.XXX (default password for pi user is raspberry)

Configure RPi

  • Enter config: sudo raspi-config
  • Choose expand filesystem (this allows the OS to take up the full size of the SD card)
  • Change the hostname of the device to something like rpi007 (under advanced options)
  • When exiting the config, choose to reboot so that changes take effect
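
If you'd rather script the hostname change than click through the raspi-config menus, a minimal sketch (assuming stock Raspbian, with rpi007 as the example hostname) looks like this:

    # write the new hostname, update the loopback entry Raspbian keeps in /etc/hosts,
    # then reboot so the change takes effect
    sudo sh -c 'echo rpi007 > /etc/hostname'
    sudo sed -i 's/raspberrypi/rpi007/' /etc/hosts
    sudo reboot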

Config a spark user

A Spark cluster will need ssh access between nodes using the same username, so let's configure a spark user for this node.

  • add new user: sudo adduser spark (for simplicity, use the same password on all RPis)
  • add spark user to sudo group: sudo adduser spark sudo
  • CTRL+D to log out of SSH (we'll log back in as the spark user)
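
The cluster start scripts we'll use later log into each node over SSH, so it's worth setting up passwordless key-based logins for the spark user now. A sketch of the usual recipe, run as spark on the node that will become the master (the IP is a placeholder for each worker):

    # generate a key pair once (accept the defaults, empty passphrase),
    # then push the public key to every other node
    ssh-keygen -t rsa
    ssh-copy-id spark@192.168.1.XXX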

Install and test Apache Spark on each Raspberry Pi

Copy spark application to RPi

We downloaded Spark 1.4 to our local machine earlier; now it's time to copy it onto the new RPi with scp, which transfers files securely over SSH. Run the following command from your local machine.

  • scp spark-1.4.0-bin-hadoop2.6.tgz spark@192.168.1.XXX:spark-1.4.0-bin-hadoop2.6.tgz
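
If you're building several Pis at once, a small loop on your local machine saves retyping (the IP list here is illustrative; use your own):

    # copy the Spark tarball to every Pi in one go
    for ip in 192.168.1.139 192.168.1.140 192.168.1.141; do
        scp spark-1.4.0-bin-hadoop2.6.tgz spark@$ip:spark-1.4.0-bin-hadoop2.6.tgz
    done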

Test Spark in standalone mode

With the file transferred to the new RPi, let's log in as the spark user we created earlier and set up Spark.

  • ssh spark@192.168.1.XXX
  • Extract Spark: tar xvfz spark-1.4.0-bin-hadoop2.6.tgz

Note that Spark produces tonnes of log messages by default (we'll turn this down below).

  • Go to the new folder: cd spark-1.4.0-bin-hadoop2.6
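
To turn the logging down, one common tweak is to start from the log4j template that ships in Spark's conf directory and raise the threshold to warnings only (a sketch; the template's exact contents may differ between releases):

    # log only warnings and above instead of the default INFO firehose
    cp conf/log4j.properties.template conf/log4j.properties
    sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' conf/log4j.properties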

Test spark

  • bin/run-example SparkPi 10 (estimates Pi; the argument 10 is the number of partitions to compute over, not the number of decimals)
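
A successful run churns through a lot of log output and finishes with an estimate along these lines (the digits vary between runs, since the example samples points at random):

    bin/run-example SparkPi 10
    # ... INFO logging ...
    # Pi is roughly 3.14...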

Test scala shell

  • bin/spark-shell --master local[4]
  • scala> sc.textFile("README.md").count
  • To see what Spark is doing, go to http://raspi08.home:4040/ (substitute your own Pi's hostname)
  • CTRL+D quits the shell

Test python shell

  • bin/pyspark --master local[4]
  • >>> sc.textFile("README.md").count()
  • CTRL+D quits the shell

That's everything required to set up an individual Spark node. In the next section we'll get our individual nodes acting as a cluster.

Configure a cluster

Get Hadoop up and running

Create users

Download Hadoop

  • Download Hadoop 2.6
  • wget http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
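
Hadoop will mainly give us HDFS for shared storage; the Spark side of the cluster can be wired up with the standalone scripts bundled in the Spark distribution. A rough sketch, assuming the same spark-1.4.0-bin-hadoop2.6 layout on every node, rpi007 as the master's hostname, and the passwordless SSH set up earlier (worker IPs are placeholders):

    # on the master: list the workers, one hostname or IP per line
    cd spark-1.4.0-bin-hadoop2.6
    printf '192.168.1.140\n192.168.1.141\n' > conf/slaves

    # start the master, then start a worker on every host listed in conf/slaves
    sbin/start-master.sh
    sbin/start-slaves.sh

    # point a shell at the cluster to check that the workers registered
    bin/spark-shell --master spark://rpi007:7077

The master's web UI (port 8080 by default) shows the registered workers and any running applications.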