Getting Spark up and running on RPi

Before starting

  • Download Spark 1.4 to your local machine (laptop or PC); see the command below
  • Go to your router's admin page at 192.168.1.1 to get the local IPs of the newly connected RPis
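
If you prefer to grab Spark from the command line, something like the following should fetch the same package from the Apache archive (URL assumed from the standard archive layout; substitute a mirror if you have one closer):

    # download Spark 1.4.0 pre-built for Hadoop 2.6 to the local machine
    wget https://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz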

Configure each Raspberry Pi

Log into the new Raspberry Pi from your machine

  • ssh pi@192.168.1.XXX (default password for pi user is raspberry)

Configure RPi

  • Enter config: sudo raspi-config
  • Choose expand filesystem (this allows the OS to take up the full size of the SD card)
  • Change the hostname of the device to something like rpi007 (under advanced options)
  • When exiting the config, choose to reboot so that changes take effect
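
If you'd rather script the hostname change than click through the raspi-config menus, a minimal sketch (assuming stock Raspbian, with rpi007 as the example hostname) looks like this:

    # write the new hostname, update the loopback entry Raspbian keeps in /etc/hosts,
    # then reboot so the change takes effect
    sudo sh -c 'echo rpi007 > /etc/hostname'
    sudo sed -i 's/raspberrypi/rpi007/' /etc/hosts
    sudo reboot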

Config a spark user

A Spark cluster will need ssh access between nodes using the same username, so let's configure a spark user for this node.

  • add new user: sudo adduser spark (for simplicity, use the same password on all RPis)
  • add spark user to sudo group: sudo adduser spark sudo
  • CTRL+D to log out of SSH (we'll log back in as the spark user)
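
The cluster start scripts we'll use later log into each node over SSH, so it's worth setting up passwordless key-based logins for the spark user now. A sketch of the usual recipe, run as spark on the node that will become the master (the IP is a placeholder for each worker):

    # generate a key pair once (accept the defaults, empty passphrase),
    # then push the public key to every other node
    ssh-keygen -t rsa
    ssh-copy-id spark@192.168.1.XXX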

Install and test Apache Spark on each Raspberry Pi

Copy spark application to RPi

We downloaded Spark 1.4 to our local machine earlier; now it's time to copy it onto the new RPi with scp, which transfers files securely over SSH. Run the following command from your local machine.

  • scp spark-1.4.0-bin-hadoop2.6.tgz spark@192.168.1.XXX:spark-1.4.0-bin-hadoop2.6.tgz
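
If you're building several Pis at once, a small loop on your local machine saves retyping (the IP list here is illustrative; use your own):

    # copy the Spark tarball to every Pi in one go
    for ip in 192.168.1.139 192.168.1.140 192.168.1.141; do
        scp spark-1.4.0-bin-hadoop2.6.tgz spark@$ip:spark-1.4.0-bin-hadoop2.6.tgz
    done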

Test Spark in standalone mode

With the file transferred to the new RPi, let's log in as the spark user we created earlier and set up Spark.

  • ssh spark@192.168.1.XXX
  • Extract Spark: tar xvfz spark-1.4.0-bin-hadoop2.6.tgz

Note that Spark produces tonnes of log messages by default (we'll turn this down below).

  • Go to the new folder: cd spark-1.4.0-bin-hadoop2.6
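
To turn the logging down, one common tweak is to start from the log4j template that ships in Spark's conf directory and raise the threshold to warnings only (a sketch; the template's exact contents may differ between releases):

    # log only warnings and above instead of the default INFO firehose
    cp conf/log4j.properties.template conf/log4j.properties
    sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' conf/log4j.properties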

Test spark

  • bin/run-example SparkPi 10 (estimates Pi; the argument 10 is the number of partitions to compute over, not the number of decimals)
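
A successful run churns through a lot of log output and finishes with an estimate along these lines (the digits vary between runs, since the example samples points at random):

    bin/run-example SparkPi 10
    # ... INFO logging ...
    # Pi is roughly 3.14...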

Test scala shell

  • bin/spark-shell --master local[4]
  • scala> sc.textFile("README.md").count
  • To see what Spark is doing, go to http://raspi08.home:4040/ (substitute your own Pi's hostname)
  • CTRL+D quits the shell

Test python shell

  • bin/pyspark --master local[4]
  • >>> sc.textFile("README.md").count()
  • CTRL+D quits the shell

That's everything required to set up an individual Spark node. In the next section we'll get our individual nodes acting as a cluster.

Configure a cluster

Get Hadoop up and running

Create users

Download Hadoop

  • Download Hadoop 2.6
  • wget http://apache.mirror.digitalpacific.com.au/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
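
Hadoop will mainly give us HDFS for shared storage; the Spark side of the cluster can be wired up with the standalone scripts bundled in the Spark distribution. A rough sketch, assuming the same spark-1.4.0-bin-hadoop2.6 layout on every node, rpi007 as the master's hostname, and the passwordless SSH set up earlier (worker IPs are placeholders):

    # on the master: list the workers, one hostname or IP per line
    cd spark-1.4.0-bin-hadoop2.6
    printf '192.168.1.140\n192.168.1.141\n' > conf/slaves

    # start the master, then start a worker on every host listed in conf/slaves
    sbin/start-master.sh
    sbin/start-slaves.sh

    # point a shell at the cluster to check that the workers registered
    bin/spark-shell --master spark://rpi007:7077

The master's web UI (port 8080 by default) shows the registered workers and any running applications.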