Installing Toree+Spark 2.1 on Ubuntu 16.04

## 23rd January 2017

At the time of writing, the pip release of Toree is not compatible with Spark 2.x, so we need to build Toree from the master branch on GitHub.

Requirements

sudo apt install openjdk-8-jdk-headless
sudo apt install git
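
Before moving on, it can help to confirm that both tools are actually on your PATH. This is a minimal sketch: `missing_tools` is a hypothetical helper name of my own, not something shipped with Ubuntu, and it only relies on the standard `command -v` shell builtin.

```shell
# Print each named command that is not found on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

# java and git come from the two packages installed above.
missing_tools java git || echo "Install the missing packages before continuing."
```

The same helper can be reused later with `sbt` or `jupyter` as arguments.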

Install sbt

sbt isn't available in the Ubuntu repos. Install it manually, or add the sbt repository as follows:

echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get update
sudo apt-get install sbt

Install Anaconda Python

Anaconda Python can be installed to a user's home directory, and it contains most of the Python modules needed by the majority of researchers. It can coexist with the normal Ubuntu Python packages.

wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
chmod +x ./Anaconda3-4.2.0-Linux-x86_64.sh
./Anaconda3-4.2.0-Linux-x86_64.sh

Follow the installer's instructions. When you are asked the following, say yes:

Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/walkingrandomly/.bashrc ? [yes|no]

Start a new terminal session so that the .bashrc changes get applied.
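
In the new terminal, a quick check that the Anaconda `python` is the one found first on PATH can save confusion later. This is a sketch only: `python_is_under` is a hypothetical helper, and `$HOME/anaconda3` is assumed to be the install location you accepted in the installer, so adjust it if you chose a different one.

```shell
# Succeeds when the first `python` on PATH lives under the given directory.
python_is_under() {
    python_path=$(command -v python) || return 1
    case "$python_path" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: Anaconda was installed to $HOME/anaconda3.
if python_is_under "$HOME/anaconda3"; then
    echo "Anaconda Python is first on PATH"
else
    echo "python resolves to: $(command -v python || echo 'not found')"
fi
```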

Install Spark

wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0.tgz
tar -xvzf ./spark-2.1.0.tgz
cd spark-2.1.0/
build/mvn -DskipTests clean package

Check that you can run the Spark shell:

./bin/spark-shell

Press CTRL-D to exit the shell.
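
The absolute path of this Spark directory is needed again when the Jupyter kernel is installed, so it is worth noting it down now. A minimal sketch, where `spark_home_path` is a hypothetical helper that only checks for an executable `bin/spark-shell` and does not verify that the build itself succeeded:

```shell
# Print the absolute path of a directory if it contains an executable
# bin/spark-shell; fail otherwise.
spark_home_path() {
    dir=$(cd "$1" 2>/dev/null && pwd) || return 1
    [ -x "$dir/bin/spark-shell" ] || return 1
    echo "$dir"
}

# Run from inside the spark-2.1.0 directory.
spark_home_path . || echo "No bin/spark-shell here; run this from the Spark directory."
```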

Install toree from source

git clone https://github.com/apache/incubator-toree
cd incubator-toree/
make dist

Then do

make release

You'll get this error, which you can ignore:

/bin/sh: 1: docker: not found
Makefile:212: recipe for target 'dist/toree-pip/toree-0.2.0.dev1.tar.gz' failed
make: *** [dist/toree-pip/toree-0.2.0.dev1.tar.gz] Error 127
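
Since the install step that follows runs `setup.py` from `dist/toree-pip/`, one way to reassure yourself that the error really is harmless is to check that the file is there. A small sketch, run from the `incubator-toree` directory; `has_setup_py` is a hypothetical helper name:

```shell
# Succeeds when the given directory contains a setup.py to install from.
has_setup_py() {
    [ -f "$1/setup.py" ]
}

if has_setup_py dist/toree-pip; then
    echo "dist/toree-pip is ready for: python setup.py install"
else
    echo "dist/toree-pip/setup.py not found; re-check the make output"
fi
```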

Now we can install the built package:

cd dist/toree-pip/
python setup.py install

Install the Jupyter kernel. I call this one bespoke_spark to differentiate it from any others you may have. Be sure to change the value of --spark_home to your own Spark directory.

jupyter toree install --kernel_name=bespoke_spark --spark_home=/home/walkingrandomly/spark-2.1.0/ --user
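
To confirm that the kernel was registered, you can look for it in the output of `jupyter kernelspec list`. This is a sketch under assumptions: `kernel_listed` is a hypothetical helper, and because I am not relying on the exact directory name Toree gives the kernel, it only checks that some listed entry contains the text `bespoke_spark`.

```shell
# Reads `jupyter kernelspec list` output on stdin and succeeds when any
# line contains the given text (case-insensitive, fixed-string match).
kernel_listed() {
    grep -qiF -e "$1"
}

if command -v jupyter >/dev/null 2>&1; then
    if jupyter kernelspec list | kernel_listed bespoke_spark; then
        echo "bespoke_spark kernel is registered"
    else
        echo "bespoke_spark kernel not found; re-run jupyter toree install"
    fi
fi
```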

Now launch Jupyter with

jupyter notebook

and you'll be able to select the kernel and use Spark 2.1.
