## 23rd January 2017
These are my notes on setting up a Jupyter notebook with an Apache Toree kernel for Spark 2.1 on Ubuntu. At the time of writing, the version of Toree that pip installs is not compatible with Spark 2.x, so we need to build it from the master branch on GitHub. Start by installing a JDK and git:
```
sudo apt install openjdk-8-jdk-headless
sudo apt install git
```
sbt isn't available in the Ubuntu repositories, so either install it manually or add sbt's own repository and install it from there:
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
sudo apt-get update
sudo apt-get install sbt
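Before carrying on, it's worth checking that Java 8 and sbt are both usable. These two commands are my own sanity check rather than part of the original recipe:

```
# Confirm that the OpenJDK 8 install is the Java on the PATH
java -version
# Ask sbt to report its version (the first run downloads the sbt launcher's dependencies, so it takes a while)
sbt sbtVersion
```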
Anaconda Python can be installed in a user's home directory and contains most of the Python modules that the majority of researchers need, including Jupyter itself. It can coexist with the Python packages that Ubuntu provides.
```
wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
chmod +x ./Anaconda3-4.2.0-Linux-x86_64.sh
./Anaconda3-4.2.0-Linux-x86_64.sh
```
Follow the installer's instructions. When you are asked the following, say yes:
```
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/walkingrandomly/.bashrc ? [yes|no]
```
Start a new terminal session so that the .bashrc
changes get applied.
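At this point, a quick check (my addition, not an original step) confirms that the Anaconda install is the Python that new shells will pick up:

```
# Should point at the copy in your home directory, e.g. /home/walkingrandomly/anaconda3/bin/python
which python
# Should report the Python 3.5.x release bundled with Anaconda3 4.2.0
python --version
```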
Next, download and build Spark 2.1.0:

```
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0.tgz
tar -xvzf ./spark-2.1.0.tgz
cd spark-2.1.0/
build/mvn -DskipTests clean package
```
Check that you can run the Spark shell:

```
./bin/spark-shell
```

Press CTRL-D to exit the shell.
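As an extra smoke test, which isn't in the original instructions, you can also run one of the bundled examples from the command line via the run-example script that ships with the build:

```
# Run the bundled SparkPi example on 10 partitions; an approximate value of pi
# should appear amongst the log output
./bin/run-example SparkPi 10
```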
Now clone and build Toree from its GitHub master branch:

```
git clone https://github.com/apache/incubator-toree
cd incubator-toree/
make dist
```
Then do:

```
make release
```
You'll get the following error because docker isn't installed; it can safely be ignored:

```
/bin/sh: 1: docker: not found
Makefile:212: recipe for target 'dist/toree-pip/toree-0.2.0.dev1.tar.gz' failed
make: *** [dist/toree-pip/toree-0.2.0.dev1.tar.gz] Error 127
```
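Despite that failure, the package needed for the next step should already have been laid out on disk. This quick check is my addition rather than part of the original steps:

```
# setup.py should be present here; it's what we install from next
ls dist/toree-pip/
```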
Now we can install the built package:

```
cd dist/toree-pip/
python setup.py install
```
Install the Jupyter kernel. I call this one bespoke_spark to differentiate it from any others you may have. Be sure to change the value of --spark_home to point at your own Spark directory:
```
jupyter toree install --kernel_name=bespoke_spark --spark_home=/home/walkingrandomly/spark-2.1.0/ --user
```
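To confirm that the kernel was registered, you can list the kernels Jupyter knows about. Again, this check is my addition rather than part of the original recipe:

```
# An entry for the newly installed Toree kernel should appear in this list
jupyter kernelspec list
```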
Now launch Jupyter with:

```
jupyter notebook
```

and you'll be able to select the bespoke_spark kernel and use Spark 2.1.
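As a final sanity check (my suggestion rather than something from the original notes), open a new notebook with the Toree kernel and run `sc.version` in a cell; it should report 2.1.0.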