Advanced setup: Anaconda + IPython + scikit-learn + PySpark on the course machine

For those who, like me, wish to continue learning about ML using the scientific Python stack, check out this video workshop by Jake VanderPlas.

Here is the accompanying code: https://github.com/jakevdp/sklearn_pycon2015/

Here are the steps I took to get a correctly working PySpark with Anaconda (and its ~200 bundled packages) on the course's Vagrant VM:
  1. Install Anaconda or Miniconda (you should be familiar with the Linux shell). The Vagrant Spark VM is 32-bit Ubuntu with Python 2.7, since PySpark for Python 3 has not been released yet. Get the download URL from http://continuum.io/downloads#all. If you only want selected packages, get Miniconda instead.
vagrant ssh
curl -L https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.3.0-Linux-x86.sh | bash

Wait until the download and install are complete. Anaconda is installed to /home/vagrant/anaconda/
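
If you want a quick sanity check before touching the notebook config, you can call the freshly installed interpreter directly (this assumes the default install prefix /home/vagrant/anaconda and just picks a few representative packages to look for):

~/anaconda/bin/python --version
~/anaconda/bin/conda list | grep -iE 'numpy|scipy|scikit-learn|ipython'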

  2. Now tweak the notebook upstart job config and modify the PATH env var so that the job launches the Anaconda distribution from your home directory:
sudo nano /etc/init/notebook.conf

Change env PATH to:

env PATH=/home/vagrant/anaconda/bin/:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/local/bin/spark-1.3.1-bin-hadoop2.6/bin

Save and exit, then tell upstart to re-read its job configuration:

sudo initctl reload-configuration
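
To double-check that your PATH edit is actually in the job file before restarting anything (purely an optional sanity check):

grep '^env PATH' /etc/init/notebook.conf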

Optionally, if you wish to change the IPython notebooks directory:

echo "c.NotebookApp.notebook_dir = u'/vagrant'" >> ~/.ipython/profile_pyspark/ipython_notebook_config.py

Restart the job so the new PATH (and any notebook directory change) takes effect:

sudo restart notebook
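
You can confirm the job came back up with upstart's status command; it should report the notebook job as start/running:

sudo status notebook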

Check that the installed libraries are on the import path using this notebook: https://github.com/jakevdp/sklearn_pycon2015/blob/master/notebooks/01-Preliminaries.ipynb
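
As a quick check outside the browser, you can also import the key libraries directly with the Anaconda interpreter (the path and package list below are just examples, adjust them to whatever you actually need):

~/anaconda/bin/python -c "import numpy, scipy, matplotlib, sklearn; print(sklearn.__version__)"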
