How to link Apache Spark 2.1.0 with IPython notebook (Mac OS X)

Tested with

Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 2.1.0 & Hadoop 2.7

Download Apache Spark & Build it

Download Apache Spark and build it yourself, or download a pre-built version.

I suggest downloading the version pre-built with Hadoop 2.7.

mkdir ~/opt
cd ~/opt
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
# if the CloudFront mirror is unavailable, the same archive is hosted at
# https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
tar xvzf spark-2.1.0-bin-hadoop2.7.tgz
rm -f spark-2.1.0-bin-hadoop2.7.tgz
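
To confirm the archive unpacked correctly, you can ask Spark to print its version (an optional check; the path assumes the layout above):

~/opt/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --version
# the welcome banner should report version 2.1.0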

Install Anaconda

Download and install Anaconda.

Install Jupyter

Once you have installed Anaconda, open your terminal and type

conda install jupyter
conda update jupyter
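
To verify that Jupyter is on your PATH, you can run

jupyter --version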

Link Spark with IPython Notebook

Open a terminal and type

echo "export PYTHONPATH=~/opt/spark-2.1.0-bin-hadoop2.7/python:~/opt/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip" >> ~/.profile

Now source it to make the change available in the current terminal

source ~/.profile

or quit your terminal (Cmd+Q) and reopen it.
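
Before starting the notebook, you can sanity-check the setup from a plain Python shell; the printed path should point inside ~/opt/spark-2.1.0-bin-hadoop2.7:

echo $PYTHONPATH
python -c "import pyspark; print(pyspark.__file__)"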

Run Jupyter Notebook

jupyter notebook --ip=0.0.0.0 --NotebookApp.token=''

Now the Jupyter notebook should open in your browser.
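
A note on those flags: --ip=0.0.0.0 makes the server listen on every network interface, and --NotebookApp.token='' disables token authentication. That is convenient on a trusted machine but unsafe on a shared network; if you only need local access, the defaults are safer:

jupyter notebook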

To check whether Spark is correctly linked, create a new Python 2 notebook inside Jupyter and run the following cell. You should see something like this

In [1]: import pyspark
from pyspark.sql import SQLContext  # SQLContext must be imported explicitly

sc = pyspark.SparkContext('local[*]')  # local mode, one worker per core
sqlContext = SQLContext(sc)
sc
Out[1]: <pyspark.context.SparkContext at 0x1049bdf90>
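
As a final smoke test, run a small job through the new context in the next cell; summing the numbers 0 through 9 should return 45:

In [2]: sc.parallelize(range(10)).sum()
Out[2]: 45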