How to quickly set up a BigDL/PySpark local cluster using Docker to mess around

First, make sure you've installed Docker on your system.

  1. Create a folder to hold all files related to your notebook (data files, log files, etc.), and navigate to it

  2. Run this in your terminal:

    $ docker run --name pyspark --rm -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/pyspark-notebook

    The Docker image will download, and you'll eventually see the output of Jupyter starting. At the end you'll see something like:

    http://(120e4fd32df5 or 127.0.0.1):8888/?token=17729f5aa6dd4f54dd8d16d029a39b85396427543161cc78
    

    This means you should paste http://localhost:8888/?token=<your token> (the token is unique to your session) into your browser's address bar.
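
    If you close that terminal or lose the URL, you can print the container's startup log again (this uses the `pyspark` container name set above):

    $ docker logs pyspark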

  3. Now open a new terminal and type this:

    $ docker exec -ti pyspark bash

    This opens a bash session inside that running container.

  4. In this session you can install Python dependencies, such as BigDL:

    $ pip install BigDL==0.6.0 pylib

    Most other libraries we need are already installed.
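
    As a quick sanity check, you can try importing the package from this same shell. This assumes BigDL 0.6's module layout, where the Python code lives under the top-level `bigdl` package:

    $ python -c "import bigdl; print('BigDL is importable')"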

Back in Jupyter, you can now create a new Python 3 notebook and use pyspark, bigdl, and so on. Every file you put in your initial folder will be visible from the notebook interface and will persist across Docker sessions.
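
As a starting point, here's a minimal sketch of a first notebook cell, assuming the BigDL 0.6 Python API (`create_spark_conf` and `init_engine` live in `bigdl.util.common`; the app name and core count here are arbitrary):

    from pyspark import SparkContext
    from bigdl.util.common import create_spark_conf, init_engine

    # Build a Spark config carrying BigDL's required settings, running locally on 4 cores
    conf = create_spark_conf().setMaster("local[4]").setAppName("bigdl-sandbox")
    sc = SparkContext.getOrCreate(conf=conf)

    # The BigDL engine must be initialized before building models or training
    init_engine()

    # Files from your host folder are mounted at /home/jovyan/work, e.g.:
    # rdd = sc.textFile("/home/jovyan/work/some-data.csv")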
