First, make sure you have Docker installed on your system.
Create a folder for all files related to your notebook (data files, log files, etc.), and navigate to that directory.
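For example (the directory name here is just an illustration; any path will do):

```shell
# Create an example working directory for the notebook files and enter it.
mkdir -p ~/pyspark-notebook
cd ~/pyspark-notebook
```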
Run this in your terminal (the -v flag mounts the current directory into the container, and -p publishes Jupyter's port on the host):
$ docker run --name pyspark --rm -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/pyspark-notebook
Docker will pull the image the first time, and you'll eventually see Jupyter's startup output, ending with something like
http://(120e4fd32df5 or 127.0.0.1):8888/?token=17729f5aa6dd4f54dd8d16d029a39b85396427543161cc78
This means you should open
http://localhost:8888/?token=<your token>
in your browser's address bar, where <your token> is the token from the URL above, unique to your session.
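If you later lose the token, one way to recover it is from the container's logs; the grep pattern below is a sketch, assuming the container is still running under the name pyspark:

```shell
# Print the most recent access token from the Jupyter startup logs.
docker logs pyspark 2>&1 | grep -oE 'token=[0-9a-f]+' | tail -n 1
```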
Now open a new terminal and type this:
$ docker exec -ti pyspark bash
This opens a bash session inside the running container.
Now, in this session, you can install additional Python dependencies, such as BigDL:
$ pip install BigDL==0.6.0 pylib
Most other libraries we need are already installed.
Back in the Jupyter interface, you can now create a new Python 3 notebook and use pyspark, bigdl, and so on. Every file you put in your initial folder is visible from the notebook interface and persists across Docker sessions.