- Launch a Jupyter Notebook server on your local machine using Docker and the `jupyter/pyspark-notebook` image. Copy and paste the following into your terminal:

  ```shell
  docker run -d -v `pwd`:/home/jovyan -p 80:8888 jupyter/pyspark-notebook
  ```

  This launches a Jupyter Notebook server available at http://localhost/. By default, the server has authentication enabled and requires a token.
- Retrieve the container ID of the Jupyter Notebook server:

  ```shell
  docker ps
  ```

  This command lists the currently running Docker containers. Find the container using the `jupyter/pyspark-notebook` image and copy its `CONTAINER ID`.
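If you would rather script this step, the container ID can be parsed out of the `docker ps` output. A minimal sketch in Python; the helper name and the sample values in the comments are ours, based on docker's default table layout:

```python
def container_id_for_image(ps_output, image):
    """Return the CONTAINER ID of the first container running `image`, or None."""
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Default `docker ps` layout: CONTAINER ID first, IMAGE second.
        if len(fields) >= 2 and fields[1] == image:
            return fields[0]
    return None
```

In practice you would feed it the output of `subprocess.check_output(["docker", "ps"], text=True)`.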
- Retrieve the token. Run the following command, replacing `CONTAINERID` with the value copied in the previous step:

  ```shell
  docker exec CONTAINERID jupyter notebook list
  ```

  You should see output like the following:

  ```
  Currently running servers:
  http://0.0.0.0:8888/?token=b362ef9ea151f45b29cdcf9e9c39e9c914ef2d93478bce17 :: /home/jovyan
  ```

  Copy the value after `token=`; this is the authentication token.
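The copy step can also be automated by parsing the token out of a line of `jupyter notebook list` output. A minimal Python sketch; the helper name is ours, not part of Jupyter:

```python
def extract_token(server_line):
    """Return the value after `token=` in a Jupyter server URL, or None."""
    marker = "token="
    start = server_line.find(marker)
    if start == -1:
        return None
    # The token runs until the next whitespace (i.e. before the `:: /home/jovyan` suffix).
    return server_line[start + len(marker):].split()[0]
```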
- Access the server at http://localhost/ and use the authentication token to sign in.
Once signed in, create a new notebook and run the following to configure a local Spark session with the Delta Lake package:

```python
from os import environ

# Fetch the Delta Lake package when the Spark session starts.
environ['PYSPARK_SUBMIT_ARGS'] = '--packages "io.delta:delta-core_2.11:0.5.0" pyspark-shell'

from pyspark import sql

# Run Spark locally with 8 worker threads.
spark = sql.SparkSession.builder \
    .master("local[8]") \
    .getOrCreate()

def display(dataframe):
    """Print a DataFrame, mimicking the Databricks display() helper."""
    return dataframe.show()
```
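For reference, the `--packages` value passed above uses Maven's `group:artifact:version` coordinate format, with multiple coordinates separated by commas. A small helper of our own (purely illustrative, not part of pyspark) that builds the same submit-args string:

```python
def submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value from a list of Maven coordinates."""
    return '--packages "{}" pyspark-shell'.format(",".join(packages))

# Reproduces the value set above:
# submit_args(["io.delta:delta-core_2.11:0.5.0"])
```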
If you see an error like this:

```
docker: Error response from daemon: driver failed programming external connectivity on endpoint nifty_solomon (8645fa398b2b8e8a9ec19c8c41aebcfc734b5fa4979721f7cda08d51e8fc17cd): Bind for 0.0.0.0:8888 failed: port is already allocated.
```

it means another process, most likely an earlier Jupyter container, is already bound to that host port. Stop it, or map a different host port with `-p`.
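One way to avoid the collision up front is to check whether the host port is free before launching the container. A minimal sketch, with a helper name of our own:

```python
import socket

def port_free(port, host="127.0.0.1"):
    """Return True if we can bind `port` on `host`, i.e. nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            # Already in use -- or, for ports below 1024, possibly just
            # a permissions error when not running as root.
            return False
```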
We still have to figure out how to make this easy for Windows users too; most banks use Windows as the compute environment. In particular, the `` `pwd` `` substitution in the `docker run` command above is a Unix shell feature, so the working directory would need to be passed differently on Windows.