@rafik-rahoui
Created November 15, 2022 17:13
Launching spark using a custom image of spark/bitnami docker compose
version: '2'
services:
  spark:
    #image: docker.io/bitnami/spark:3.3
    build:
      context: .
      dockerfile: ./Dockerfile
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
      - '8888:8888' # this port is for jupyter notebook
      - '4040:4040' # this port is for the spark UI, you may need to open 4041 or 4042 in case 4040 is occupied
    volumes:
      - "./:/opt/spark:rw"
  spark-worker:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8081:8081' # this port is used to access the worker UI
FROM docker.io/bitnami/spark:3.3
# you may need to update the py4j version below if the Spark version of the base image changes
ENV PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.5-src.zip:$PYTHONPATH"
ENV PATH="${HOME}/.local/bin/:$PATH"
USER root
RUN apt-get update && apt-get install -y -qq wget
# the rootless user ID
USER 1001
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
In this gist, I introduce an alternative method to launch PySpark (data_engineering_zoom_camp / week 5).
Basically, I use a custom image built on top of the bitnami/spark image and its docker compose file.
A - You need to specify all the dependencies in the Dockerfile and make sure
to switch between the root and rootless users to get the appropriate permissions.
B - In the requirements.txt file, I added jupyter because it's not pre-packaged with bitnami/spark. You could add other
libraries if you need them in your project.
C - In the yaml file you need to:
(1) remove the image and add build, as shown in the compose file, with the path to the Dockerfile
(2) add the different ports to communicate with the master container
(3) build the image with: docker-compose build
(4) run: docker-compose up
(5) get inside the container by running "docker ps" to get the master_ID, then executing: docker exec -it master_ID bash
(6) launch Jupyter Notebook by running: jupyter notebook --ip=0.0.0.0 (without this IP address, your local machine
won't be able to reach Jupyter Notebook through localhost:8888).
Hope I covered every detail.
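Steps (3) to (6) can also be sketched as a few shell functions. This is only a sketch under some assumptions: the service name spark comes from the compose file, but the function names and the COMPOSE and DOCKER variables are made up for illustration. It also runs "up -d" so the cluster stays in the background, and it starts Jupyter directly through docker exec instead of from an interactive bash session, so it differs slightly from the manual steps.

```shell
# Hypothetical helper variables: override COMPOSE with "docker compose"
# if you use the Compose plugin instead of the standalone binary.
COMPOSE="${COMPOSE:-docker-compose}"
DOCKER="${DOCKER:-docker}"

# Service name of the master, taken from the compose file.
MASTER_SERVICE="spark"

# Steps 3 and 4: build the custom image, then start master and worker
# in the background (-d) so the terminal stays free.
start_cluster() {
    $COMPOSE build && $COMPOSE up -d
}

# Step 5: print the container ID of the master service
# (the same ID you would read from "docker ps").
master_id() {
    $COMPOSE ps -q "$MASTER_SERVICE"
}

# Step 6: start Jupyter inside the master container, listening on all
# interfaces so that localhost:8888 on the host can reach it.
start_jupyter() {
    $DOCKER exec -it "$(master_id)" jupyter notebook --ip=0.0.0.0
}
```

After sourcing the file, running start_cluster followed by start_jupyter should print the Jupyter URL with its token, assuming jupyter was installed through requirements.txt and is on the PATH of the rootless user.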
@Muhammadatef

Can you share the requirements.txt file?
I added :
jupyter
numpy
pandas
matplotlib
scipy

but it keeps failing when I do docker-compose build
