Launching Spark using a custom bitnami/spark image with Docker Compose
docker-compose.yml:

```yaml
version: '2'
services:
  spark:
    #image: docker.io/bitnami/spark:3.3
    build:
      context: .
      dockerfile: ./Dockerfile
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
      - '8888:8888'  # Jupyter Notebook
      - '4040:4040'  # Spark UI; you may need to open 4041 or 4042 if 4040 is occupied
    volumes:
      - "./:/opt/spark:rw"
  spark-worker:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8081:8081'  # used to access the worker UI
```
Dockerfile:

```dockerfile
FROM docker.io/bitnami/spark:3.3

# You may need to update the py4j version here if the base image's Spark version changes
ENV PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.5-src.zip:$PYTHONPATH"
ENV PATH="${HOME}/.local/bin/:$PATH"

# switch to root to install system packages
USER root
RUN apt-get update && apt-get install -y -qq wget

# switch back to the image's rootless user
USER 1001
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```
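For reference, a minimal requirements.txt that matches this setup; the gist only requires jupyter, and any further packages are optional extras you would add for your own project:

```text
# requirements.txt -- jupyter is the only package this gist needs,
# since bitnami/spark does not ship with it;
# append any other libraries your project requires
jupyter
```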
In this gist, I introduce an alternative method to launch PySpark (data_engineering_zoom_camp / week 5). Basically, I use a custom image built on top of the bitnami/spark Docker Compose setup.

A - In the Dockerfile, specify all the dependencies, and make sure to switch between the root and rootless users to get the appropriate permissions.

B - In the requirements.txt file, I added jupyter because it is not pre-packaged with bitnami/spark. You can add other libraries if your project needs them.

C - In the YAML file you need to: (1) remove the image line and add a build section, as above, with the path to the Dockerfile; (2) add the ports needed to communicate with the master container; (3) build the image with docker-compose build; (4) run docker-compose up; (5) get inside the container by running docker ps to find the master container's ID, then executing docker exec -it <master_ID> bash; (6) launch Jupyter Notebook by running jupyter notebook --ip=0.0.0.0 (without this IP address your local machine won't be able to reach Jupyter through localhost:8888).

Hope I covered every detail.
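The steps above can be sketched as a shell session (the container ID placeholder is illustrative; yours will differ):

```bash
# (3) build the custom image defined in the Dockerfile
docker-compose build

# (4) start the master and worker containers
docker-compose up -d

# (5) find the master container's ID, then open a shell inside it
docker ps
docker exec -it <master_ID> bash   # replace <master_ID> with the actual ID

# (6) inside the container, start Jupyter bound to all interfaces
jupyter notebook --ip=0.0.0.0
# then open http://localhost:8888 on the host machine
```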
Can you share the requirements.txt file?
I added:
jupyter
numpy
pandas
matplotlib
scipy
but it keeps failing when I run docker-compose build.