
@akshaykarnawat
Last active February 28, 2024 19:18
Running dagster inside a docker container

Dagster Installation in Docker

Dagster provides a mechanism to create pipeline DAGs which can be executed in a reproducible way. The DAGs can be visualized and run through a UI.

Details here: https://docs.dagster.io/getting-started

To run dagster, we first need a docker container with python available.

The following command starts a throwaway python container and opens a shell in it. Because of the --rm flag, docker removes the container as soon as we exit.

docker run --rm -it python:3.8-slim-buster /bin/bash 

Once inside, we can run python --version to check which python version is installed, and then exit.

What we did above was create a container that lived only for a short time and was removed when we exited. To use the dagster library on an ongoing basis, we need a container that persists, so that we can stop it and start it again when needed.

docker run -dt -p 3000:3000 -v `pwd`/dagster_data:/dagster -e DAGSTER_HOME=/dagster -e DAGIT_HOST=0.0.0.0 --name dagster python:3.8-slim-buster

Dagster serves its UI through dagit, which listens on port 3000 by default. To reach that port from the host, we publish it with -p 3000:3000.

At the same time, we mount a host directory with -v `pwd`/dagster_data:/dagster. This is where we will create the dagster project through the dagster CLI. Because the directory is bind-mounted, any changes we make under /dagster in the container are reflected in the host file system, and vice versa. If the `pwd`/dagster_data directory does not exist, docker will create it for us. Inside the container, the directory appears at /dagster, directly under the file system root.

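We can also name the container with the --name dagster argument, so that we can refer to it by name later. -d runs the container in detached mode, and -t allocates a pseudo-terminal, which keeps the image's default python3 process (and therefore the container) running in the background.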

In addition, dagster expects DAGSTER_HOME to be set, and dagit has to listen on all interfaces (0.0.0.0) rather than only on localhost inside the container, otherwise the published port is not reachable from the host. We set both through the -e container environment argument: -e DAGSTER_HOME=/dagster -e DAGIT_HOST=0.0.0.0.
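
To make the role of each flag easier to see, here is a minimal Python sketch that assembles the same docker run command as an argument list. The function name build_docker_run_args and its parameters are hypothetical, chosen only for illustration. The combined -dt from the command above is split into -d and -t, which is equivalent.

```python
import os
import shlex


def build_docker_run_args(host_dir, port=3000, name="dagster",
                          image="python:3.8-slim-buster"):
    """Return the argv list for the `docker run` command described above.

    Hypothetical helper for illustration; it only builds the list and
    does not start anything.
    """
    return [
        "docker", "run",
        "-d",                           # detached: run in the background
        "-t",                           # allocate a pseudo-terminal
        "-p", f"{port}:{port}",         # publish dagit's port on the host
        "-v", f"{host_dir}:/dagster",   # bind-mount the host directory
        "-e", "DAGSTER_HOME=/dagster",  # directory dagster uses inside the container
        "-e", "DAGIT_HOST=0.0.0.0",     # dagit listens on all interfaces
        "--name", name,                 # container name used by later commands
        image,
    ]


if __name__ == "__main__":
    # Equivalent of `pwd`/dagster_data in the shell command.
    host_dir = os.path.join(os.getcwd(), "dagster_data")
    print(shlex.join(build_docker_run_args(host_dir)))
```

Running the script prints the command instead of executing it. Passing the list to subprocess.run would launch the container, assuming docker is installed on the host.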

Once the docker run … command is run, we can now see the docker container running with docker ps.

akshaykarnawat@riverbed ~ % docker ps
CONTAINER ID   IMAGE                    COMMAND     CREATED          STATUS          PORTS                                       NAMES
6eb6fb1eeaa4   python:3.8-slim-buster   "python3"   49 seconds ago   Up 46 seconds   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   dagster
akshaykarnawat@riverbed ~ % 
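
The same check can be scripted. The sketch below assumes docker is on the PATH and uses docker ps --format "{{.Names}}" to print one container name per line. The helper names container_is_listed and running_container_names are hypothetical.

```python
import subprocess


def container_is_listed(names_output, name="dagster"):
    """Return True if `name` appears as a whole line in the given output."""
    return name in (line.strip() for line in names_output.splitlines())


def running_container_names():
    """Ask docker for the names of running containers, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Requires docker on the host; prints True when the container is up.
    print(container_is_listed(running_container_names()))
```

Matching whole lines rather than substrings avoids confusing a container named dagster with one named, say, dagster2.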

Once the container is up and running, we can open a shell inside it with the docker exec command.

docker exec -it dagster /bin/bash

Once inside the container, we can run uname -a and env to check the system details and confirm that DAGSTER_HOME and DAGIT_HOST are set.

Inside the container, we now install dagster and dagit through pip.

pip install dagster dagit

Before we run dagit, let's create a projects dir in ${DAGSTER_HOME}. To create a new project, run the following commands:

cd ${DAGSTER_HOME}
mkdir projects
cd projects  
dagster new-project calibrate
cd calibrate
pip install --editable .

These commands come from this page: https://docs.dagster.io/getting-started/create-new-project. pip install --editable . installs the package in the current working dir in editable mode, so local changes are picked up automatically without reinstalling.

And finally, we can now run dagit. Since we set DAGIT_HOST=0.0.0.0 and published port 3000, we can open http://localhost:3000 in a browser on the host machine and execute my_pipeline from the playground tab.
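
Before opening the browser, we can check from the host that dagit is answering. This is a minimal sketch that assumes dagit responds to a plain GET on http://localhost:3000 with a 2xx status. The helper name http_ok is hypothetical.

```python
import http.client
import urllib.request


def http_ok(url="http://localhost:3000", timeout=3.0):
    """Return True if an HTTP GET to `url` gets a 2xx response in time."""
    # The target is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # (urllib.error.URLError is an OSError) and malformed responses.
        return False


if __name__ == "__main__":
    print("dagit reachable:", http_ok())
```

An HTTP request is used here rather than a bare TCP connect because a published port can accept connections before anything inside the container is actually serving on it.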

And there we have it: running pipelines through dagster!

** We can also run these steps through a Dockerfile.

** There are multiple docker images which can be used as well -- https://hub.docker.com/search?q=dagster&type=image
