Dagster provides a mechanism to create pipeline DAGs which can be executed in a reproducible way. The DAGs can be visualized and run through a UI.
Details here: https://docs.dagster.io/getting-started
To run dagster, we first need a Python environment, which in this walkthrough lives in a docker container.
The following command starts a throwaway Python container; because of the --rm flag, the container is removed as soon as we exit.
docker run --rm -it python:3.8-slim-buster /bin/bash
Once inside, we can run python --version to see which Python version is installed. When done, we can leave the container with exit.
What we did above was create a container for a short time and remove it as soon as we exited. To use the dagster library on a regular basis, we need a container that persists, so that we can stop it and start it again when needed.
docker run -dt -p 3000:3000 -v `pwd`/dagster_data:/dagster -e DAGSTER_HOME=/dagster -e DAGIT_HOST=0.0.0.0 --name dagster python:3.8-slim-buster
Dagster serves its UI through dagit, whose default port is 3000. To reach that port from the host, we publish it with -p 3000:3000.
At the same time, we mount a volume with -v `pwd`/dagster_data:/dagster, which is where we will create the dagster project through the dagster CLI. Because the volume is attached, any changes made under /dagster in the container are reflected in the host file system. If the `pwd`/dagster_data directory does not exist, docker will create it for us. Inside the container, the volume is mapped to /dagster, directly under the root folder.
We can also name the container with the --name dagster argument, so that later commands can refer to it by name. -d runs the container in detached mode, and -t allocates a terminal, which keeps the image's default python3 process from exiting right away.
In addition, dagster needs DAGSTER_HOME to be set, and dagit has to listen on 0.0.0.0 (all interfaces) rather than localhost so that the published port is reachable from the host. Both can be set with -e container environment arguments: -e DAGSTER_HOME=/dagster -e DAGIT_HOST=0.0.0.0.
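To keep the flags straight, the same docker run command can be assembled one flag at a time. The following is a minimal sketch in POSIX shell that only prints the command (a dry run) instead of starting anything; the variable names host_dir and run_cmd are illustrative and not part of docker or dagster.

```shell
#!/bin/sh
# Dry run: build the docker run command from above one flag at a time.
host_dir="$(pwd)/dagster_data"            # host side of the volume (illustrative name)

set -- docker run
set -- "$@" -d                            # detached: run in the background
set -- "$@" -t                            # allocate a terminal
set -- "$@" -p 3000:3000                  # host port 3000 -> container port 3000 (dagit)
set -- "$@" -v "${host_dir}:/dagster"     # host directory -> /dagster in the container
set -- "$@" -e DAGSTER_HOME=/dagster      # dagster's home directory
set -- "$@" -e DAGIT_HOST=0.0.0.0         # dagit listens on all interfaces
set -- "$@" --name dagster                # container name used by later commands
set -- "$@" python:3.8-slim-buster        # image

run_cmd="$*"                              # flattened copy, for display only
echo "$run_cmd"
# To actually start the container, run: "$@"
```

Building the argument list with set -- keeps each flag on its own commented line while still quoting the volume path safely if the current directory contains spaces.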
Once the docker run … command has run, we can see the container running with docker ps.
akshaykarnawat@riverbed ~ % docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6eb6fb1eeaa4 python:3.8-slim-buster "python3" 49 seconds ago Up 46 seconds 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp dagster
akshaykarnawat@riverbed ~ %
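Because this container was not started with --rm, it survives being stopped, and anything installed inside it is still there after a restart. As a minimal sketch (the function names container_state and ensure_running are made up for this example; ps, start, and stop are standard docker CLI subcommands), the following helpers report on the container and start it again if it is stopped:

```shell
#!/bin/sh
# Hypothetical helpers around the standard docker CLI.

# Print "running", "stopped", or "missing" for the named container.
# Note: the name filter matches substrings, so keep container names distinct.
container_state() {
  if [ -n "$(docker ps -q --filter "name=$1")" ]; then
    echo running
  elif [ -n "$(docker ps -aq --filter "name=$1")" ]; then
    echo stopped
  else
    echo missing
  fi
}

# Start the container again if it exists but is not running.
ensure_running() {
  case "$(container_state "$1")" in
    running) echo "$1 is already running" ;;
    stopped) docker start "$1" >/dev/null && echo "$1 started" ;;
    missing) echo "$1 does not exist; create it with docker run first" >&2
             return 1 ;;
  esac
}
```

A typical cycle would be docker stop dagster at the end of the day and ensure_running dagster the next time the container is needed.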
With the container up and running, we can open a shell inside it with the docker exec command.
docker exec -it dagster /bin/bash
Once inside the container, we can run uname -a and env to inspect the system and its environment variables.
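When reading through the env output, the two variables passed with -e are the ones that matter here. A small sketch of a check that could be pasted into the container's shell (the function name check_dagster_env is made up for this example):

```shell
#!/bin/sh
# Hypothetical check: confirm the variables passed with -e reached the container.
check_dagster_env() {
  missing=0
  for name in DAGSTER_HOME DAGIT_HOST; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "$name is not set" >&2
      missing=1
    fi
  done
  # Non-zero exit status if either variable was missing.
  return "$missing"
}
```

Calling check_dagster_env prints each variable that is set and returns a non-zero status if one is absent, which usually points to a missing -e flag on docker run.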
Inside the container, we now install dagster and dagit with pip.
pip install dagster dagit
Before we run dagit, let's create a projects directory in ${DAGSTER_HOME}. To create a new project, run the following commands:
cd ${DAGSTER_HOME}
mkdir projects
cd projects
dagster new-project calibrate
cd calibrate
pip install --editable .
These commands come from this page: https://docs.dagster.io/getting-started/create-new-project. pip install --editable . installs the current working directory as a Python package in editable mode, so local changes are picked up automatically.
And finally, we can run dagit. Since we set DAGIT_HOST=0.0.0.0, we can open http://localhost:3000 on the host machine and execute my_pipeline from the Playground tab.
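dagit can take a few seconds to come up, so the browser may briefly fail to connect. As a rough sketch to run on the host, assuming curl is installed there (the function name wait_for_url and its parameters are made up for this example), the following polls the URL until it answers:

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "$url is up"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

For example, wait_for_url http://localhost:3000 waits up to roughly 30 seconds. If it gives up, the usual suspects are a missing -p 3000:3000 or a dagit that is bound to localhost inside the container.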
And there we have it: running pipelines through dagster!
** We can also run the steps through a Dockerfile.
** There are multiple docker images which can be used as well -- https://hub.docker.com/search?q=dagster&type=image
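As an illustration of the Dockerfile note above, the install step and the two environment variables could be baked into an image. This is one possible sketch, not an official dagster image: the function name write_dockerfile and the image tag my-dagster are made up, and project creation is deliberately left as a manual step inside the container, as in the walkthrough.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Dockerfile into the given directory.
write_dockerfile() {
  dir=${1:-.}
  mkdir -p "$dir"
  cat > "$dir/Dockerfile" <<'EOF'
# Same base image as the walkthrough
FROM python:3.8-slim-buster
# Same environment variables that were passed with -e
ENV DAGSTER_HOME=/dagster
ENV DAGIT_HOST=0.0.0.0
# Same install step that was run by hand
RUN pip install dagster dagit
WORKDIR /dagster
EXPOSE 3000
EOF
  echo "wrote $dir/Dockerfile"
}
```

After write_dockerfile . the image could be built with docker build -t my-dagster . and started with the same docker run command as before, swapping python:3.8-slim-buster for my-dagster and dropping the two -e flags.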