@slopp
slopp / dagster_cloud_usage_by_repo.py
Created April 28, 2023 20:43
Get dagster cloud usage by job and repo
from gql import Client, gql
from gql.transport.requests import RequestsHTTPTransport
import os
import pandas as pd
from datetime import datetime, timedelta

def get_month_starts(start_date: datetime, end_date: datetime):
    """ Attributed to chatgpt """
    month_starts = []
    current_date = start_date.replace(day=1)
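The preview above is cut off after its first few lines. As a rough illustration only (not the gist author's code), a helper that lists month starts between two dates could look like the sketch below. The name `month_starts_between` and the inclusive treatment of the end date are assumptions made for this example.

```python
from datetime import datetime


def month_starts_between(start_date: datetime, end_date: datetime) -> list[datetime]:
    """Return the first day of each month from start_date's month up to end_date.

    Hypothetical helper: the inclusive end bound is an assumption of this sketch.
    """
    month_starts = []
    # Snap to midnight on the first day of the starting month.
    current = start_date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    while current <= end_date:
        month_starts.append(current)
        # Roll over to January of the following year after December.
        if current.month == 12:
            current = current.replace(year=current.year + 1, month=1)
        else:
            current = current.replace(month=current.month + 1)
    return month_starts
```

Each returned value could then serve as the lower bound of a monthly usage query, with the next element as the upper bound.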
slopp / Dockerfile
Created April 25, 2023 18:49
Example Dagster Cloud Dockerfile
FROM python:3.10-slim
# Get the files that define dependencies
COPY *setup.py *requirements.txt /
# If setup.py exists, we install the dependencies before
# copying all other files
RUN if [ -f "setup.py" ]; then \
        pip install .; \
    fi
slopp / asset_factory.py
Created April 19, 2023 18:29
Asset Factory Example
from dagster import asset, Definitions, AssetIn

assets_to_make = ["a", "b", "c"]


def make_assets(asset_to_make):
    @asset(
        name=asset_to_make
    )
    def asset_template():
        print(asset_to_make)
slopp / README.md
Last active April 18, 2023 00:46
Push dagster run metadata to postgres

Push Metadata to External Postgres

Dagster maintains a metadata DB that allows users to filter, search, and take action on the status of past runs. However, in some cases it is useful to view this metadata outside of Dagster. This example shows how a scheduled Dagster job can be used to push the run metadata into another Postgres DB.
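As a minimal sketch of the push step, the snippet below upserts run records into a target table. It makes several assumptions: the records are already fetched as plain dictionaries, the field names (`run_id`, `job_name`, `status`) and the table name `dagster_runs` are hypothetical, and the standard-library `sqlite3` module stands in for Postgres so the example runs without a database server. Postgres accepts a similar `ON CONFLICT ... DO UPDATE` clause, but the driver and connection code would differ.

```python
import sqlite3


def push_run_metadata(conn: sqlite3.Connection, runs: list[dict]) -> int:
    """Upsert run records into the target table and return the row count.

    Table and column names are hypothetical, chosen for illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dagster_runs ("
        "run_id TEXT PRIMARY KEY, job_name TEXT NOT NULL, status TEXT NOT NULL)"
    )
    # Re-pushing a run updates its row instead of inserting a duplicate,
    # so a scheduled job can safely send overlapping batches.
    conn.executemany(
        "INSERT INTO dagster_runs (run_id, job_name, status) "
        "VALUES (:run_id, :job_name, :status) "
        "ON CONFLICT(run_id) DO UPDATE SET "
        "job_name = excluded.job_name, status = excluded.status",
        runs,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dagster_runs").fetchone()[0]
```

Keying the table on the run ID keeps the push idempotent, which matters when the same run is seen first as started and later as finished.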

Get Started

To run this example:

  1. Create a Postgres DB to be the metadata target
slopp / definitions.py
Last active April 14, 2023 20:51
Example of Fivetran and Downstream Assets
import json
from dagster import asset, AssetIn, AssetKey, Definitions, with_resources
from dagster_fivetran import build_fivetran_assets, fivetran_resource
from databricks_cli.sdk import JobService
from dagster_databricks import databricks_client

fivetran_instance = fivetran_resource.configured(
    {
slopp / step10_parameters.py
Last active February 24, 2023 23:39
ETL to Assets Code Snippets
# Sigh - either asset config or dynamic partitions or asset factories depending on the use case
slopp / README.md
Created February 16, 2023 17:01
Dagster: Controlling Parallelism within a Run

Dagster: Controlling Parallelism within a Run

Dagster has many ways to control parallelism. In Dagster Cloud deployments, you can limit how many runs execute concurrently through deployment settings.

Within a run, you can also control how many operations execute in parallel. By default, runs use the multi-process executor, and the number of parallel operations within a run is based on the number of CPU cores available. For example, if you are using Dagster Cloud Hybrid with Kubernetes, the number of parallel operations within a run will be based on the resources available in a pod.

This behavior can be changed by modifying the executor settings.
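As a rough sketch, the executor settings can be expressed as run config. The helper below builds a dictionary that caps parallel steps through the multi-process executor's `max_concurrent` option. The helper name is hypothetical, and the nesting shown assumes a job that uses the default executor; the exact shape may differ across Dagster versions.

```python
def multiprocess_run_config(max_concurrent: int) -> dict:
    """Build run config limiting how many steps execute in parallel.

    Hypothetical helper; the nesting assumes the default executor.
    """
    return {
        "execution": {
            "config": {
                "multiprocess": {
                    # Upper bound on simultaneously running step processes.
                    "max_concurrent": max_concurrent,
                }
            }
        }
    }


# Example: allow at most two steps to run at the same time.
run_config = multiprocess_run_config(2)
```

A dictionary like this could be supplied as the job's config or entered in the launchpad in its YAML form.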

Default

slopp / definitions.py
Created January 13, 2023 18:20
Launchpad + asset materializing job + op config
from dagster import Definitions, job, resource, op, AssetMaterialization, asset

class MyResource():
    def hello(self, name: str):
        print(f"hello {name}")

@resource
def my_resource(_):
slopp / README.md
Last active December 29, 2022 00:15
[Draft] Dagster Robustness Guide

What is a robust data platform?

This guide centralizes concepts needed to run a "robust" production data platform using Dagster Cloud, where robust means assets and infrastructure are:

  • Fault Tolerant via replication, resource constraints, retries, parallelization, and run queues and priorities
  • Observable via customizable logs and useful alerts

This guide does not cover every aspect of a production data platform. Other useful resources include:

  • Testing and CI/CD to ensure new code does what is expected without breaking existing assets
  • Project Structure to build a code base that can scale across teams and dependencies [Todo: Link to guide]
  • Data Expectations to ensure the data flowing through your pipelines is valid and meets your expectations [Todo: refresh guide for assets, add section on conditional behavior]
slopp / README.md
Last active December 19, 2022 16:55
Diff Eqs in Dagster

Self-Dependent Asset Partitions

As of 1.1.7, Dagster supports assets that rely on prior versions of themselves, for example, an asset that implements a differential equation.
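The underlying idea can be sketched without Dagster: each daily partition is computed from the previous day's value, which is how a forward-Euler step of a differential equation behaves. The equation (dy/dt = rate * y), the function names, and the constants below are assumptions chosen for illustration, not taken from the gist.

```python
from datetime import date, timedelta


def euler_step(previous_value: float, rate: float = -0.5, dt: float = 1.0) -> float:
    """One forward-Euler step for dy/dt = rate * y (an illustrative equation)."""
    return previous_value + dt * rate * previous_value


def materialize_partitions(start: date, days: int, initial_value: float) -> dict[date, float]:
    """Compute each daily partition from the partition one day earlier."""
    values: dict[date, float] = {}
    for offset in range(days):
        partition = start + timedelta(days=offset)
        previous = values.get(partition - timedelta(days=1))
        # The first partition has no predecessor, so it takes the initial condition.
        values[partition] = initial_value if previous is None else euler_step(previous)
    return values
```

In Dagster, the dictionary lookup of the previous day would be replaced by the asset depending on its own prior partition.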

Getting Started

To run this example, first install the dependencies:

pip install dagster dagit