
@slopp
slopp / users_by_role.py
Created July 3, 2023 17:25
Get Users by Role
from gql import Client, gql
from gql.transport.requests import RequestsHTTPTransport
import os
import pandas as pd
from datetime import datetime, timedelta
USER_GRANTS_QUERY = """
query UsersByRole {
usersOrError {
@slopp
slopp / HotTakes.md
Last active June 20, 2023 21:49
Dagster Hot Takes

Fewer On-Call Pages: Retries and Alerts

https://youtu.be/A6WtkMwe4VQ

Getting an on-call page is the worst. Unfortunately, most task-based orchestrators page teams whenever a job fails, so pages are frequent. With Dagster, you can reduce this alert fatigue by using retry strategies and only getting notified when SLAs are violated.
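To make the idea concrete, here is a minimal pure-Python sketch of the behavior (not Dagster's implementation): a failing step is retried with exponential backoff, and the alert fires only after every retry is exhausted. In Dagster itself, retries are configured with `RetryPolicy` (`max_retries`, `delay`, `backoff`); the helper `run_with_retries` and its parameters below are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    step: Callable[[], T],
    max_retries: int = 3,
    delay: float = 1.0,
    alert: Optional[Callable[[Exception], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying failures with exponential backoff.

    Hypothetical helper: `alert` is called only after the final attempt
    fails, so transient errors never page anyone.
    """
    attempt = 0
    while True:
        try:
            return step()
        except Exception as exc:
            if attempt >= max_retries:
                # Retries exhausted: this is the only place we page.
                if alert is not None:
                    alert(exc)
                raise
            # Exponential backoff: delay, 2 * delay, 4 * delay, ...
            sleep(delay * (2 ** attempt))
            attempt += 1
```

Injecting `sleep` and `alert` keeps the sketch testable; a step that fails twice and then succeeds produces two backoff waits and no alert.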

Resources:

@slopp
slopp / ReadMe.md
Created May 5, 2023 16:39
Dagster Cloud External Compute Logs

In release 1.3.3, Dagster introduced the ability to display a link to compute logs instead of displaying the logs directly. This capability is important for Dagster Cloud users who do not want to send compute logs to Dagster Cloud but still want end users to be able to access logs while debugging a run.

This is possible because of additions to the compute log manager. Users can implement their own compute log manager for full control over the link behavior, or use the default dagster-aws implementation, which stores logs in S3 and displays a link to the log file:

[Screenshot, 2023-05-04: link to the compute log file displayed in Dagster]

*Note: in 1.3.3 the displayed link points directly to the S3 object. In 1.3.4 the link points to the S3 console page for the log object, which provides a better experience for non-public S3 buckets.
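The difference between the two link styles can be sketched as follows. This is an illustration only: the URL patterns are the standard S3 virtual-hosted-style object URL and the S3 console object page, assumed here rather than taken from the dagster-aws source, and both helper names are hypothetical.

```python
from urllib.parse import quote, urlencode


def direct_object_url(bucket: str, key: str, region: str) -> str:
    """Direct link to the S3 object (1.3.3-style behavior).

    Opens in a browser only if the object is publicly readable.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def console_object_url(bucket: str, key: str, region: str) -> str:
    """Link to the object's page in the S3 console (1.3.4-style behavior).

    The user signs in to AWS first, so objects in private buckets are reachable.
    """
    query = urlencode({"region": region, "prefix": key})
    return f"https://s3.console.aws.amazon.com/s3/object/{bucket}?{query}"
```

The console link trades a direct download for an authenticated page, which is why it works better when the bucket is not public.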

@slopp
slopp / dagster_cloud_usage_by_repo.py
Created April 28, 2023 20:43
Get Dagster Cloud usage by job and repo
from gql import Client, gql
from gql.transport.requests import RequestsHTTPTransport
import os
import pandas as pd
from datetime import datetime, timedelta
def get_month_starts(start_date: datetime, end_date: datetime):
    """ Attributed to chatgpt """
    month_starts = []
    current_date = start_date.replace(day=1)
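The preview cuts off the body of `get_month_starts`. Assuming it returns the first day of every month from the month of `start_date` through `end_date`, a complete version could look like this (a sketch, not the gist's actual code):

```python
from datetime import datetime
from typing import List


def get_month_starts(start_date: datetime, end_date: datetime) -> List[datetime]:
    """Return the first day of each month from start_date's month through end_date."""
    month_starts = []
    current_date = start_date.replace(day=1)
    while current_date <= end_date:
        month_starts.append(current_date)
        # Advance to the first day of the next month, rolling over the year after December.
        if current_date.month == 12:
            current_date = current_date.replace(year=current_date.year + 1, month=1)
        else:
            current_date = current_date.replace(month=current_date.month + 1)
    return month_starts
```

Because day is pinned to 1 before stepping, `replace(month=...)` never hits an invalid date such as February 30.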
@slopp
slopp / Dockerfile
Created April 25, 2023 18:49
Example Dagster Cloud Dockerfile
FROM python:3.10-slim
# Get the files that define dependencies
COPY *setup.py *requirements.txt /
# If setup.py exists, we install the dependencies before
# copying all other files
RUN if [ -f "setup.py" ]; then \
pip install .; \
fi
@slopp
slopp / asset_factory.py
Created April 19, 2023 18:29
Asset Factory Example
from dagster import asset, Definitions, AssetIn
assets_to_make = ["a", "b", "c"]
def make_assets(asset_to_make):
    @asset(
        name=asset_to_make
    )
    def asset_template():
        print(asset_to_make)
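The factory function matters because each call gets its own `asset_to_make`. The pure-Python sketch below shows the pattern with a hypothetical `register` decorator standing in for `@asset(name=...)` (it is not Dagster's API). Building each definition inside a factory gives every function body the right value, whereas closures created directly in a loop read the loop variable only when they run, by which point it holds its final value.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the assets passed to Definitions.
REGISTRY: Dict[str, Callable[[], str]] = {}


def register(name: str):
    """Hypothetical stand-in for @asset(name=...): store the function under `name`."""
    def decorator(fn: Callable[[], str]) -> Callable[[], str]:
        REGISTRY[name] = fn
        return fn
    return decorator


def make_asset(asset_to_make: str) -> Callable[[], str]:
    # Each call creates a new scope, so the closure captures this call's value.
    @register(name=asset_to_make)
    def asset_template() -> str:
        return asset_to_make
    return asset_template


assets_to_make = ["a", "b", "c"]
assets = [make_asset(name) for name in assets_to_make]

# Contrast: closures created directly in a loop all see the final value.
late_bound = [lambda: name for name in assets_to_make]
```

Calling each function in `assets` returns its own name, while every function in `late_bound` returns `"c"`.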
@slopp
slopp / README.md
Last active April 18, 2023 00:46
Push Dagster run metadata to Postgres

Push Metadata to External Postgres

Dagster maintains a metadata DB that allows users to filter, search, and take action on the status of past runs. However, in some cases it is useful to view this metadata outside of Dagster. This example shows how a scheduled Dagster job can be used to push the run metadata into another Postgres DB.
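The core of such a job is an idempotent export: each scheduled run writes the latest status of recent runs, and re-exporting a run updates its row instead of duplicating it. The sketch below assumes a hypothetical `RunRow` record and table layout, and uses the standard-library `sqlite3` module as a stand-in for the external Postgres DB. The `INSERT ... ON CONFLICT ... DO UPDATE` upsert it uses is also valid in Postgres, though with a driver such as psycopg2 the placeholders would be `%s` instead of `?`. Reading the runs from Dagster is left out.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunRow:
    """Hypothetical shape of one exported run; not a Dagster type."""
    run_id: str
    job_name: str
    status: str
    end_time: Optional[float] = None  # Unix timestamp; None while the run is in progress


def push_runs(conn: sqlite3.Connection, rows: Iterable[RunRow]) -> None:
    """Upsert run metadata so repeated exports never duplicate rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS dagster_runs (
            run_id TEXT PRIMARY KEY,
            job_name TEXT NOT NULL,
            status TEXT NOT NULL,
            end_time REAL
        )
        """
    )
    conn.executemany(
        """
        INSERT INTO dagster_runs (run_id, job_name, status, end_time)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (run_id) DO UPDATE SET
            status = excluded.status,
            end_time = excluded.end_time
        """,
        [(r.run_id, r.job_name, r.status, r.end_time) for r in rows],
    )
    conn.commit()
```

Keying on `run_id` means the schedule can safely overlap its export windows: a run first seen as in progress is simply updated when it finishes.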

Get Started

To run this example:

  1. Create a Postgres DB to be the metadata target
@slopp
slopp / definitions.py
Last active April 14, 2023 20:51
Example of Fivetran and Downstream Assets
import json
from dagster import asset, AssetIn, AssetKey, Definitions, with_resources
from dagster_fivetran import build_fivetran_assets, fivetran_resource
from databricks_cli.sdk import JobService
from dagster_databricks import databricks_client
fivetran_instance = fivetran_resource.configured(
    {
@slopp
slopp / step10_parameters.py
Last active February 24, 2023 23:39
ETL to Assets Code Snippets
# Sigh - either asset config or dynamic partitions or asset factories depending on the use case
@slopp
slopp / README.md
Created February 16, 2023 17:01
Dagster: Controlling Parallelism within a Run

Dagster has many ways to control parallelism. In Dagster Cloud deployments, you can control how many concurrent runs can happen at one time through deployment settings.

Within a run, you can also control how many operations execute in parallel. By default, runs use the multiprocess executor, and the number of parallel operations within a run is based on the number of CPUs available. For example, if you are using Dagster Cloud Hybrid with Kubernetes, the number of parallel operations within a run will be based on the resources available in a pod.

This behavior can be changed by modifying the executor settings.
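Dagster's multiprocess executor exposes a `max_concurrent` setting for this. As a standard-library illustration of what such a cap does (not Dagster code), the sketch below runs dummy steps through a pool limited to `max_concurrent` workers and records the peak number running at once. It uses threads for simplicity, whereas the multiprocess executor runs each step in its own process; `run_steps` is a hypothetical name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_steps(num_steps: int, max_concurrent: int) -> int:
    """Run `num_steps` dummy steps with at most `max_concurrent` at a time.

    Returns the peak number of steps observed running simultaneously.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def step(_: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for real work
        with lock:
            running -= 1

    # The pool size is the concurrency cap: extra steps wait for a free worker.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        list(pool.map(step, range(num_steps)))
    return peak
```

Lowering the cap makes a run slower but bounds how much memory and CPU it can claim at once, which matters when a pod's resources are fixed.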

Default