@jtalmi
jtalmi / macros_using_changed_models.py
Last active October 5, 2022 16:55
A script to generate a list of dbt models using changed macros relative to a git branch
#!/usr/bin/env python3
'''
Script to detect models downstream of changed macros, relative to a git branch.
Usage:
$ python3 models_using_changed_macros.py --branch master --children --manifest_path /path/to/manifest.json
'''
import os
@jtalmi
jtalmi / Snowflake insert by period dbt macro
Last active October 2, 2020 16:57
An insert_by_period macro for snowflake, with support for short sample windows when target = dev
{% macro get_period_boundaries(target_schema, target_table, timestamp_field, start_date, stop_date, period) -%}
{% call statement('period_boundaries', fetch_result=True) -%}
with data as (
select
coalesce(max({{timestamp_field}}), {{start_date}})::timestamp as start_timestamp,
coalesce(
{{dbt_utils.dateadd('millisecond',
-1,
"nullif('" ~ stop_date ~ "','')::timestamp")
@jtalmi
jtalmi / dbt_linter.py
Last active March 15, 2022 20:41
dbt linter -- check for unique/not_null tests and description/columns
#!/usr/bin/env python3
"""
CI script to check:
1. Models have both a unique and not_null test.
2. Models have a description and columns (i.e. a schema.yml entry)
"""
import json
import logging
import os
import subprocess
#!/usr/bin/env python3
'''Script to autogenerate dbt commands for changed models against a chosen git branch,
with support for fully refreshing models with specific tags.
Usage:
$ python3 dbt_run_changed.py --target_branch master --target dev --commands [run, test] --full_refresh_tags [full_refresh]
Assume model1 and model2 are changed models and model2 is tagged with "full_refresh". The script will generate three dbt commands:
1. dbt run --target dev --model model2 --full-refresh
@jtalmi
jtalmi / dataframes_codelab.md
Last active November 16, 2017 18:38
PySpark DataFrames codelab

PySpark DataFrames

A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to a table in a relational database, with query optimization applied under the hood.

| id   | name    | age |
|------|---------|-----|
| 1201 | phil    | 25  |
| 1202 | barbara | 28  |
| 1203 | jon     | 39  |
| 1204 | dirk    | 23  |
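
As a minimal sketch, the sample rows above could be loaded into a DataFrame along these lines. The helper name `people_dataframe` and the constants `COLUMNS` and `ROWS` are hypothetical and not part of the codelab; the only PySpark call used is `SparkSession.createDataFrame`, and the session is assumed to be created by the caller.

```python
# Column names and rows taken from the sample table above.
COLUMNS = ["id", "name", "age"]
ROWS = [
    (1201, "phil", 25),
    (1202, "barbara", 28),
    (1203, "jon", 39),
    (1204, "dirk", 23),
]


def people_dataframe(spark):
    """Build the sample DataFrame from an existing SparkSession.

    `spark` is assumed to be a pyspark.sql.SparkSession; column types
    are inferred by Spark from the Python values in ROWS.
    """
    return spark.createDataFrame(ROWS, schema=COLUMNS)


# Possible usage, assuming pyspark is installed (the app name is arbitrary):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("dataframes_codelab").getOrCreate()
#   df = people_dataframe(spark)
#   df.show()          # prints the four rows as a table
#   df.printSchema()   # lists the named columns and their inferred types
```

Passing a plain list of column names as the schema keeps the sketch short; an explicit `StructType` would be the usual choice when exact column types matter.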