Skip to content

Instantly share code, notes, and snippets.

@mistercrunch
Created April 29, 2022 21:40
Show Gist options
  • Save mistercrunch/3de2ac9a19843295fc77029a218d2d42 to your computer and use it in GitHub Desktop.
Save mistercrunch/3de2ac9a19843295fc77029a218d2d42 to your computer and use it in GitHub Desktop.
{#
Given a source table and a DATETIME or DATE column, this macro returns
a DATE `YYYY-MM-DD` pointing the the most recent available day.
For streaming tables that come in impredictable batches, the previous_day
flag when on will very likely point to the latest FULL day as we can
safely assume that if we have received event for a given day, the
previous one is complete.
#}
{%- macro latest_dt(source_table, dttm_column="ds", previous_day=False) -%}
{%- set max_partition = none -%}
{%- if execute -%}
{%- call statement('max_partition_date_query', fetch_result=True) -%}
SELECT CAST(MAX({{ dttm_column }}) AS DATE) AS max_partition_date
FROM {{ source_table }}
WHERE {{ dttm_column }} >= CAST('1970-01-01' AS DATE)
{%- endcall -%}
{%- set result = load_result('max_partition_date_query') -%}
{% if result %}
{%- set max_partition = result['data'][0][0] -%}
{%- endif -%}
{%- endif -%}
{%- if max_partition -%}
{%- if previous_day-%}
{%- set max_partition = max_partition - modules.datetime.timedelta(1) -%}
{{- max_partition.strftime('%Y-%m-%d') -}}
{%- else -%}
{{- max_partition.strftime('%Y-%m-%d') -}}
{%- endif -%}
{%- endif -%}
{%- endmacro -%}
~
{#
Given a source table that is partitionned, this macro returns
a query pointing to the latest partition
Note that this could be improved to better support different column
types with proper casting. We assume a DATETIME column here.
Also note that we may need to wait on a more directed signal from the
previous ETL phase that insures that ALL workspaces have been synced
#}
{% macro latest_partition(source_table, dttm_column="ds", previous_day=False) %}
{%- set max_partition = latest_dt(source_table, dttm_column, previous_day) -%}
SELECT * FROM {{ source_table }}
{% if max_partition -%}
WHERE {{ dttm_column }} = '{{ max_partition }}'
{% else -%}
WHERE {{ dttm_column }} > '1970-01-01'
{% endif %}
{% endmacro %}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment