Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active May 13, 2022 02:38
Show Gist options
  • Save jeremyyeo/9b33ae5dee456a59f800e4fcf87c74fe to your computer and use it in GitHub Desktop.
Save jeremyyeo/9b33ae5dee456a59f800e4fcf87c74fe to your computer and use it in GitHub Desktop.
Are dbt freshness checks expensive in Snowflake? #dbt

If you download the debug logs from a dbt Cloud run:

image

We can have a look at the exact command dbt is executing during the source freshness step. In the case above, it executed the following:

select
      max(updated_at) as max_loaded_at,
      convert_timezone('UTC', current_timestamp()) as snapshotted_at
    from development.dbt_jyeo.my_source_a

For a source that I have defined in my project like so:

# models/sources.yml
version: 2
sources:
  - name: dbt_jyeo
    loaded_at_field: updated_at
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    tables:
      - name: my_source_a

Given the above and these docs from Snowflake... I found that the query above executed scanning 0 bytes:

image

Note that we can use the explain command to show us the query plan too:

image

Typically the way Snowflake works with tables is that it stores metadata about each of the columns in a table - this means that operations such as the count, min, max, and so on of a column happen without doing any table scan. You can see here when I try to explain the same source freshness query that would run on a Snowflake sample table (RAW_TPCH.WEATHER.HOURLY_14_TOTAL) that has about 1B rows (1.7 TB in size):

image

Snowflake didn't have to scan any bytes because the metadata on the t column is already stored.


While the above is for Snowflake, it should apply to BigQuery too I think.

Update from Jerco:

Even though the query is very fast, BigQuery does charge $$ to scan the column 😞 just the column, though. There are legacySQL ways to access the partition metadata for the table instead (free) (editors note: though this appear to be quite gnarly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment