If you download the debug logs from a dbt Cloud run, you can see the exact query dbt executes during the source freshness step. In the case above, it executed the following:
```sql
select
    max(updated_at) as max_loaded_at,
    convert_timezone('UTC', current_timestamp()) as snapshotted_at
from development.dbt_jyeo.my_source_a
```
For a source that I have defined in my project like so:
```yaml
# models/sources.yml
version: 2

sources:
  - name: dbt_jyeo
    loaded_at_field: updated_at
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    tables:
      - name: my_source_a
```
Given the above and these docs from Snowflake... I found that the query above ran while scanning 0 bytes:
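If you want to double-check this yourself outside the dbt Cloud UI, one way is to look the query up in Snowflake's query history and inspect its `BYTES_SCANNED`. A minimal sketch, assuming you have access to the `SNOWFLAKE.ACCOUNT_USAGE` share (which can lag by up to 45 minutes):

```sql
-- Find recent freshness-style queries and how many bytes each one scanned.
-- Assumes access to the SNOWFLAKE.ACCOUNT_USAGE share; results can lag.
select query_text, bytes_scanned, total_elapsed_time
from snowflake.account_usage.query_history
where query_text ilike '%max(updated_at)%'
order by start_time desc
limit 5;
```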
Note that we can use the `explain` command to show us the query plan too:
Typically, the way Snowflake works with tables is that it stores metadata about each of the columns in a table. This means that operations such as the count, min, max, and so on of a column can be answered without doing any table scan. You can see this when I explain the same source freshness query run against a Snowflake sample table (RAW_TPCH.WEATHER.HOURLY_14_TOTAL) that has about 1B rows (1.7 TB in size):
Snowflake didn't have to scan any bytes because the metadata on the `t` column is already stored.
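For reference, here's a sketch of what that `explain` looks like, reusing the sample table and `t` column named above (the exact plan output will vary by account):

```sql
-- Explain the freshness-style query against the Snowflake sample table.
-- In the plan, partitionsAssigned / bytesAssigned should come back as 0,
-- because Snowflake can answer max(t) from column metadata alone.
explain
select
    max(t) as max_loaded_at,
    convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw_tpch.weather.hourly_14_total;
```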
While the above is specific to Snowflake, I think something similar should apply to BigQuery too.
Update from Jerco:
Even though the query is very fast, BigQuery does charge $$ to scan the column 😞 (just the column, though). There are legacySQL ways to access the table's partition metadata instead, which are free (editor's note: though this appears to be quite gnarly).
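For completeness, a hedged sketch of that legacySQL approach: it only works for partitioned tables, it must run in legacy SQL mode (not standard SQL), and the project/dataset names below are hypothetical.

```sql
#legacySQL
-- Query the __PARTITIONS_SUMMARY__ meta-table, which is served from
-- metadata and doesn't scan the table itself. The project/dataset/table
-- names here are hypothetical; substitute your own.
SELECT
  partition_id,
  MSEC_TO_TIMESTAMP(last_modified_time) AS last_modified
FROM [my_project:my_dataset.my_source_a$__PARTITIONS_SUMMARY__]
ORDER BY last_modified DESC
LIMIT 1;
```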