Recording model run errors in a table

The typical ways of logging individual model run metadata (status, updated timestamp, etc.) are:

  1. Using post-hooks that execute `insert into some_logging_table values ('my_model', 'my_model_status')` queries after each model is built (see the post-hook sketch below).
  2. The logging package + post-hooks.
  3. (Snowflake only) The dbt_artifacts package + run-operations. This uploads the dbt artifacts that contain model run metadata into a Snowflake stage and materializes them into various views for querying.

Update: The dbt_artifacts package can now work with the `results` context variable, allowing us to upload run results during the on-run-end hook: https://github.com/brooklyn-data/dbt_artifacts/tree/1.0.0b1
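
For concreteness, approach 1 can be as simple as a model-level `post_hook` config. This is only a minimal sketch: the `analytics.logs.some_logging_table` name and the hardcoded `'success'` status are illustrative assumptions, and, as noted below, the hook only fires when the model builds successfully.

```sql
-- models/my_model.sql
-- Minimal post-hook sketch. The database/schema/table names and the hardcoded
-- 'success' value are placeholders. This hook does NOT run if the model errors.
{{ config(
    post_hook = "insert into analytics.logs.some_logging_table values ('my_model', 'success', current_timestamp())"
) }}

select 1 as val
```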

The problem with the above approaches is that post-hooks don't run if a model fails / errors, and, if you're using dbt Cloud, run-operations are skipped if any of the previous dbt Cloud job steps (such as `dbt run`) have errored.


The solution is to make use of on-run-end hooks, which still execute even if the building of the models themselves errors.

The following steps show how this can be accomplished (in both dbt Core / CLI and dbt Cloud).

  1. Set up a barebones project to test this out:
```sql
-- models/my_model_1.sql
select 1 as val

-- models/my_model_2.sql
-- this model should result in an error as column 'x' doesn't exist.
select x from {{ ref('my_model_1') }}

-- models/my_model_3.sql
select * from {{ ref('my_model_1') }}
```

```yaml
# dbt_project.yml
name: "snowflake"
version: "1.0.0"
config-version: 2

profile: "snowflake"

model-paths: ["models"]
macro-paths: ["macros"]

models:
  snowflake:
    +materialized: table

on-run-start:
  - "{{ create_run_end_results_table() }}"

on-run-end:
  - "{{ log_run_end_results(results) }}"
```

  2. Add the following macros to the project:
```sql
-- macros/log_run_end_results.sql
{% macro create_run_end_results_table() %}

  {{ log('Creating `dbt_meta__run_end_results` table if not exists.', info=True) }}

  create table if not exists {{ target.database }}.{{ target.schema }}.dbt_meta__run_end_results (
      run_invocation_id text not null,
      model_identifier text not null,
      model_status text not null,
      model_message text not null,
      updated_at timestamp not null
  );

{% endmacro %}

{% macro log_run_end_results(results) %}

    {% if execute %}

        {{ log('Recording model run results in `dbt_meta__run_end_results`.', info=True) }}

        {% for res in results %}

            {#
            /*
              Because result messages can contain single quotes (e.g. in error text), we
              replace them with double quotes to avoid errors during the insert. Skipped
              nodes can have a null message, so we fall back to an empty string.
            */
            #}
            {% set tidy_message = (res.message or '') | replace("'", '"') %}
            {% set query -%}
                insert into {{ target.database }}.{{ target.schema }}.dbt_meta__run_end_results values (
                    '{{ invocation_id }}',
                    '{{ res.node.unique_id }}',
                    '{{ res.status }}',
                    '{{ tidy_message }}',
                    current_timestamp()
                );
            {%- endset %}
            {% do run_query(query) %}

        {% endfor %}

    {% endif %}

{% endmacro %}
```

Note that on-run-end hooks have special contextual information available to them (such as the `results` variable). I am also making use of the `invocation_id` variable to identify distinct dbt runs.
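
As a rough sketch (not part of the original gist), each `results` entry exposes fields like `node`, `status`, `message`, and `execution_time`, so you could first dump them with a logging-only macro while developing:

```sql
-- macros/debug_run_end_results.sql
-- Hedged sketch: logs each result instead of inserting it, handy for seeing
-- what the on-run-end `results` objects contain before writing them to a table.
{% macro debug_run_end_results(results) %}
    {% if execute %}
        {% for res in results %}
            {{ log(res.node.unique_id ~ ' -> ' ~ res.status ~ ' (' ~ res.execution_time ~ 's)', info=True) }}
        {% endfor %}
    {% endif %}
{% endmacro %}
```

You could add this as an extra on-run-end hook, e.g. `- "{{ debug_run_end_results(results) }}"`, while testing.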

  3. Build your models via `dbt run`. The run for `my_model_2` should error, but the on-run-end hook still executes.


  4. Query the table to double-check the status of the models:

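For example (the `analytics.dbt_dev` database / schema below are placeholders for your own `target.database` and `target.schema`):

```sql
-- Placeholder database and schema names; point this at your own target.
select *
from analytics.dbt_dev.dbt_meta__run_end_results
order by updated_at desc;
```

You should see one row per model per invocation, with `my_model_2` recorded with an error status and its error message.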
