Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active June 9, 2023 00:16
Show Gist options
  • Save jeremyyeo/5e3bae4e7a2ff6c6b554880c0d8d0e86 to your computer and use it in GitHub Desktop.
Save jeremyyeo/5e3bae4e7a2ff6c6b554880c0d8d0e86 to your computer and use it in GitHub Desktop.
Model name validation #dbt

Model name validation

Let's assume we have a dbt project with a models folder structure as follows:

models/
├── dimensions/
│   └── dim_users.sql
├── staging/
│   └── stg_users.sql
└── fct_users.sql

And we want to validate that each model in the dimensions/ folder has a name that starts with dim_ and each model in the staging/ folder has a name that starts with stg_. We also do not care too much about what other models are named if they are outside of those 2 folders.

Using a pre-hook macro

We can accomplish this by running a hook just before a model is executed (run).

  1. Add a macro that we will use on our model pre-hook:
-- macros/validate_model.sql

{% macro validate_model(this, prefix) %}
  
  {% if execute %}

    {% for node in graph.nodes.values() %}
      {% if node.name == this.identifier %}

        {% if node.name.startswith(prefix) %}

          {% set message = "model in path <" ~ node.original_file_path ~ "> contains the right prefix of <" ~ prefix ~ ">" %}
          {% do log(message, True) %}

        {% else %}

          {% set message = "model in path <" ~ node.original_file_path ~ "> DOES NOT CONTAIN the right prefix of <" ~ prefix ~ ">" %}
          {% do exceptions.raise_compiler_error(message) %}

        {% endif %}

      {% endif %}
    {% endfor %}

  {% endif %}

{% endmacro %}

The macro takes in a relation with the this variable that refers to the current model and a prefix which we will set to a string representing the prefix we want our models to have.

The macro also uses the graph context variable which contains a dictionary of all our nodes / models and it's attributes (such as the complete file path to the model, the type of materialization, etc).

  1. Let's use our macro in a pre-hook.
# dbt_project.yml

name: "my_project"

...

models:
  my_project:
    +materialized: table
    dimensions:
      +pre-hook: "{{ validate_model(this, 'dim_') }}"
    staging:
      +pre-hook: "{{ validate_model(this, 'stg_') }}"

...
  1. Let's give our project a run.
$ dbt run

01:52:59  Running with dbt=1.0.2
01:53:00  Found 3 models, 0 tests, 0 snapshots, 0 analyses, 180 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
01:53:00  
01:53:05  Concurrency: 1 threads (target='dev')
01:53:05  
01:53:05  1 of 3 START table model dbt_jyeo.dim_users..................................... [RUN]
01:53:05  model in path <models/dimensions/dim_users.sql> contains the right prefix of <dim_>
01:53:09  1 of 3 OK created table model dbt_jyeo.dim_users................................ [SUCCESS 1 in 3.38s]
01:53:09  2 of 3 START table model dbt_jyeo.fct_users..................................... [RUN]
01:53:12  2 of 3 OK created table model dbt_jyeo.fct_users................................ [SUCCESS 1 in 3.37s]
01:53:12  3 of 3 START table model dbt_jyeo.stg_users..................................... [RUN]
01:53:12  model in path <models/staging/stg_users.sql> contains the right prefix of <stg_>
01:53:15  3 of 3 OK created table model dbt_jyeo.stg_users................................ [SUCCESS 1 in 3.25s]
01:53:15  
01:53:15  Finished running 3 table models in 15.70s.
01:53:15  
01:53:15  Completed successfully
01:53:15  
01:53:15  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

Everything worked as expected. Let's now try to trigger an error.

  1. Add a customers.sql model to the dimensions/ folder with any arbitrary content in it (all the models in this example just have select 1 as id in it). Our models folder structure should now look like:
models/
├── dimensions/
│   ├── customers.sql
│   └── dim_users.sql
├── staging/
│   └── stg_users.sql
└── fct_users.sql
  1. Let's give our project another run.
$ dbt run

02:03:20  Running with dbt=1.0.2
02:03:21  Found 4 models, 0 tests, 0 snapshots, 0 analyses, 180 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
02:03:21  
02:03:27  Concurrency: 1 threads (target='dev')
02:03:27  
02:03:27  1 of 4 START table model dbt_jyeo.customers..................................... [RUN]
02:03:27  1 of 4 ERROR creating table model dbt_jyeo.customers............................ [ERROR in 0.04s]
02:03:27  2 of 4 START table model dbt_jyeo.dim_users..................................... [RUN]
02:03:27  model in path <models/dimensions/dim_users.sql> contains the right prefix of <dim_>
02:03:30  2 of 4 OK created table model dbt_jyeo.dim_users................................ [SUCCESS 1 in 3.68s]
02:03:30  3 of 4 START table model dbt_jyeo.fct_users..................................... [RUN]
02:03:34  3 of 4 OK created table model dbt_jyeo.fct_users................................ [SUCCESS 1 in 3.60s]
02:03:34  4 of 4 START table model dbt_jyeo.stg_users..................................... [RUN]
02:03:34  model in path <models/staging/stg_users.sql> contains the right prefix of <stg_>
02:03:37  4 of 4 OK created table model dbt_jyeo.stg_users................................ [SUCCESS 1 in 3.41s]
02:03:37  
02:03:37  Finished running 4 table models in 16.66s.
02:03:37  
02:03:37  Completed with 1 error and 0 warnings:
02:03:37  
02:03:37  Compilation Error in model customers (models/dimensions/customers.sql)
02:03:37    model in path <models/dimensions/customers.sql> DOES NOT CONTAIN the right prefix of <dim_>
02:03:37    
02:03:37    > in macro validate_model (macros/validate_model.sql)
02:03:37    > called by macro run_hooks (macros/materializations/hooks.sql)
02:03:37    > called by macro materialization_table_snowflake (macros/materializations/table.sql)
02:03:37    > called by model customers (models/dimensions/customers.sql)
02:03:37  
02:03:37  Done. PASS=3 WARN=0 ERROR=1 SKIP=0 TOTAL=4

We now see that the customers.sql model raised an error (expectedly) because it did not contain the right prefix.

Using a run-operation macro

Let's now see how we can validate our models using a run-operation - this can come in handy if you want to validate every model first before you actually proceed to dbt run.

  1. Add a macro that we will use with our run-operation:
-- macros/validate_all_models.sql

{% macro validate_all_model() %}
  


{% endmacro %}

-TBD-

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment