Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active February 2, 2024 06:23
Show Gist options
  • Save jeremyyeo/fcdc4e40967435c3556612b9709fdcf4 to your computer and use it in GitHub Desktop.
Save jeremyyeo/fcdc4e40967435c3556612b9709fdcf4 to your computer and use it in GitHub Desktop.
Which dbt nodes respect the generate_schema_name macro? #dbt

Which dbt nodes respect the generate_schema_name macro?

dbt has many types of "nodes"/"resources" - e.g. models, sources, seeds - so which of them actually respect the generate_schema_name() macro? Let's have a look.

The following is tested using:

Core:
  - installed: 1.7.7
  - latest:    1.7.7 - Up to date!
Plugins:
  - postgres:   1.7.7 - Up to date!

With a profiles.yml that resolves target.schema to be public:

# ~/.dbt/profiles.yml
postgres:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: root
      password: password
      database: postgres
      schema: public
      port: 5432

And a dbt_project.yml like so:

# dbt_project.yml
name: my_dbt_project
profile: postgres
config-version: 2
version: "1.0.0"

models:
  my_dbt_project:
    +materialized: table

First let's add a generate_schema_name() macro like so:

-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
  not_public
{%- endmacro %}

There is no "business logic" we're testing here - we're simply testing which of the various dbt node types will actually have their schemas be set to not_public as opposed to public.

Models

Expand...
-- models/foo.sql
select 1 id
$ dbt --debug run -s foo
...
05:56:49  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.7.7", "profile_name": "all", "target_name": "pg-local", "node_id": "model.my_dbt_project.foo"} */ 
  create  table "postgres"."not_public"."foo__dbt_tmp"
    as
  (
    select 1 id
  );
...

✅ - yes the schema of models respect the macro.

Seeds

Expand...
# seeds/people.csv
id
1
$ dbt --debug seed
...
05:58:53  On seed.my_dbt_project.people: /* {"app": "dbt", "dbt_version": "1.7.7", "profile_name": "all", "target_name": "pg-local", "node_id": "seed.my_dbt_project.people"} */

    create table "postgres"."not_public"."people" ("id" integer)
  
05:58:53  SQL status: CREATE TABLE in 0.0 seconds
...

✅ - yes the schema of seeds respect the macro.

Snapshots

Expand...
-- snapshots/snappy.sql
{% snapshot snappy %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='check',
      check_cols='all'
    )
}}

select 1 id

{% endsnapshot %}
$ dbt --debug snapshot
...
06:11:43  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.7.7", "profile_name": "all", "target_name": "pg-local", "node_id": "snapshot.my_dbt_project.snappy"} */
  create  table "postgres"."snapshots"."snappy"
    as

  (
    
    select *,
        md5(coalesce(cast(id as varchar ), '')
         || '|' || coalesce(cast(now()::timestamp without time zone as varchar ), '')
        ) as dbt_scd_id,
        now()::timestamp without time zone as dbt_updated_at,
        now()::timestamp without time zone as dbt_valid_from,
        nullif(now()::timestamp without time zone, now()::timestamp without time zone) as dbt_valid_to
    from (

select 1 id

    ) sbq

  );
...

❌ - We can see from above that snapshots do not use the generate_schema_name macro at all - it simply takes it schema from the target_schema config.

Sources

Expand...
# models/sources.yml
version: 2
sources:
  - name: public
    tables:
      - name: my_source
-- models/from_source.sql
select * from {{ source('public', 'my_source') }}

Sources do nothing by themselves so to see anything we need to use it in a model.

$ dbt --debug run -s from_source
...
06:01:57  On model.my_dbt_project.from_source: /* {"app": "dbt", "dbt_version": "1.7.7", "profile_name": "all", "target_name": "pg-local", "node_id": "model.my_dbt_project.from_source"} */
  create  table "postgres"."not_public"."from_source__dbt_tmp"
    as
  (
    select * from "postgres"."public"."my_source"
  );
...

❌ - We can see from above that the source's schema resolved to public instead of not_public - i.e. sources do not make use of the generate_schema_name macro.

Test tables as a result of using the store_failures option

Expand...
# models/schema.yml
version: 2
models:
  - name: foo
    columns:
      - name: id
        tests:
          - not_null
$ dbt --debug test --store-failures
...
06:07:36  On test.my_dbt_project.not_null_foo_id.f099b1e59c: /* {"app": "dbt", "dbt_version": "1.7.7", "profile_name": "all", "target_name": "pg-local", "node_id": "test.my_dbt_project.not_null_foo_id.f099b1e59c"} */ 

  create  table "postgres"."not_public"."not_null_foo_id"
  
    as 
  (

select *
from "postgres"."not_public"."foo"
where id is null

  );
  
06:07:36  SQL status: SELECT 0 in 0.0 seconds
...

✅ - yes the schema of the resulting test table respects the macro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment