@jeremyyeo
Last active September 1, 2023 03:47
Can dbt introspect a table created in a pre-hook if we try to query from it in the body of the model? #dbt

If we are good dbt citizens and stay within the bounds of dbt's normal behaviour, then this is not possible. See the following example.

Make sure that the table we are about to create in our hook does not yet exist in our database:

drop table if exists development.dbt_jyeo.this_doesnt_exist;


Set up our project:

# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: 1.0

models:
  my_dbt_project:
    +materialized: table

-- macros/create_new_table.sql
{% macro create_new_table() %}
    create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)
{% endmacro %}

-- models/foo.sql
{{ config(pre_hook = "{{ create_new_table() }}") }}

{% set query %}
select distinct c from development.dbt_jyeo.this_doesnt_exist
{% endset %}

{% set results = run_query(query) %}

{% if execute %}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}

select '{{ results_list[0] }}' as v

Run:

$ dbt run
01:59:29  Running with dbt=1.5.4
01:59:30  Registered adapter: snowflake=1.5.3
01:59:31  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
01:59:31  
01:59:36  Concurrency: 1 threads (target='sf')
01:59:36  
01:59:36  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
01:59:39  1 of 1 ERROR creating sql table model dbt_jyeo.foo ............................. [ERROR in 2.37s]
01:59:39  
01:59:39  Finished running 1 table model in 0 hours 0 minutes and 7.85 seconds (7.85s).
01:59:39  
01:59:39  Completed with 1 error and 0 warnings:
01:59:39  
01:59:39  Database Error in model foo (models/foo.sql)
01:59:39    002003 (42S02): SQL compilation error:
01:59:39    Object 'DEVELOPMENT.DBT_JYEO.THIS_DOESNT_EXIST' does not exist or not authorized.
01:59:39  
01:59:39  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

As you can see, before dbt even runs the pre-hook, it has to compile the model file into complete, valid SQL. Because the table this_doesnt_exist does not yet exist, run_query fails and dbt cannot compile the foo.sql model. And because it cannot do that, the model errors and our pre-hook never runs at all.
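The order of operations described above can be sketched as a toy Python simulation. This is a hypothetical model of dbt's behaviour, not dbt's actual internals: the model body is rendered (so run_query() fires) before the pre-hook's SQL is ever sent to the warehouse.

```python
# Toy simulation of dbt's ordering - a hypothetical sketch, NOT dbt's real
# implementation. The model body is Jinja-rendered (so run_query() fires)
# before the pre-hook's SQL is sent to the warehouse.

database = {}  # pretend warehouse: table name -> column value

def run_query(table):
    """Stand-in for dbt's run_query(): fails if the table is missing."""
    if table not in database:
        raise RuntimeError(
            f"Object 'DEVELOPMENT.DBT_JYEO.{table.upper()}' does not exist or not authorized."
        )
    return database[table]

def pre_hook():
    """Stand-in for create_new_table(): (re)creates the table with 'foo'."""
    database["this_doesnt_exist"] = "foo"

def model_body():
    """Stand-in for compiling foo.sql: introspects the table."""
    value = run_query("this_doesnt_exist")
    return f"select '{value}' as v"

def dbt_run():
    compiled_sql = model_body()  # 1. compile the model (introspection happens here)
    pre_hook()                   # 2. only now does the pre-hook run
    return compiled_sql          # 3. execute the compiled SQL

try:
    dbt_run()  # table missing -> compilation fails, the hook never runs
except RuntimeError as err:
    print(err)

database["this_doesnt_exist"] = "bar"  # pre-create the table with 'bar'
print(dbt_run())  # compiled from the stale 'bar' row: select 'bar' as v
```

The second call also shows why pre-creating the table with a different value produces a stale result: compilation reads the old row before the hook replaces it.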

Now, let's try creating the table first then rerunning our dbt project.

create table development.dbt_jyeo.this_doesnt_exist as (select 'bar' as c);
select * from development.dbt_jyeo.this_doesnt_exist;


Importantly, I'm making the column value 'bar' instead of 'foo' - this will be of interest later.

Rerun:

$ dbt run
02:05:12  Running with dbt=1.5.4
02:05:13  Registered adapter: snowflake=1.5.3
02:05:13  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
02:05:13  
02:05:19  Concurrency: 1 threads (target='sf')
02:05:19  
02:05:19  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
02:05:23  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 4.18s]
02:05:23  
02:05:23  Finished running 1 table model in 0 hours 0 minutes and 9.51 seconds (9.51s).
02:05:23  
02:05:23  Completed successfully
02:05:23  
02:05:23  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Now, this did run - however if you inspect the debug logs:

03:41:18  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10ec0f3a0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10fd337c0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10fd339a0>]}
03:41:18  Running with dbt=1.5.4
03:41:18  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'True', 'fail_fast': 'False', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'target_path': 'None', 'send_anonymous_usage_stats': 'True'}
03:41:19  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10fd02e50>]}
03:41:19  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x121eaa0d0>]}
03:41:19  Registered adapter: snowflake=1.5.3
03:41:19  checksum: 5d6b65bf0937b31c1cb174f5650843d5ea9953bdf3de7df3d9ed5979670d6d21, vars: {}, profile: , target: , version: 1.5.4
03:41:19  Partial parsing enabled: 0 files deleted, 0 files added, 2 files changed.
03:41:19  Partial parsing: updated file: my_dbt_project://macros/create_table.sql
03:41:19  Partial parsing: updated file: my_dbt_project://models/foo.sql
03:41:19  1603: static parser failed on foo.sql
03:41:19  1602: parser fallback to jinja rendering on foo.sql
03:41:19  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1227890d0>]}
03:41:19  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x122045eb0>]}
03:41:19  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:41:19  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x122045f70>]}
03:41:19  
03:41:19  Acquiring new snowflake connection 'master'
03:41:19  Acquiring new snowflake connection 'list_development'
03:41:19  Using snowflake connection "list_development"
03:41:19  On list_development: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development"} */
show terse schemas in database development
    limit 10000
03:41:19  Opening a new connection, currently in state init
03:41:21  SQL status: SUCCESS 274 in 2.0 seconds
03:41:21  On list_development: Close
03:41:22  Re-using an available connection from the pool (formerly list_development, now list_development_dbt_jyeo)
03:41:22  Using snowflake connection "list_development_dbt_jyeo"
03:41:22  On list_development_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_dbt_jyeo"} */
show terse objects in development.dbt_jyeo
03:41:22  Opening a new connection, currently in state closed
03:41:23  SQL status: SUCCESS 35 in 2.0 seconds
03:41:24  On list_development_dbt_jyeo: Close
03:41:24  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x122073520>]}
03:41:24  Concurrency: 1 threads (target='sf')
03:41:24  
03:41:24  Began running node model.my_dbt_project.foo
03:41:24  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:41:24  Re-using an available connection from the pool (formerly list_development_dbt_jyeo, now model.my_dbt_project.foo)
03:41:24  Began compiling node model.my_dbt_project.foo
03:41:24  Using snowflake connection "model.my_dbt_project.foo"
03:41:24  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
select distinct c from development.dbt_jyeo.this_doesnt_exist
03:41:24  Opening a new connection, currently in state closed
03:41:26  SQL status: SUCCESS 1 in 2.0 seconds
03:41:26  Writing injected SQL for node "model.my_dbt_project.foo"
03:41:26  Timing info for model.my_dbt_project.foo (compile): 15:41:24.714723 => 15:41:26.454109
03:41:26  Began executing node model.my_dbt_project.foo
03:41:26  Using snowflake connection "model.my_dbt_project.foo"
03:41:26  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)
03:41:27  SQL status: SUCCESS 1 in 1.0 seconds
03:41:27  Writing runtime sql for node "model.my_dbt_project.foo"
03:41:27  Using snowflake connection "model.my_dbt_project.foo"
03:41:27  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development.dbt_jyeo.foo
         as
        (
select 'bar' as v
        );
03:41:28  SQL status: SUCCESS 1 in 1.0 seconds
03:41:28  Timing info for model.my_dbt_project.foo (execute): 15:41:26.457776 => 15:41:28.877600
03:41:28  On model.my_dbt_project.foo: Close
03:41:29  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12269e3a0>]}
03:41:29  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 4.76s]
03:41:29  Finished running node model.my_dbt_project.foo
03:41:29  Connection 'master' was properly closed.
03:41:29  Connection 'model.my_dbt_project.foo' was properly closed.
03:41:29  
03:41:29  Finished running 1 table model in 0 hours 0 minutes and 9.95 seconds (9.95s).
03:41:29  Command end result
03:41:29  
03:41:29  Completed successfully
03:41:29  
03:41:29  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
03:41:29  Command `dbt run` succeeded at 15:41:29.507139 after 11.42 seconds
03:41:29  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10ec0f3a0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1225bd790>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1225bd730>]}
03:41:29  Flushing usage events

^ Notice how we did re-create the development.dbt_jyeo.this_doesnt_exist table with 'foo' as the value for c? And also notice how the development.dbt_jyeo.foo table was created with 'bar' instead? That is because, as mentioned above, dbt has to compile the model SQL - which means resolving run_query and all that fancy stuff - before it even gets to the pre-hook. The introspective query saw the pre-existing 'bar' row; only afterwards did the hook re-create the table with 'foo'.

Now, it turns out that this can actually be achieved by doing the following. First things first, re-drop our table:

drop table if exists development.dbt_jyeo.this_doesnt_exist;

And modify our project code:

-- macros/create_new_table.sql
{% macro create_new_table() %}
    {% do run_query("create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)") %}
{% endmacro %}

-- models/foo.sql
{{ config(pre_hook = create_new_table()) }}

{% set query %}
select distinct c from development.dbt_jyeo.this_doesnt_exist
{% endset %}

{% set results = run_query(query) %}

{% if execute %}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}

select '{{ results_list[0] }}' as v

What is notable about this change:

  1. We are calling run_query() in our create_new_table() macro.
  2. In our model's pre-hook, we are calling the macro directly as a function, as opposed to passing a string containing Jinja. This is not best practice - if you look at any of dbt's documentation, calling a dbt macro in a hook always involves a "string with jinja". Example.
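The difference between the two hook styles comes down to eager versus deferred evaluation. Here is a toy Python sketch of the two orderings - a hypothetical analogy to the Jinja behaviour, not dbt's actual code:

```python
# Toy sketch of eager vs deferred macro evaluation in hooks - a hypothetical
# analogy, not dbt internals.

log = []

def create_new_table():
    """Stand-in for the macro: its run_query() fires the DDL immediately."""
    log.append("create this_doesnt_exist")
    return ""  # the rendered hook SQL ends up empty

def compile_model(pre_hook):
    """Stand-in for rendering the model body, which introspects the table."""
    log.append("introspect this_doesnt_exist")
    return pre_hook

# Best practice: the hook is a plain string, so nothing runs while the
# config is evaluated. dbt renders the hook only AFTER compiling the model
# body, giving: introspect -> create -> create foo.
log.clear()
hook = compile_model("{{ create_new_table() }}")  # just a string, no call
log.append("create this_doesnt_exist")  # hook rendered after compilation
log.append("create foo")
print(log)

# The trick: the macro is CALLED while the config itself is rendered, so
# its DDL runs before the model body is compiled, giving:
# create -> introspect -> create foo.
log.clear()
hook = compile_model(create_new_table())  # argument evaluated eagerly
log.append("create foo")
print(log)
```

Just as in Python, where a function's arguments are evaluated before the function runs, `create_new_table()` inside the config call fires its run_query() side effect during rendering, before compilation of the model body begins.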

Let's run:

$ dbt --debug run
03:37:19  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109217ee0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10a33c850>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10a33ca30>]}
03:37:19  Running with dbt=1.5.4
03:37:19  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'True', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'introspect': 'True', 'log_format': 'default', 'target_path': 'None', 'send_anonymous_usage_stats': 'True'}
03:37:20  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10a30a5e0>]}
03:37:20  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c4a8160>]}
03:37:20  Registered adapter: snowflake=1.5.3
03:37:20  checksum: 5d6b65bf0937b31c1cb174f5650843d5ea9953bdf3de7df3d9ed5979670d6d21, vars: {}, profile: , target: , version: 1.5.4
03:37:20  Partial parsing enabled: 1 files deleted, 0 files added, 1 files changed.
03:37:20  Partial parsing: deleted file: my_dbt_project://models/bar.sql
03:37:20  Partial parsing: updated file: my_dbt_project://models/foo.sql
03:37:20  1603: static parser failed on foo.sql
03:37:20  1602: parser fallback to jinja rendering on foo.sql
03:37:20  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11cc670d0>]}
03:37:20  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c647490>]}
03:37:20  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:37:20  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c6440d0>]}
03:37:20  
03:37:20  Acquiring new snowflake connection 'master'
03:37:20  Acquiring new snowflake connection 'list_development'
03:37:20  Using snowflake connection "list_development"
03:37:20  On list_development: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development"} */
show terse schemas in database development
    limit 10000
03:37:20  Opening a new connection, currently in state init
03:37:22  SQL status: SUCCESS 274 in 2.0 seconds
03:37:22  On list_development: Close
03:37:23  Re-using an available connection from the pool (formerly list_development, now list_development_dbt_jyeo)
03:37:23  Using snowflake connection "list_development_dbt_jyeo"
03:37:23  On list_development_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_dbt_jyeo"} */
show terse objects in development.dbt_jyeo
03:37:23  Opening a new connection, currently in state closed
03:37:25  SQL status: SUCCESS 34 in 2.0 seconds
03:37:25  On list_development_dbt_jyeo: Close
03:37:25  Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11cc05730>]}
03:37:25  Concurrency: 1 threads (target='sf')
03:37:25  
03:37:25  Began running node model.my_dbt_project.foo
03:37:25  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:37:25  Re-using an available connection from the pool (formerly list_development_dbt_jyeo, now model.my_dbt_project.foo)
03:37:25  Began compiling node model.my_dbt_project.foo
03:37:25  Using snowflake connection "model.my_dbt_project.foo"
03:37:25  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)
03:37:25  Opening a new connection, currently in state closed
03:37:28  SQL status: SUCCESS 1 in 3.0 seconds
03:37:28  Using snowflake connection "model.my_dbt_project.foo"
03:37:28  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
select distinct c from development.dbt_jyeo.this_doesnt_exist
03:37:29  SQL status: SUCCESS 1 in 0.0 seconds
03:37:29  Writing injected SQL for node "model.my_dbt_project.foo"
03:37:29  Timing info for model.my_dbt_project.foo (compile): 15:37:25.832885 => 15:37:29.100113
03:37:29  Began executing node model.my_dbt_project.foo
03:37:29  Writing runtime sql for node "model.my_dbt_project.foo"
03:37:29  Using snowflake connection "model.my_dbt_project.foo"
03:37:29  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development.dbt_jyeo.foo
         as
        (
select 'foo' as v
        );
03:37:30  SQL status: SUCCESS 1 in 1.0 seconds
03:37:30  Timing info for model.my_dbt_project.foo (execute): 15:37:29.103359 => 15:37:30.335749
03:37:30  On model.my_dbt_project.foo: Close
03:37:31  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c650970>]}
03:37:31  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 5.18s]
03:37:31  Finished running node model.my_dbt_project.foo
03:37:31  Connection 'master' was properly closed.
03:37:31  Connection 'model.my_dbt_project.foo' was properly closed.
03:37:31  
03:37:31  Finished running 1 table model in 0 hours 0 minutes and 10.28 seconds (10.28s).
03:37:31  Command end result
03:37:31  
03:37:31  Completed successfully
03:37:31  
03:37:31  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
03:37:31  Command `dbt run` succeeded at 15:37:31.058514 after 11.69 seconds
03:37:31  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109217ee0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c650970>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c48de20>]}
03:37:31  Flushing usage events

By doing this, we have caused dbt to run our create-table query via do run_query() before dbt has to introspect the table itself. Thus we no longer error with Object 'DEVELOPMENT.DBT_JYEO.THIS_DOESNT_EXIST' does not exist or not authorized. as we did when we were being good dbt citizens.

If you pay close attention to the previous debug logs, you will find that the sequence of operations has changed from

introspect this_doesnt_exist
create this_doesnt_exist
create foo

to

create this_doesnt_exist
introspect this_doesnt_exist
create foo

This also means that the model can now introspect the very thing created/updated by the hook itself.

P.S. I make no guarantees as to how long this will keep working, nor whether it works without errors in the dbt Cloud IDE.
