Can dbt introspect a table created in a pre-hook if we try to query from it in the body of the model?
If we are good dbt-citizens and stay within the bounds of dbt's normal behaviour - then this is not possible. See the following example.
Make sure that the table we are about to create in our hook does not yet exist in our database:
drop table if exists development.dbt_jyeo.this_doesnt_exist;
Set up our project:
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: 1.0
models:
  my_dbt_project:
    +materialized: table
-- macros/create_new_table.sql
{% macro create_new_table() %}
create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)
{% endmacro %}
-- models/foo.sql
{{ config(pre_hook = "{{ create_new_table() }}") }}
{% set query %}
select distinct c from development.dbt_jyeo.this_doesnt_exist
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}
select '{{ results_list[0] }}' as v
Run:
$ dbt run
01:59:29 Running with dbt=1.5.4
01:59:30 Registered adapter: snowflake=1.5.3
01:59:31 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
01:59:31
01:59:36 Concurrency: 1 threads (target='sf')
01:59:36
01:59:36 1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
01:59:39 1 of 1 ERROR creating sql table model dbt_jyeo.foo ............................. [ERROR in 2.37s]
01:59:39
01:59:39 Finished running 1 table model in 0 hours 0 minutes and 7.85 seconds (7.85s).
01:59:39
01:59:39 Completed with 1 error and 0 warnings:
01:59:39
01:59:39 Database Error in model foo (models/foo.sql)
01:59:39 002003 (42S02): SQL compilation error:
01:59:39 Object 'DEVELOPMENT.DBT_JYEO.THIS_DOESNT_EXIST' does not exist or not authorized.
01:59:39
01:59:39 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
As you can see, before dbt even runs the pre-hook, it has to compile the model file into full and legitimate SQL text. Because the table this_doesnt_exist does not yet exist, dbt's run_query fails and foo.sql cannot be compiled into full and legitimate SQL. Because compilation fails, the model errors out, and so our pre-hook never even runs.
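If you do want to stay a good dbt citizen, one option is to guard the introspection so compilation still produces valid SQL when the table is absent. Here is a minimal sketch (adapter.get_relation and execute are standard dbt context members; the placeholder fallback value is my own invention, not from the example above):

```sql
-- models/foo_guarded.sql (hypothetical defensive variant)
{% set rel = adapter.get_relation(
    database='development',
    schema='dbt_jyeo',
    identifier='this_doesnt_exist'
) %}

{% if execute and rel is not none %}
    {# Only introspect when the relation already exists in the database #}
    {% set results = run_query('select distinct c from ' ~ rel) %}
    {% set results_list = results.columns[0].values() %}
{% else %}
    {# Fall back to a placeholder so compilation still yields valid SQL #}
    {% set results_list = ['placeholder'] %}
{% endif %}

select '{{ results_list[0] }}' as v
```

Note that this avoids the compilation error, but the first run would still select the placeholder rather than the hook-created value - it does not solve the ordering problem, only the hard failure.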
Now, let's try creating the table first then rerunning our dbt project.
create table development.dbt_jyeo.this_doesnt_exist as (select 'bar' as c);
select * from development.dbt_jyeo.this_doesnt_exist;
Importantly, I'm making the column value 'bar' instead of 'foo' - this will be of interest later.
Rerun:
$ dbt run
02:05:12 Running with dbt=1.5.4
02:05:13 Registered adapter: snowflake=1.5.3
02:05:13 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
02:05:13
02:05:19 Concurrency: 1 threads (target='sf')
02:05:19
02:05:19 1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
02:05:23 1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 4.18s]
02:05:23
02:05:23 Finished running 1 table model in 0 hours 0 minutes and 9.51 seconds (9.51s).
02:05:23
02:05:23 Completed successfully
02:05:23
02:05:23 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
Now, this did run - however, if you inspect the debug logs:
03:41:18 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10ec0f3a0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10fd337c0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10fd339a0>]}
03:41:18 Running with dbt=1.5.4
03:41:18 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'True', 'fail_fast': 'False', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'target_path': 'None', 'send_anonymous_usage_stats': 'True'}
03:41:19 Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10fd02e50>]}
03:41:19 Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x121eaa0d0>]}
03:41:19 Registered adapter: snowflake=1.5.3
03:41:19 checksum: 5d6b65bf0937b31c1cb174f5650843d5ea9953bdf3de7df3d9ed5979670d6d21, vars: {}, profile: , target: , version: 1.5.4
03:41:19 Partial parsing enabled: 0 files deleted, 0 files added, 2 files changed.
03:41:19 Partial parsing: updated file: my_dbt_project://macros/create_table.sql
03:41:19 Partial parsing: updated file: my_dbt_project://models/foo.sql
03:41:19 1603: static parser failed on foo.sql
03:41:19 1602: parser fallback to jinja rendering on foo.sql
03:41:19 Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1227890d0>]}
03:41:19 Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x122045eb0>]}
03:41:19 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:41:19 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x122045f70>]}
03:41:19
03:41:19 Acquiring new snowflake connection 'master'
03:41:19 Acquiring new snowflake connection 'list_development'
03:41:19 Using snowflake connection "list_development"
03:41:19 On list_development: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development"} */
show terse schemas in database development
limit 10000
03:41:19 Opening a new connection, currently in state init
03:41:21 SQL status: SUCCESS 274 in 2.0 seconds
03:41:21 On list_development: Close
03:41:22 Re-using an available connection from the pool (formerly list_development, now list_development_dbt_jyeo)
03:41:22 Using snowflake connection "list_development_dbt_jyeo"
03:41:22 On list_development_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_dbt_jyeo"} */
show terse objects in development.dbt_jyeo
03:41:22 Opening a new connection, currently in state closed
03:41:23 SQL status: SUCCESS 35 in 2.0 seconds
03:41:24 On list_development_dbt_jyeo: Close
03:41:24 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x122073520>]}
03:41:24 Concurrency: 1 threads (target='sf')
03:41:24
03:41:24 Began running node model.my_dbt_project.foo
03:41:24 1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:41:24 Re-using an available connection from the pool (formerly list_development_dbt_jyeo, now model.my_dbt_project.foo)
03:41:24 Began compiling node model.my_dbt_project.foo
03:41:24 Using snowflake connection "model.my_dbt_project.foo"
03:41:24 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
select distinct c from development.dbt_jyeo.this_doesnt_exist
03:41:24 Opening a new connection, currently in state closed
03:41:26 SQL status: SUCCESS 1 in 2.0 seconds
03:41:26 Writing injected SQL for node "model.my_dbt_project.foo"
03:41:26 Timing info for model.my_dbt_project.foo (compile): 15:41:24.714723 => 15:41:26.454109
03:41:26 Began executing node model.my_dbt_project.foo
03:41:26 Using snowflake connection "model.my_dbt_project.foo"
03:41:26 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)
03:41:27 SQL status: SUCCESS 1 in 1.0 seconds
03:41:27 Writing runtime sql for node "model.my_dbt_project.foo"
03:41:27 Using snowflake connection "model.my_dbt_project.foo"
03:41:27 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development.dbt_jyeo.foo
as
(
select 'bar' as v
);
03:41:28 SQL status: SUCCESS 1 in 1.0 seconds
03:41:28 Timing info for model.my_dbt_project.foo (execute): 15:41:26.457776 => 15:41:28.877600
03:41:28 On model.my_dbt_project.foo: Close
03:41:29 Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '8d3d92aa-47ca-40b0-99f9-25443813e060', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12269e3a0>]}
03:41:29 1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 4.76s]
03:41:29 Finished running node model.my_dbt_project.foo
03:41:29 Connection 'master' was properly closed.
03:41:29 Connection 'model.my_dbt_project.foo' was properly closed.
03:41:29
03:41:29 Finished running 1 table model in 0 hours 0 minutes and 9.95 seconds (9.95s).
03:41:29 Command end result
03:41:29
03:41:29 Completed successfully
03:41:29
03:41:29 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
03:41:29 Command `dbt run` succeeded at 15:41:29.507139 after 11.42 seconds
03:41:29 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10ec0f3a0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1225bd790>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1225bd730>]}
03:41:29 Flushing usage events
^ Notice how we did re-create the development.dbt_jyeo.this_doesnt_exist table with 'foo' as the value for c? And also notice how the development.dbt_jyeo.foo table was created with 'bar' instead? That is because, as mentioned above, dbt has to compile the model SQL code - doing run_query and all that fancy stuff - before it even gets to the pre-hook, so the model body read the pre-existing 'bar' value before the hook replaced it with 'foo'.
Now, it turns out that this can actually be achieved by doing the following. First things first, drop our table again:
drop table if exists development.dbt_jyeo.this_doesnt_exist;
Then modify our project code:
-- macros/create_new_table.sql
{% macro create_new_table() %}
{% do run_query("create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)") %}
{% endmacro %}
-- models/foo.sql
{{ config(pre_hook = create_new_table()) }}
{% set query %}
select distinct c from development.dbt_jyeo.this_doesnt_exist
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}
select '{{ results_list[0] }}' as v
What is notable about this change:
- We are calling run_query() inside our create_new_table() macro.
- In our model pre-hook, we are calling the macro directly as a function, as opposed to a string containing Jinja. This is not best practice: if you look at any of our documentation, calling a dbt macro in hooks always involves "string with jinja". Example.
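The difference between the two invocation styles can be made explicit side by side (this is a restatement of the two configs used above, not new behaviour):

```sql
-- Documented style: the hook is stored as a Jinja string and only rendered
-- and run at hook-execution time, after the model body has compiled:
{{ config(pre_hook = "{{ create_new_table() }}") }}

-- The trick: the macro is called immediately while dbt renders the model's
-- Jinja during compilation, so its run_query() fires before the model
-- body's own run_query(); the hook itself renders to (effectively) nothing:
{{ config(pre_hook = create_new_table()) }}
```

In other words, in the second style the "pre-hook" is no longer really a hook at all - the side effect happens during compilation, which is exactly why the ordering changes.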
Let's run:
$ dbt --debug run
03:37:19 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109217ee0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10a33c850>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10a33ca30>]}
03:37:19 Running with dbt=1.5.4
03:37:19 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'True', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'introspect': 'True', 'log_format': 'default', 'target_path': 'None', 'send_anonymous_usage_stats': 'True'}
03:37:20 Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10a30a5e0>]}
03:37:20 Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c4a8160>]}
03:37:20 Registered adapter: snowflake=1.5.3
03:37:20 checksum: 5d6b65bf0937b31c1cb174f5650843d5ea9953bdf3de7df3d9ed5979670d6d21, vars: {}, profile: , target: , version: 1.5.4
03:37:20 Partial parsing enabled: 1 files deleted, 0 files added, 1 files changed.
03:37:20 Partial parsing: deleted file: my_dbt_project://models/bar.sql
03:37:20 Partial parsing: updated file: my_dbt_project://models/foo.sql
03:37:20 1603: static parser failed on foo.sql
03:37:20 1602: parser fallback to jinja rendering on foo.sql
03:37:20 Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11cc670d0>]}
03:37:20 Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c647490>]}
03:37:20 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 819 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:37:20 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c6440d0>]}
03:37:20
03:37:20 Acquiring new snowflake connection 'master'
03:37:20 Acquiring new snowflake connection 'list_development'
03:37:20 Using snowflake connection "list_development"
03:37:20 On list_development: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development"} */
show terse schemas in database development
limit 10000
03:37:20 Opening a new connection, currently in state init
03:37:22 SQL status: SUCCESS 274 in 2.0 seconds
03:37:22 On list_development: Close
03:37:23 Re-using an available connection from the pool (formerly list_development, now list_development_dbt_jyeo)
03:37:23 Using snowflake connection "list_development_dbt_jyeo"
03:37:23 On list_development_dbt_jyeo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "connection_name": "list_development_dbt_jyeo"} */
show terse objects in development.dbt_jyeo
03:37:23 Opening a new connection, currently in state closed
03:37:25 SQL status: SUCCESS 34 in 2.0 seconds
03:37:25 On list_development_dbt_jyeo: Close
03:37:25 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11cc05730>]}
03:37:25 Concurrency: 1 threads (target='sf')
03:37:25
03:37:25 Began running node model.my_dbt_project.foo
03:37:25 1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:37:25 Re-using an available connection from the pool (formerly list_development_dbt_jyeo, now model.my_dbt_project.foo)
03:37:25 Began compiling node model.my_dbt_project.foo
03:37:25 Using snowflake connection "model.my_dbt_project.foo"
03:37:25 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)
03:37:25 Opening a new connection, currently in state closed
03:37:28 SQL status: SUCCESS 1 in 3.0 seconds
03:37:28 Using snowflake connection "model.my_dbt_project.foo"
03:37:28 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
select distinct c from development.dbt_jyeo.this_doesnt_exist
03:37:29 SQL status: SUCCESS 1 in 0.0 seconds
03:37:29 Writing injected SQL for node "model.my_dbt_project.foo"
03:37:29 Timing info for model.my_dbt_project.foo (compile): 15:37:25.832885 => 15:37:29.100113
03:37:29 Began executing node model.my_dbt_project.foo
03:37:29 Writing runtime sql for node "model.my_dbt_project.foo"
03:37:29 Using snowflake connection "model.my_dbt_project.foo"
03:37:29 On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.5.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development.dbt_jyeo.foo
as
(
select 'foo' as v
);
03:37:30 SQL status: SUCCESS 1 in 1.0 seconds
03:37:30 Timing info for model.my_dbt_project.foo (execute): 15:37:29.103359 => 15:37:30.335749
03:37:30 On model.my_dbt_project.foo: Close
03:37:31 Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'dacae0d6-65fa-4b2e-99ff-35fe04011221', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c650970>]}
03:37:31 1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 5.18s]
03:37:31 Finished running node model.my_dbt_project.foo
03:37:31 Connection 'master' was properly closed.
03:37:31 Connection 'model.my_dbt_project.foo' was properly closed.
03:37:31
03:37:31 Finished running 1 table model in 0 hours 0 minutes and 10.28 seconds (10.28s).
03:37:31 Command end result
03:37:31
03:37:31 Completed successfully
03:37:31
03:37:31 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
03:37:31 Command `dbt run` succeeded at 15:37:31.058514 after 11.69 seconds
03:37:31 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109217ee0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c650970>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11c48de20>]}
03:37:31 Flushing usage events
By doing this, we have caused dbt to run our create-table query via {% do run_query() %} prior to dbt having to introspect the table itself. Thus we no longer error with Object 'DEVELOPMENT.DBT_JYEO.THIS_DOESNT_EXIST' does not exist or not authorized. as we did when we were good dbt citizens.
If you pay close attention to the previous debug logs, you will find that the sequence of operations has changed from:
1. introspect this_doesnt_exist
2. create this_doesnt_exist
3. create foo
to:
1. create this_doesnt_exist
2. introspect this_doesnt_exist
3. create foo
This also means that the model can now introspect the very thing created/updated in the hook itself.
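If you do rely on this trick, it may be worth making the macro's side effect and its return value explicit. A sketch (the execute guard and return() are standard dbt Jinja; the explicit empty return simply mirrors the whitespace the original macro already renders to):

```sql
-- macros/create_new_table.sql (explicit variant; a sketch, not from the post)
{% macro create_new_table() %}
    {% if execute %}
        {# Side effect: runs while dbt renders the model's Jinja at compile time #}
        {% do run_query("create or replace table development.dbt_jyeo.this_doesnt_exist as (select 'foo' as c)") %}
    {% endif %}
    {# Return an empty string so the rendered pre_hook is a blank no-op #}
    {{ return('') }}
{% endmacro %}
```

The {% if execute %} guard keeps the side effect from being attempted during parsing, where run_query returns nothing.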
P.S. I make no guarantees as to how long this will work, nor whether it will work without errors in the dbt Cloud IDE.