The alternate title to this is perhaps "Why it's best to use string literal types with Jinja".
Let's assume we have a dbt macro like so:
-- macros/add_predicate.sql
{% macro add_predicate(arg) -%}
{%- if arg == False -%}
-- where 1 = 2
{%- else -%}
-- where 1 = 1
{%- endif -%}
{%- endmacro %}
Which is used in a model like so:
-- models/foo.sql
select 1 id
{{ add_predicate(arg = env_var("DBT_ADD")) }}
Now, let's set the env var DBT_ADD
to be False
and compile our model:
$ export DBT_ADD=False
$ dbt compile -s foo
03:51:46 Running with dbt=1.5.3
03:51:46 Registered adapter: snowflake=1.5.2
03:51:47 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:51:47
03:51:50 Concurrency: 1 threads (target='default')
03:51:50
03:51:50 Compiled node 'foo' is:
select 1 id
-- where 1 = 1
As we can see from above, we have set the env var DBT_ADD
to False
- yet in the evaluation of the conditional logic in our add_predicate
macro, it did not result in -- where 1 = 2
in our model - instead we had -- where 1 = 1
in our model instead.
Now, why is that? That is because the env var DBT_ADD
is a string literal 'False'
and the result of 'False' == False
is indeed False
. So, to fix our macro to behave like we want it to, we could do something like:
-- macros/add_predicate.sql
{% macro add_predicate(arg) -%}
{%- if arg == 'False' -%}
-- where 1 = 2
{%- else -%}
-- where 1 = 1
{%- endif -%}
{%- endmacro %}
And now, compiling the model again:
$ dbt compile -s foo
03:56:06 Running with dbt=1.5.3
03:56:07 Registered adapter: snowflake=1.5.2
03:56:08 Found 1 model, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:56:08
03:56:11 Concurrency: 1 threads (target='default')
03:56:11
03:56:11 Compiled node 'foo' is:
select 1 id
-- where 1 = 2
Of course things can get pretty complicated very rapidly and hard to diagnose. Let's rewrite our macro going back to when we had the condition evaluate against False
(instead of string literal 'False'
in our fixed version). Additionally, we are adding extra business logic here.
Let's assume our business logic has changed somewhat - we have an audit table that we want to check against - and depending on the result in that table, set the correct predicate. I've created that table via
create or replace table development.dbt_jyeo.audit as (select 'baz' as table_name);
.
-- macros/add_predicate.sql
{% macro add_predicate(arg) -%}
{%- if arg == False -%}
-- where 1 = 2
{%- else -%}
{%- set results = run_query("select * from development.dbt_jyeo.audit where table_name = '" ~ this.identifier ~ "'") -%}
{%- if execute -%}
{%- if results.columns[0].values() | length > 0 -%}
-- where 1 = 1
{%- else -%}
-- where 1 = 2
{%- endif -%}
{%- endif -%}
{%- endif -%}
{%- endmacro %}
And then add a second bar
model to accompany our foo
model:
-- models/foo.sql
select 1 id
{{ add_predicate(arg = env_var("DBT_ADD")) }}
-- models/bar.sql
select 1 id
{{ add_predicate(arg = env_var("DBT_ADD")) }}
Let's compile our models:
$ export DBT_ADD=False && dbt compile -s foo && dbt compile -s bar
04:49:11 Running with dbt=1.5.3
04:49:11 Registered adapter: snowflake=1.5.2
04:49:12 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:49:12
04:49:15 Concurrency: 1 threads (target='default')
04:49:15
04:49:18 Compiled node 'foo' is:
select 1 id
-- where 1 = 2
04:49:22 Running with dbt=1.5.3
04:49:22 Registered adapter: snowflake=1.5.2
04:49:23 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:49:23
04:49:26 Concurrency: 1 threads (target='default')
04:49:26
04:49:28 Compiled node 'bar' is:
select 1 id
-- where 1 = 2
Here, it appears that things are working as expected - thus we think that it doesn't matter what is in our audit
table, since we would never be checking it when DBT_ADD
is set to False
.
Now, let's add a new row to our audit table.
insert into development.dbt_jyeo.audit values ('bar')
Recall that previously we only have
baz
in our audit table when it was created.
As before, we had the notion that by setting DBT_ADD
to False
, the compilation of our foo
and bar
models should still be exactly as they were yes? However:
$ dbt compile -s foo && dbt compile -s bar
04:49:56 Running with dbt=1.5.3
04:49:56 Registered adapter: snowflake=1.5.2
04:49:57 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:49:57
04:50:00 Concurrency: 1 threads (target='default')
04:50:00
04:50:03 Compiled node 'foo' is:
select 1 id
-- where 1 = 2
04:50:06 Running with dbt=1.5.3
04:50:06 Registered adapter: snowflake=1.5.2
04:50:07 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:50:07
04:50:10 Concurrency: 1 threads (target='default')
04:50:10
04:50:13 Compiled node 'bar' is:
select 1 id
-- where 1 = 1
Here we can see that this is indeed not the case (and you should already know this from above) - instead we have ALWAYS been bypassing the first condition in our macro, it's just that since we have 2 instances of -- where 1 = 2
- it made it much tricker to fully know that we have always bypassed the first condition.
Ultimately, the best recommendation is probably to do a string literal:
{% macro add_predicate(arg) -%}
{%- if arg == 'disabled' -%}
-- where 1 = 2
{%- else -%}
{%- set results = run_query("select * from development.dbt_jyeo.audit where table_name = '" ~ this.identifier ~ "'") -%}
{%- if execute -%}
{%- if results.columns[0].values() | length > 0 -%}
-- where 1 = 1
{%- else -%}
-- where 1 = 2
{%- endif -%}
{%- endif -%}
{%- endif -%}
{%- endmacro %}
Then setting the env var DBT_ADD
to disabled
as well.
$ export DBT_ADD=disabled && dbt compile -s foo && dbt compile -s bar
04:51:02 Running with dbt=1.5.3
04:51:02 Registered adapter: snowflake=1.5.2
04:51:03 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:51:03
04:51:06 Concurrency: 1 threads (target='default')
04:51:06
04:51:06 Compiled node 'foo' is:
select 1 id
-- where 1 = 2
04:51:10 Running with dbt=1.5.3
04:51:10 Registered adapter: snowflake=1.5.2
04:51:11 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:51:11
04:51:14 Concurrency: 1 threads (target='default')
04:51:14
04:51:14 Compiled node 'bar' is:
select 1 id
-- where 1 = 2
By comparing string literals, we then avoid the ambiguity of how to actually set Truthy/Falsy/Boolean types in Jinja - you can read more on that confusion on the interwebs: https://stackoverflow.com/questions/8433450/why-doesnt-my-condition-logic-work-as-expected-in-jinja2-cherrypy