Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active July 24, 2023 05:00
Show Gist options
  • Save jeremyyeo/e97dbc79b536e2ae4a72d734fedb1812 to your computer and use it in GitHub Desktop.
Save jeremyyeo/e97dbc79b536e2ae4a72d734fedb1812 to your computer and use it in GitHub Desktop.
Why it's best not to use Truthy/Falsy/Boolean types with Jinja #dbt

Why it's best not to use Truthy/Falsy/Boolean types with Jinja

The alternate title to this is perhaps "Why it's best to use string literal types with Jinja".

Let's assume we have a dbt macro like so:

-- macros/add_predicate.sql
{% macro add_predicate(arg) -%}
    {%- if arg == False -%}
        -- where 1 = 2
    {%- else -%}
        -- where 1 = 1
    {%- endif -%}
{%- endmacro %}

Which is used in a model like so:

-- models/foo.sql
select 1 id
{{ add_predicate(arg = env_var("DBT_ADD")) }}

Now, let's set the env var DBT_ADD to be False and compile our model:

$ export DBT_ADD=False
$ dbt compile -s foo
03:51:46  Running with dbt=1.5.3
03:51:46  Registered adapter: snowflake=1.5.2
03:51:47  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:51:47  
03:51:50  Concurrency: 1 threads (target='default')
03:51:50  
03:51:50  Compiled node 'foo' is:
select 1 id
-- where 1 = 1

As we can see from above, we have set the env var DBT_ADD to False - yet in the evaluation of the conditional logic in our add_predicate macro, it did not result in -- where 1 = 2 in our model - instead we had -- where 1 = 1 in our model instead.

Now, why is that? That is because the env var DBT_ADD is a string literal 'False' and the result of 'False' == False is indeed False. So, to fix our macro to behave like we want it to, we could do something like:

-- macros/add_predicate.sql
{% macro add_predicate(arg) -%}
    {%- if arg == 'False' -%}
        -- where 1 = 2
    {%- else -%}
        -- where 1 = 1
    {%- endif -%}
{%- endmacro %}

And now, compiling the model again:

$ dbt compile -s foo
03:56:06  Running with dbt=1.5.3
03:56:07  Registered adapter: snowflake=1.5.2
03:56:08  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
03:56:08  
03:56:11  Concurrency: 1 threads (target='default')
03:56:11  
03:56:11  Compiled node 'foo' is:
select 1 id
-- where 1 = 2

Of course things can get pretty complicated very rapidly and hard to diagnose. Let's rewrite our macro going back to when we had the condition evaluate against False (instead of string literal 'False' in our fixed version). Additionally, we are adding extra business logic here.

Let's assume our business logic has changed somewhat - we have an audit table that we want to check against - and depending on the result in that table, set the correct predicate. I've created that table via create or replace table development.dbt_jyeo.audit as (select 'baz' as table_name);.

-- macros/add_predicate.sql
{% macro add_predicate(arg) -%}
    {%- if arg == False -%}
        -- where 1 = 2
    {%- else -%}
        {%- set results = run_query("select * from development.dbt_jyeo.audit where table_name = '" ~ this.identifier ~ "'") -%}
        {%- if execute -%}
            {%- if results.columns[0].values() | length > 0 -%}
                -- where 1 = 1
            {%- else -%}
                -- where 1 = 2
            {%- endif -%}
        {%- endif -%}
    {%- endif -%}
{%- endmacro %}

And then add a second bar model to accompany our foo model:

-- models/foo.sql
select 1 id
{{ add_predicate(arg = env_var("DBT_ADD")) }}

-- models/bar.sql
select 1 id
{{ add_predicate(arg = env_var("DBT_ADD")) }}

Let's compile our models:

$ export DBT_ADD=False && dbt compile -s foo && dbt compile -s bar
04:49:11  Running with dbt=1.5.3
04:49:11  Registered adapter: snowflake=1.5.2
04:49:12  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:49:12  
04:49:15  Concurrency: 1 threads (target='default')
04:49:15  
04:49:18  Compiled node 'foo' is:
select 1 id
-- where 1 = 2
04:49:22  Running with dbt=1.5.3
04:49:22  Registered adapter: snowflake=1.5.2
04:49:23  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:49:23  
04:49:26  Concurrency: 1 threads (target='default')
04:49:26  
04:49:28  Compiled node 'bar' is:
select 1 id
-- where 1 = 2

Here, it appears that things are working as expected - thus we think that it doesn't matter what is in our audit table, since we would never be checking it when DBT_ADD is set to False.

Now, let's add a new row to our audit table.

insert into development.dbt_jyeo.audit values ('bar')

Recall that previously we only have baz in our audit table when it was created.

As before, we had the notion that by setting DBT_ADD to False, the compilation of our foo and bar models should still be exactly as they were yes? However:

$ dbt compile -s foo && dbt compile -s bar
04:49:56  Running with dbt=1.5.3
04:49:56  Registered adapter: snowflake=1.5.2
04:49:57  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:49:57  
04:50:00  Concurrency: 1 threads (target='default')
04:50:00  
04:50:03  Compiled node 'foo' is:
select 1 id
-- where 1 = 2
04:50:06  Running with dbt=1.5.3
04:50:06  Registered adapter: snowflake=1.5.2
04:50:07  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:50:07  
04:50:10  Concurrency: 1 threads (target='default')
04:50:10  
04:50:13  Compiled node 'bar' is:
select 1 id
-- where 1 = 1

Here we can see that this is indeed not the case (and you should already know this from above) - instead we have ALWAYS been bypassing the first condition in our macro, it's just that since we have 2 instances of -- where 1 = 2 - it made it much tricker to fully know that we have always bypassed the first condition.

Ultimately, the best recommendation is probably to do a string literal:

{% macro add_predicate(arg) -%}
    {%- if arg == 'disabled' -%}
        -- where 1 = 2
    {%- else -%}
        {%- set results = run_query("select * from development.dbt_jyeo.audit where table_name = '" ~ this.identifier ~ "'") -%}
        {%- if execute -%}
            {%- if results.columns[0].values() | length > 0 -%}
                -- where 1 = 1
            {%- else -%}
                -- where 1 = 2
            {%- endif -%}
        {%- endif -%}
    {%- endif -%}
{%- endmacro %}

Then setting the env var DBT_ADD to disabled as well.

$ export DBT_ADD=disabled && dbt compile -s foo && dbt compile -s bar
04:51:02  Running with dbt=1.5.3
04:51:02  Registered adapter: snowflake=1.5.2
04:51:03  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:51:03  
04:51:06  Concurrency: 1 threads (target='default')
04:51:06  
04:51:06  Compiled node 'foo' is:
select 1 id
-- where 1 = 2
04:51:10  Running with dbt=1.5.3
04:51:10  Registered adapter: snowflake=1.5.2
04:51:11  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 323 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
04:51:11  
04:51:14  Concurrency: 1 threads (target='default')
04:51:14  
04:51:14  Compiled node 'bar' is:
select 1 id
-- where 1 = 2

By comparing string literals, we then avoid the ambiguity of how to actually set Truthy/Falsy/Boolean types in Jinja - you can read more on that confusion on the interwebs: https://stackoverflow.com/questions/8433450/why-doesnt-my-condition-logic-work-as-expected-in-jinja2-cherrypy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment