Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active September 12, 2022 22:28
Show Gist options
  • Save jeremyyeo/38e77b15abd4a873cbde085fee39e347 to your computer and use it in GitHub Desktop.
Save jeremyyeo/38e77b15abd4a873cbde085fee39e347 to your computer and use it in GitHub Desktop.
Tidying up the SQL statements dbt generates #dbt

Tidying up the SQL statements dbt generates

This has only been tested on dbt 1.2 on Snowflake. Note that it is also not the best idea to override core macros like this as future dbt versions may cause breaking changes so caveat emptor.

If you have a dbt project setup like:

# dbt_project.yml
name: "my_dbt_project"
version: "1.0.0"
config-version: 2
profile: "snowflake"

models:
  my_dbt_project:
    +materialized: view
    +persist_docs:
      relation: true
      columns: true
# models/schema.yml

version: 2
models:
  - name: foo
    columns:
      - name: user_id
        description: Lorem ipsum
      - name: id
        description: Lorem ipsum
-- models/foo.sql

select 1 as user_id, 2 as id

When you do a dbt run... you may find that your debug logs have:

22:16:50.610945 [debug] [Thread-1  ]: On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.2.1", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.foo"} */
create or replace  view development.dbt_jyeo.foo
  
    
    
(
  
    "USER_ID" COMMENT $$Lorem ipsum$$, 
  
    "ID" COMMENT $$Lorem ipsum$$
  
)

   as (
    select 1 as user_id, 2 as id
  );

You notice that dbt has generated SQL statements with uppercased column names that are quoted. This is because of the way the macros that generate those statements are written, mainly:

Because these are user space macros, you can override them. Put the following into your macros folder:

# macros/overrides.sql

{% macro get_column_comment_sql(column_name, column_dict) -%}
  {% if (column_name|upper in column_dict) -%}
    {% set matched_column = column_name|upper -%}
  {% elif (column_name|lower in column_dict) -%}
    {% set matched_column = column_name|lower -%}
  {% elif (column_name in column_dict) -%}
    {% set matched_column = column_name -%}
  {% else -%}
    {% set matched_column = None -%}
  {% endif -%}
  {% if matched_column -%}
    {{ column_name | lower }} comment $${{ column_dict[matched_column]['description'] | replace('$', '[$]') }}$$
  {%- else -%}
    {{ column_name | lower }} comment $$$$
  {%- endif -%}
{% endmacro %}

And then when you next dbt run - inspect your debug logs and you should see:

22:26:52.769138 [debug] [Thread-1  ]: On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.2.1", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.foo"} */
create or replace  view development.dbt_jyeo.foo
  
    
    
(
  
    user_id comment $$Lorem ipsum$$, 
  
    id comment $$Lorem ipsum$$
  
)

   as (
    select 1 as user_id, 2 as id
  );

You should get a tidier generate SQL statement with lowecased and non-quoted column names. As above, you may not want to override this core dbt macro as future versions may cause breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment