Maintaining the order of columns with Redshift and dbt

When using dbt incremental models with Redshift, varchar expansion (e.g. varchar(5) to varchar(10)) is handled silently, and the current incremental model config on_schema_change does not apply to this particular scenario.

In the grand scheme of things, column ordering is usually not a big deal, but you may want to maintain it because your organization mandates a certain pattern - e.g. the id column always comes last, or what have you. You may also have chosen to set the schema for your model outside of dbt via your own create table as ... statements, which we definitely do not recommend - letting dbt manage your model and all of its DDL/DML is the best practice here.

Let's have a look at a simple example by setting up 2 models:

-- models/raw.sql
{{ config(materialized = 'table') }}

select 'bob' as name, 1 as id

-- models/inc.sql
{{ config(materialized = 'incremental', on_schema_change = 'fail') }}

select name, id from {{ ref('raw') }}

We are using our raw model to simulate a raw dataset and to see what happens when the name column increases in size.

Let's build them:

$ dbt run --full-refresh
00:18:08  Running with dbt=1.3.0
00:18:09  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 327 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:18:09  
00:18:17  Concurrency: 1 threads (target='dev')
00:18:17  
00:18:17  1 of 2 START sql table model public.raw ........................................ [RUN]
00:18:23  1 of 2 OK created sql table model public.raw ................................... [SELECT in 6.36s]
00:18:23  2 of 2 START sql incremental model public.inc .................................. [RUN]
00:18:29  2 of 2 OK created sql incremental model public.inc ............................. [SELECT in 5.84s]
00:18:31  
00:18:31  Finished running 1 table model, 1 incremental model in 0 hours 0 minutes and 22.75 seconds (22.75s).
00:18:31  
00:18:31  Completed successfully
00:18:31  
00:18:31  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

And inspect our inc table:

[image: query result of the inc table - columns in the order name, id, with one row ('bob', 1)]
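
If you'd rather check from SQL than a client UI, a query like this (a minimal sketch against the information schema; svv_columns or pg_table_def would also work) shows the column order and types:

-- Sketch: inspect column order and types of the inc table.
select column_name, data_type, character_maximum_length, ordinal_position
from information_schema.columns
where table_schema = 'public'
  and table_name = 'inc'
order by ordinal_position;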

So far so good. Now let's modify the name column in our raw model to simulate what happens if raw data changes:

-- models/raw.sql
{{ config(materialized = 'table') }}

select 'alice' as name, 1 as id

Do a subsequent run (I'm using the --debug option to get more details on the queries that dbt is sending to Redshift):

$ dbt --debug run
============================== 2022-11-17 00:26:20.539398 | 904de28b-b85b-4f86-aa2c-62b91a214ffd ==============================
00:26:20.539406 [info ] [MainThread]: Running with dbt=1.3.0
00:26:20.539842 [debug] [MainThread]: running dbt with arguments {'debug': True, 'write_json': True, 'use_colors': True, 'printer_width': 80, 'version_check': True, 'partial_parse': False, 'static_parser': True, 'profiles_dir': '/Users/jeremy/.dbt', 'send_anonymous_usage_stats': True, 'event_buffer_size': 100000, 'quiet': False, 'no_print': False, 'which': 'run', 'rpc_method': 'run', 'indirect_selection': 'eager'}
00:26:20.540205 [debug] [MainThread]: Tracking: tracking
00:26:20.571764 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10dafa820>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10dafac10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10db0d610>]}
00:26:20.595989 [debug] [MainThread]: Partial parsing not enabled
00:26:20.627926 [debug] [MainThread]: Parsing macros/timestamps.sql
00:26:20.631838 [debug] [MainThread]: Parsing macros/catalog.sql
00:26:20.648009 [debug] [MainThread]: Parsing macros/relations.sql
00:26:20.648899 [debug] [MainThread]: Parsing macros/adapters.sql
00:26:20.680501 [debug] [MainThread]: Parsing macros/materializations/snapshot_merge.sql
00:26:20.681635 [debug] [MainThread]: Parsing macros/utils/length.sql
00:26:20.682661 [debug] [MainThread]: Parsing macros/utils/dateadd.sql
00:26:20.683926 [debug] [MainThread]: Parsing macros/utils/listagg.sql
00:26:20.688099 [debug] [MainThread]: Parsing macros/utils/datediff.sql
00:26:20.689405 [debug] [MainThread]: Parsing macros/utils/cast_bool_to_text.sql
00:26:20.690429 [debug] [MainThread]: Parsing macros/utils/array_concat.sql
00:26:20.691554 [debug] [MainThread]: Parsing macros/utils/last_day.sql
00:26:20.693605 [debug] [MainThread]: Parsing macros/utils/split_part.sql
00:26:20.696057 [debug] [MainThread]: Parsing macros/utils/array_construct.sql
00:26:20.697293 [debug] [MainThread]: Parsing macros/utils/array_append.sql
00:26:20.698650 [debug] [MainThread]: Parsing macros/adapters/apply_grants.sql
00:26:20.699836 [debug] [MainThread]: Parsing macros/timestamps.sql
00:26:20.702467 [debug] [MainThread]: Parsing macros/catalog.sql
00:26:20.705349 [debug] [MainThread]: Parsing macros/relations.sql
00:26:20.707356 [debug] [MainThread]: Parsing macros/adapters.sql
00:26:20.737183 [debug] [MainThread]: Parsing macros/materializations/incremental_strategies.sql
00:26:20.738865 [debug] [MainThread]: Parsing macros/materializations/snapshot_merge.sql
00:26:20.741664 [debug] [MainThread]: Parsing macros/utils/dateadd.sql
00:26:20.742890 [debug] [MainThread]: Parsing macros/utils/listagg.sql
00:26:20.745529 [debug] [MainThread]: Parsing macros/utils/datediff.sql
00:26:20.754669 [debug] [MainThread]: Parsing macros/utils/any_value.sql
00:26:20.755658 [debug] [MainThread]: Parsing macros/utils/last_day.sql
00:26:20.757577 [debug] [MainThread]: Parsing macros/utils/split_part.sql
00:26:20.759949 [debug] [MainThread]: Parsing macros/materializations/hooks.sql
00:26:20.765705 [debug] [MainThread]: Parsing macros/materializations/configs.sql
00:26:20.768950 [debug] [MainThread]: Parsing macros/materializations/snapshots/snapshot_merge.sql
00:26:20.771133 [debug] [MainThread]: Parsing macros/materializations/snapshots/strategies.sql
00:26:20.793080 [debug] [MainThread]: Parsing macros/materializations/snapshots/helpers.sql
00:26:20.807777 [debug] [MainThread]: Parsing macros/materializations/snapshots/snapshot.sql
00:26:20.822393 [debug] [MainThread]: Parsing macros/materializations/tests/test.sql
00:26:20.829504 [debug] [MainThread]: Parsing macros/materializations/tests/helpers.sql
00:26:20.832365 [debug] [MainThread]: Parsing macros/materializations/tests/where_subquery.sql
00:26:20.835003 [debug] [MainThread]: Parsing macros/materializations/models/incremental/column_helpers.sql
00:26:20.846722 [debug] [MainThread]: Parsing macros/materializations/models/incremental/merge.sql
00:26:20.865953 [debug] [MainThread]: Parsing macros/materializations/models/incremental/is_incremental.sql
00:26:20.868576 [debug] [MainThread]: Parsing macros/materializations/models/incremental/strategies.sql
00:26:20.878361 [debug] [MainThread]: Parsing macros/materializations/models/incremental/incremental.sql
00:26:20.891092 [debug] [MainThread]: Parsing macros/materializations/models/incremental/on_schema_change.sql
00:26:20.911771 [debug] [MainThread]: Parsing macros/materializations/models/table/table.sql
00:26:20.919083 [debug] [MainThread]: Parsing macros/materializations/models/table/create_table_as.sql
00:26:20.923969 [debug] [MainThread]: Parsing macros/materializations/models/view/view.sql
00:26:20.931714 [debug] [MainThread]: Parsing macros/materializations/models/view/helpers.sql
00:26:20.933911 [debug] [MainThread]: Parsing macros/materializations/models/view/create_or_replace_view.sql
00:26:20.938585 [debug] [MainThread]: Parsing macros/materializations/models/view/create_view_as.sql
00:26:20.942010 [debug] [MainThread]: Parsing macros/materializations/seeds/seed.sql
00:26:20.951286 [debug] [MainThread]: Parsing macros/materializations/seeds/helpers.sql
00:26:20.973045 [debug] [MainThread]: Parsing macros/get_custom_name/get_custom_alias.sql
00:26:20.975625 [debug] [MainThread]: Parsing macros/get_custom_name/get_custom_schema.sql
00:26:20.980202 [debug] [MainThread]: Parsing macros/get_custom_name/get_custom_database.sql
00:26:20.982486 [debug] [MainThread]: Parsing macros/generic_test_sql/relationships.sql
00:26:20.983840 [debug] [MainThread]: Parsing macros/generic_test_sql/not_null.sql
00:26:20.985266 [debug] [MainThread]: Parsing macros/generic_test_sql/unique.sql
00:26:20.986547 [debug] [MainThread]: Parsing macros/generic_test_sql/accepted_values.sql
00:26:20.988635 [debug] [MainThread]: Parsing macros/etc/statement.sql
00:26:20.997059 [debug] [MainThread]: Parsing macros/etc/datetime.sql
00:26:21.008709 [debug] [MainThread]: Parsing macros/utils/except.sql
00:26:21.011078 [debug] [MainThread]: Parsing macros/utils/replace.sql
00:26:21.014530 [debug] [MainThread]: Parsing macros/utils/concat.sql
00:26:21.016112 [debug] [MainThread]: Parsing macros/utils/length.sql
00:26:21.017949 [debug] [MainThread]: Parsing macros/utils/dateadd.sql
00:26:21.020028 [debug] [MainThread]: Parsing macros/utils/intersect.sql
00:26:21.021627 [debug] [MainThread]: Parsing macros/utils/escape_single_quotes.sql
00:26:21.023559 [debug] [MainThread]: Parsing macros/utils/right.sql
00:26:21.027183 [debug] [MainThread]: Parsing macros/utils/listagg.sql
00:26:21.031354 [debug] [MainThread]: Parsing macros/utils/datediff.sql
00:26:21.033826 [debug] [MainThread]: Parsing macros/utils/safe_cast.sql
00:26:21.035889 [debug] [MainThread]: Parsing macros/utils/hash.sql
00:26:21.039178 [debug] [MainThread]: Parsing macros/utils/cast_bool_to_text.sql
00:26:21.048756 [debug] [MainThread]: Parsing macros/utils/any_value.sql
00:26:21.053378 [debug] [MainThread]: Parsing macros/utils/position.sql
00:26:21.060447 [debug] [MainThread]: Parsing macros/utils/literal.sql
00:26:21.064944 [debug] [MainThread]: Parsing macros/utils/data_types.sql
00:26:21.080025 [debug] [MainThread]: Parsing macros/utils/array_concat.sql
00:26:21.083907 [debug] [MainThread]: Parsing macros/utils/bool_or.sql
00:26:21.086630 [debug] [MainThread]: Parsing macros/utils/last_day.sql
00:26:21.089437 [debug] [MainThread]: Parsing macros/utils/split_part.sql
00:26:21.093952 [debug] [MainThread]: Parsing macros/utils/date_trunc.sql
00:26:21.096059 [debug] [MainThread]: Parsing macros/utils/array_construct.sql
00:26:21.098265 [debug] [MainThread]: Parsing macros/utils/array_append.sql
00:26:21.100135 [debug] [MainThread]: Parsing macros/adapters/schema.sql
00:26:21.103210 [debug] [MainThread]: Parsing macros/adapters/timestamps.sql
00:26:21.107899 [debug] [MainThread]: Parsing macros/adapters/indexes.sql
00:26:21.113081 [debug] [MainThread]: Parsing macros/adapters/relation.sql
00:26:21.131393 [debug] [MainThread]: Parsing macros/adapters/freshness.sql
00:26:21.134200 [debug] [MainThread]: Parsing macros/adapters/apply_grants.sql
00:26:21.152315 [debug] [MainThread]: Parsing macros/adapters/persist_docs.sql
00:26:21.158762 [debug] [MainThread]: Parsing macros/adapters/metadata.sql
00:26:21.168110 [debug] [MainThread]: Parsing macros/adapters/columns.sql
00:26:21.181427 [debug] [MainThread]: Parsing macros/python_model/python.sql
00:26:21.188725 [debug] [MainThread]: Parsing tests/generic/builtin.sql
00:26:21.513803 [debug] [MainThread]: 1699: static parser successfully parsed inc.sql
00:26:21.531030 [debug] [MainThread]: 1699: static parser successfully parsed raw.sql
00:26:21.596192 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e161cd0>]}
00:26:21.605955 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e110b80>]}
00:26:21.606474 [info ] [MainThread]: Found 2 models, 0 tests, 0 snapshots, 0 analyses, 327 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:26:21.606829 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e110a90>]}
00:26:21.608946 [info ] [MainThread]: 
00:26:21.609840 [debug] [MainThread]: Acquiring new redshift connection "master"
00:26:21.611903 [debug] [ThreadPool]: Acquiring new redshift connection "list_dev"
00:26:21.628566 [debug] [ThreadPool]: Using redshift connection "list_dev"
00:26:21.628969 [debug] [ThreadPool]: On list_dev: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "connection_name": "list_dev"} */

    select distinct nspname from pg_namespace
  
00:26:21.629351 [debug] [ThreadPool]: Opening a new connection, currently in state init
00:26:21.629976 [debug] [ThreadPool]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:24.087130 [debug] [ThreadPool]: SQL status: SELECT in 2.46 seconds
00:26:24.092293 [debug] [ThreadPool]: On list_dev: Close
00:26:24.096612 [debug] [ThreadPool]: Acquiring new redshift connection "list_dev_public"
00:26:24.113464 [debug] [ThreadPool]: Using redshift connection "list_dev_public"
00:26:24.114038 [debug] [ThreadPool]: On list_dev_public: BEGIN
00:26:24.114492 [debug] [ThreadPool]: Opening a new connection, currently in state closed
00:26:24.115055 [debug] [ThreadPool]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:26.501427 [debug] [ThreadPool]: SQL status: BEGIN in 2.39 seconds
00:26:26.502062 [debug] [ThreadPool]: Using redshift connection "list_dev_public"
00:26:26.502596 [debug] [ThreadPool]: On list_dev_public: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "connection_name": "list_dev_public"} */
select
      'dev' as database,
      tablename as name,
      schemaname as schema,
      'table' as type
    from pg_tables
    where schemaname ilike 'public'
    union all
    select
      'dev' as database,
      viewname as name,
      schemaname as schema,
      'view' as type
    from pg_views
    where schemaname ilike 'public'
  
00:26:26.805035 [debug] [ThreadPool]: SQL status: SELECT in 0.3 seconds
00:26:26.809048 [debug] [ThreadPool]: On list_dev_public: ROLLBACK
00:26:27.023089 [debug] [ThreadPool]: On list_dev_public: Close
00:26:27.036576 [debug] [MainThread]: Using redshift connection "master"
00:26:27.037506 [debug] [MainThread]: On master: BEGIN
00:26:27.038280 [debug] [MainThread]: Opening a new connection, currently in state init
00:26:27.039051 [debug] [MainThread]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:29.624696 [debug] [MainThread]: SQL status: BEGIN in 2.59 seconds
00:26:29.625982 [debug] [MainThread]: Using redshift connection "master"
00:26:29.627202 [debug] [MainThread]: On master: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "connection_name": "master"} */
with relation as (
        select
            pg_rewrite.ev_class as class,
            pg_rewrite.oid as id
        from pg_rewrite
    ),
    class as (
        select
            oid as id,
            relname as name,
            relnamespace as schema,
            relkind as kind
        from pg_class
    ),
    dependency as (
        select distinct
            pg_depend.objid as id,
            pg_depend.refobjid as ref
        from pg_depend
    ),
    schema as (
        select
            pg_namespace.oid as id,
            pg_namespace.nspname as name
        from pg_namespace
        where nspname != 'information_schema' and nspname not like 'pg\_%'
    ),
    referenced as (
        select
            relation.id AS id,
            referenced_class.name ,
            referenced_class.schema ,
            referenced_class.kind
        from relation
        join class as referenced_class on relation.class=referenced_class.id
        where referenced_class.kind in ('r', 'v')
    ),
    relationships as (
        select
            referenced.name as referenced_name,
            referenced.schema as referenced_schema_id,
            dependent_class.name as dependent_name,
            dependent_class.schema as dependent_schema_id,
            referenced.kind as kind
        from referenced
        join dependency on referenced.id=dependency.id
        join class as dependent_class on dependency.ref=dependent_class.id
        where
            (referenced.name != dependent_class.name or
             referenced.schema != dependent_class.schema)
    )

    select
        referenced_schema.name as referenced_schema,
        relationships.referenced_name as referenced_name,
        dependent_schema.name as dependent_schema,
        relationships.dependent_name as dependent_name
    from relationships
    join schema as dependent_schema on relationships.dependent_schema_id=dependent_schema.id
    join schema as referenced_schema on relationships.referenced_schema_id=referenced_schema.id
    group by referenced_schema, referenced_name, dependent_schema, dependent_name
    order by referenced_schema, referenced_name, dependent_schema, dependent_name;
00:26:29.878221 [debug] [MainThread]: SQL status: SELECT in 0.25 seconds
00:26:29.879669 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e10a160>]}
00:26:29.880166 [debug] [MainThread]: On master: ROLLBACK
00:26:30.096306 [debug] [MainThread]: Using redshift connection "master"
00:26:30.097528 [debug] [MainThread]: On master: BEGIN
00:26:30.562005 [debug] [MainThread]: SQL status: BEGIN in 0.46 seconds
00:26:30.562565 [debug] [MainThread]: On master: COMMIT
00:26:30.562913 [debug] [MainThread]: Using redshift connection "master"
00:26:30.563298 [debug] [MainThread]: On master: COMMIT
00:26:30.875533 [debug] [MainThread]: SQL status: COMMIT in 0.31 seconds
00:26:30.876454 [debug] [MainThread]: On master: Close
00:26:30.878656 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
00:26:30.879584 [info ] [MainThread]: 
00:26:30.886064 [debug] [Thread-1  ]: Began running node model.my_project.raw
00:26:30.886995 [info ] [Thread-1  ]: 1 of 2 START sql table model public.raw ........................................ [RUN]
00:26:30.888692 [debug] [Thread-1  ]: Acquiring new redshift connection "model.my_project.raw"
00:26:30.889634 [debug] [Thread-1  ]: Began compiling node model.my_project.raw
00:26:30.890467 [debug] [Thread-1  ]: Compiling model.my_project.raw
00:26:30.895999 [debug] [Thread-1  ]: Writing injected SQL for node "model.my_project.raw"
00:26:30.897058 [debug] [Thread-1  ]: finished collecting timing info
00:26:30.897580 [debug] [Thread-1  ]: Began executing node model.my_project.raw
00:26:30.962458 [debug] [Thread-1  ]: Writing runtime sql for node "model.my_project.raw"
00:26:30.963320 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:30.963695 [debug] [Thread-1  ]: On model.my_project.raw: BEGIN
00:26:30.964047 [debug] [Thread-1  ]: Opening a new connection, currently in state closed
00:26:30.964411 [debug] [Thread-1  ]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:32.976972 [debug] [Thread-1  ]: SQL status: BEGIN in 2.01 seconds
00:26:32.978198 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:32.979456 [debug] [Thread-1  ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */

  
    

  create  table
    "dev"."public"."raw__dbt_tmp"
    
    
    
  as (
    

select 'alice' as name, 1 as id
  );
  
00:26:33.279851 [debug] [Thread-1  ]: SQL status: SELECT in 0.3 seconds
00:26:33.297419 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:33.297963 [debug] [Thread-1  ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
alter table "dev"."public"."raw" rename to "raw__dbt_backup"
00:26:33.511988 [debug] [Thread-1  ]: SQL status: ALTER TABLE in 0.21 seconds
00:26:33.520670 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:33.521339 [debug] [Thread-1  ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
alter table "dev"."public"."raw__dbt_tmp" rename to "raw"
00:26:33.736326 [debug] [Thread-1  ]: SQL status: ALTER TABLE in 0.21 seconds
00:26:33.781937 [debug] [Thread-1  ]: On model.my_project.raw: COMMIT
00:26:33.782285 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:33.782648 [debug] [Thread-1  ]: On model.my_project.raw: COMMIT
00:26:34.783040 [debug] [Thread-1  ]: SQL status: COMMIT in 1.0 seconds
00:26:34.784965 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:34.785757 [debug] [Thread-1  ]: On model.my_project.raw: BEGIN
00:26:35.054507 [debug] [Thread-1  ]: SQL status: BEGIN in 0.27 seconds
00:26:35.068253 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:35.068974 [debug] [Thread-1  ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
drop table if exists "dev"."public"."raw__dbt_backup" cascade
00:26:35.367397 [debug] [Thread-1  ]: SQL status: DROP TABLE in 0.3 seconds
00:26:35.369224 [debug] [Thread-1  ]: On model.my_project.raw: COMMIT
00:26:35.369763 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:35.370338 [debug] [Thread-1  ]: On model.my_project.raw: COMMIT
00:26:36.405331 [debug] [Thread-1  ]: SQL status: COMMIT in 1.03 seconds
00:26:36.406672 [debug] [Thread-1  ]: Using redshift connection "model.my_project.raw"
00:26:36.407717 [debug] [Thread-1  ]: On model.my_project.raw: BEGIN
00:26:36.620654 [debug] [Thread-1  ]: SQL status: BEGIN in 0.21 seconds
00:26:36.622881 [debug] [Thread-1  ]: finished collecting timing info
00:26:36.623774 [debug] [Thread-1  ]: On model.my_project.raw: ROLLBACK
00:26:36.839603 [debug] [Thread-1  ]: On model.my_project.raw: Close
00:26:36.842470 [debug] [Thread-1  ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e235640>]}
00:26:36.843979 [info ] [Thread-1  ]: 1 of 2 OK created sql table model public.raw ................................... [SELECT in 5.95s]
00:26:36.845229 [debug] [Thread-1  ]: Finished running node model.my_project.raw
00:26:36.847088 [debug] [Thread-1  ]: Began running node model.my_project.inc
00:26:36.848344 [info ] [Thread-1  ]: 2 of 2 START sql incremental model public.inc .................................. [RUN]
00:26:36.850312 [debug] [Thread-1  ]: Acquiring new redshift connection "model.my_project.inc"
00:26:36.851086 [debug] [Thread-1  ]: Began compiling node model.my_project.inc
00:26:36.851778 [debug] [Thread-1  ]: Compiling model.my_project.inc
00:26:36.858675 [debug] [Thread-1  ]: Writing injected SQL for node "model.my_project.inc"
00:26:36.860003 [debug] [Thread-1  ]: finished collecting timing info
00:26:36.861023 [debug] [Thread-1  ]: Began executing node model.my_project.inc
00:26:36.909616 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:36.909986 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

    
  
    

  create temporary table
    "inc__dbt_tmp132636896757"
    
    
    
  as (
    

select name, id from "dev"."public"."raw"
  );
  
  
00:26:36.910329 [debug] [Thread-1  ]: Opening a new connection, currently in state closed
00:26:36.910703 [debug] [Thread-1  ]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:41.828800 [debug] [Thread-1  ]: SQL status: SELECT in 4.92 seconds
00:26:41.849444 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:41.849912 [debug] [Thread-1  ]: On model.my_project.inc: BEGIN
00:26:42.062472 [debug] [Thread-1  ]: SQL status: BEGIN in 0.21 seconds
00:26:42.062855 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:42.063228 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

      with bound_views as (
        select
          ordinal_position,
          table_schema,
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

        from information_schema."columns"
        where table_name = 'inc__dbt_tmp132636896757'
    ),

    unbound_views as (
      select
        ordinal_position,
        view_schema,
        col_name,
        case
          when col_type ilike 'character varying%' then
            'character varying'
          when col_type ilike 'numeric%' then 'numeric'
          else col_type
        end as col_type,
        case
          when col_type like 'character%'
          then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
          else null
        end as character_maximum_length,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale

      from pg_get_late_binding_view_cols()
      cols(view_schema name, view_name name, col_name name,
           col_type varchar, ordinal_position int)
      where view_name = 'inc__dbt_tmp132636896757'
    ),

    external_views as (
      select
        columnnum,
        schemaname,
        columnname,
        case
          when external_type ilike 'character varying%' or external_type ilike 'varchar%'
          then 'character varying'
          when external_type ilike 'numeric%' then 'numeric'
          else external_type
        end as external_type,
        case
          when external_type like 'character%' or external_type like 'varchar%'
          then nullif(
            REGEXP_SUBSTR(external_type, '[0-9]+'),
            '')::int
          else null
        end as character_maximum_length,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale
      from
        pg_catalog.svv_external_columns
      where
        schemaname = 'None'
        and tablename = 'inc__dbt_tmp132636896757'

    ),

    unioned as (
      select * from bound_views
      union all
      select * from unbound_views
      union all
      select * from external_views
    )

    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale

    from unioned
    
    order by ordinal_position
  
00:26:42.413009 [debug] [Thread-1  ]: SQL status: SELECT in 0.35 seconds
00:26:42.429898 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:42.430458 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

      with bound_views as (
        select
          ordinal_position,
          table_schema,
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

        from information_schema."columns"
        where table_name = 'inc'
    ),

    unbound_views as (
      select
        ordinal_position,
        view_schema,
        col_name,
        case
          when col_type ilike 'character varying%' then
            'character varying'
          when col_type ilike 'numeric%' then 'numeric'
          else col_type
        end as col_type,
        case
          when col_type like 'character%'
          then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
          else null
        end as character_maximum_length,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale

      from pg_get_late_binding_view_cols()
      cols(view_schema name, view_name name, col_name name,
           col_type varchar, ordinal_position int)
      where view_name = 'inc'
    ),

    external_views as (
      select
        columnnum,
        schemaname,
        columnname,
        case
          when external_type ilike 'character varying%' or external_type ilike 'varchar%'
          then 'character varying'
          when external_type ilike 'numeric%' then 'numeric'
          else external_type
        end as external_type,
        case
          when external_type like 'character%' or external_type like 'varchar%'
          then nullif(
            REGEXP_SUBSTR(external_type, '[0-9]+'),
            '')::int
          else null
        end as character_maximum_length,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale
      from
        pg_catalog.svv_external_columns
      where
        schemaname = 'public'
        and tablename = 'inc'

    ),

    unioned as (
      select * from bound_views
      union all
      select * from unbound_views
      union all
      select * from external_views
    )

    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale

    from unioned
    
    where table_schema = 'public'
    
    order by ordinal_position
  
00:26:42.770678 [debug] [Thread-1  ]: SQL status: SELECT in 0.34 seconds
00:26:42.775414 [debug] [Thread-1  ]: Changing col type from character varying(3) to character varying(5) in table _ReferenceKey(database='dev', schema='public', identifier='inc')
00:26:42.793359 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:42.793861 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

    alter table "dev"."public"."inc" add column "name__dbt_alter" character varying(5);
    update "dev"."public"."inc" set "name__dbt_alter" = "name";
    alter table "dev"."public"."inc" drop column "name" cascade;
    alter table "dev"."public"."inc" rename column "name__dbt_alter" to "name"
  
00:26:46.648717 [debug] [Thread-1  ]: SQL status: ALTER TABLE in 3.85 seconds
00:26:46.668355 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:46.669062 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

      with bound_views as (
        select
          ordinal_position,
          table_schema,
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

        from information_schema."columns"
        where table_name = 'inc'
    ),

    unbound_views as (
      select
        ordinal_position,
        view_schema,
        col_name,
        case
          when col_type ilike 'character varying%' then
            'character varying'
          when col_type ilike 'numeric%' then 'numeric'
          else col_type
        end as col_type,
        case
          when col_type like 'character%'
          then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
          else null
        end as character_maximum_length,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale

      from pg_get_late_binding_view_cols()
      cols(view_schema name, view_name name, col_name name,
           col_type varchar, ordinal_position int)
      where view_name = 'inc'
    ),

    external_views as (
      select
        columnnum,
        schemaname,
        columnname,
        case
          when external_type ilike 'character varying%' or external_type ilike 'varchar%'
          then 'character varying'
          when external_type ilike 'numeric%' then 'numeric'
          else external_type
        end as external_type,
        case
          when external_type like 'character%' or external_type like 'varchar%'
          then nullif(
            REGEXP_SUBSTR(external_type, '[0-9]+'),
            '')::int
          else null
        end as character_maximum_length,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale
      from
        pg_catalog.svv_external_columns
      where
        schemaname = 'public'
        and tablename = 'inc'

    ),

    unioned as (
      select * from bound_views
      union all
      select * from unbound_views
      union all
      select * from external_views
    )

    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale

    from unioned
    
    where table_schema = 'public'
    
    order by ordinal_position
  
00:26:47.004986 [debug] [Thread-1  ]: SQL status: SELECT in 0.33 seconds
00:26:47.035324 [debug] [Thread-1  ]: Writing runtime sql for node "model.my_project.inc"
00:26:47.036249 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:47.036650 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

      insert into "dev"."public"."inc" ("id", "name")
    (
        select "id", "name"
        from "inc__dbt_tmp132636896757"
    )


  
00:26:49.975173 [debug] [Thread-1  ]: SQL status: INSERT 0 1 in 2.94 seconds
00:26:49.981351 [debug] [Thread-1  ]: On model.my_project.inc: COMMIT
00:26:49.982150 [debug] [Thread-1  ]: Using redshift connection "model.my_project.inc"
00:26:49.982880 [debug] [Thread-1  ]: On model.my_project.inc: COMMIT
00:26:51.588664 [debug] [Thread-1  ]: SQL status: COMMIT in 1.61 seconds
00:26:51.589522 [debug] [Thread-1  ]: finished collecting timing info
00:26:51.589963 [debug] [Thread-1  ]: On model.my_project.inc: Close
00:26:51.591156 [debug] [Thread-1  ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e28ea00>]}
00:26:51.591834 [info ] [Thread-1  ]: 2 of 2 OK created sql incremental model public.inc ............................. [INSERT 0 1 in 14.74s]
00:26:51.592322 [debug] [Thread-1  ]: Finished running node model.my_project.inc
00:26:51.593827 [debug] [MainThread]: Acquiring new redshift connection "master"
00:26:51.594190 [debug] [MainThread]: Using redshift connection "master"
00:26:51.594550 [debug] [MainThread]: On master: BEGIN
00:26:51.595022 [debug] [MainThread]: Opening a new connection, currently in state closed
00:26:51.595509 [debug] [MainThread]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:53.790565 [debug] [MainThread]: SQL status: BEGIN in 2.2 seconds
00:26:53.790984 [debug] [MainThread]: On master: COMMIT
00:26:53.791302 [debug] [MainThread]: Using redshift connection "master"
00:26:53.791660 [debug] [MainThread]: On master: COMMIT
00:26:54.005397 [debug] [MainThread]: SQL status: COMMIT in 0.21 seconds
00:26:54.006332 [debug] [MainThread]: On master: Close
00:26:54.008557 [info ] [MainThread]: 
00:26:54.009560 [info ] [MainThread]: Finished running 1 table model, 1 incremental model in 0 hours 0 minutes and 32.40 seconds (32.40s).
00:26:54.010287 [debug] [MainThread]: Connection 'master' was properly closed.
00:26:54.010903 [debug] [MainThread]: Connection 'model.my_project.inc' was properly closed.
00:26:54.025459 [info ] [MainThread]: 
00:26:54.026064 [info ] [MainThread]: Completed successfully
00:26:54.026641 [info ] [MainThread]: 
00:26:54.027174 [info ] [MainThread]: Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
00:26:54.027876 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10db35e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e161f10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e11bc70>]}
00:26:54.028978 [debug] [MainThread]: Flushing usage events

And then inspect the inc table again:

[image: query result of the inc table - columns now in the order id, name]

We see that name and id have swapped places. If we look at our debug logs above, we notice:

00:26:42.793861 [debug] [Thread-1  ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */

    alter table "dev"."public"."inc" add column "name__dbt_alter" character varying(5);
    update "dev"."public"."inc" set "name__dbt_alter" = "name";
    alter table "dev"."public"."inc" drop column "name" cascade;
    alter table "dev"."public"."inc" rename column "name__dbt_alter" to "name"

dbt has expanded our varchar(3) to varchar(5) in order to fit "alice" into the name column. The sequence of adding a new column, copying the data over, dropping the old column, and renaming the new column to the old name results in the name column now appearing after the id column.

There are a couple of things we can do here.

(A) Add a view (or another table) on top of the inc model.

-- models/inc_vw.sql
{{ config(materialized = 'view', bind=False) }}

select name, id from {{ ref('inc') }}

This view will always present the columns in the right order. However, you may not want an additional object in your database, and we still run into errors if, say, the name column is used as a sortkey or distkey [1], since the alter table "dev"."public"."inc" drop column "name" cascade statement will fail when the column is a sort or dist key. To account for this, you will need to add a pre-hook to your model that first removes the column as a key on the table, as sketched below.
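
For example (a hypothetical sketch - the identifiers assume this demo project, and which statement you need depends on how the table was keyed), the pre-hook would have to run something along these lines before dbt attempts the column swap:

-- Hypothetical sketch: remove "name" as a key so the drop column ... cascade can succeed.
alter table "dev"."public"."inc" alter sortkey none;    -- only needed if "name" is the sortkey
alter table "dev"."public"."inc" alter diststyle even;  -- only needed if "name" is the distkey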

(B) Be strict about your incoming data and raise an exception when dbt does the subsequent run.

dbt ships with a default alter_column_type macro that is invoked when the above scenario happens, so what we can do is simply override its behaviour.

Let's add this macro to our macros/ folder:

{% macro alter_column_type(relation, column_name, new_column_type) -%}
    {# When the error_on_alter_column_type var is set, fail loudly instead of silently altering the column. #}
    {% if var('error_on_alter_column_type', False) %}
        {% set msg -%}
            Error in relation '{{ relation }}'. Column '{{ column_name }}' type changed to {{ new_column_type }}.
        {%- endset %}
        {% do exceptions.raise_compiler_error(msg) %}
    {% else %}
        {# Otherwise fall back to the default behaviour: add a temp column, copy the data, drop the old column, rename. #}
        {%- set tmp_column = column_name + "__dbt_alter" -%}
        {% call statement('alter_column_type') %}
          alter table {{ relation }} add column {{ adapter.quote(tmp_column) }} {{ new_column_type }};
          update {{ relation }} set {{ adapter.quote(tmp_column) }} = {{ adapter.quote(column_name) }};
          alter table {{ relation }} drop column {{ adapter.quote(column_name) }} cascade;
          alter table {{ relation }} rename column {{ adapter.quote(tmp_column) }} to {{ adapter.quote(column_name) }}
        {% endcall %}
    {% endif %}
{% endmacro %}

I'm using a var to control the behaviour, so let's add that to our dbt_project.yml file:

# dbt_project.yml
...
vars:
  error_on_alter_column_type: True

Modify our raw model by increasing the size of the name column again:

-- models/raw.sql
{{ config(materialized = 'table') }}

select 'aliceandbob' as name, 2 as id

And then do a run:

$ dbt run
00:54:24  Running with dbt=1.3.0
00:54:26  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 328 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:54:26  
00:54:35  Concurrency: 1 threads (target='dev')
00:54:35  
00:54:35  1 of 2 START sql table model public.raw ........................................ [RUN]
00:54:42  1 of 2 OK created sql table model public.raw ................................... [SELECT in 6.61s]
00:54:42  2 of 2 START sql incremental model public.inc .................................. [RUN]
00:54:46  2 of 2 ERROR creating sql incremental model public.inc ......................... [ERROR in 3.94s]
00:54:48  
00:54:48  Finished running 1 table model, 1 incremental model in 0 hours 0 minutes and 22.43 seconds (22.43s).
00:54:48  
00:54:48  Completed with 1 error and 0 warnings:
00:54:48  
00:54:48  Compilation Error in macro alter_column_type (macros/alter_column_type.sql)
00:54:48    Error in relation '"dev"."public"."inc"'. Column 'name' type changed to character varying(11).
00:54:48    
00:54:48    > in macro alter_column_type (macros/alter_column_type.sql)
00:54:48    > called by macro materialization_incremental_default (macros/materializations/models/incremental/incremental.sql)
00:54:48    > called by macro alter_column_type (macros/alter_column_type.sql)
00:54:48  
00:54:48  Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

In this manner, the inc incremental model is never updated and an error is returned whenever we hit the alter_column_type macro. This behaviour can be controlled by setting the var error_on_alter_column_type in dbt_project.yml itself, as we did above, or via a runtime command [2]:

dbt run --vars 'error_on_alter_column_type: true'
dbt run --vars 'error_on_alter_column_type: false'
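
If you need to set this alongside other variables at runtime, the YAML dictionary form of --vars works as well (my_other_var here is just a placeholder):

dbt run --vars '{error_on_alter_column_type: true, my_other_var: 123}'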

Footnotes

  1. https://docs.getdbt.com/reference/resource-configs/redshift-configs#using-sortkey-and-distkey

  2. https://docs.getdbt.com/docs/build/project-variables
