When using dbt incremental models with Redshift, varchar expansion (e.g. varchar(5) -> varchar(10)) is handled silently, and the current incremental model config on_schema_change does not apply to this particular scenario.
In the grand scheme of things, column ordering is not usually that big of a deal, but you may want to maintain it because your organization mandates a certain pattern - e.g. the id column always comes at the end, or what have you. You may also have chosen to set the schema for your model outside of dbt via your own create table as ... statements - which we definitely do not recommend; letting dbt manage your model and all of its DDL/DML is the best practice here.
Let's have a look at a simple example by setting up 2 models:
-- models/raw.sql
{{ config(materialized = 'table') }}
select 'bob' as name, 1 as id
-- models/inc.sql
{{ config(materialized = 'incremental', on_schema_change = 'fail') }}
select name, id from {{ ref('raw') }}
We are using our raw model to simulate a raw dataset and to see what happens when the name column increases in size.
Let's build them:
$ dbt run --full-refresh
00:18:08 Running with dbt=1.3.0
00:18:09 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 327 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:18:09
00:18:17 Concurrency: 1 threads (target='dev')
00:18:17
00:18:17 1 of 2 START sql table model public.raw ........................................ [RUN]
00:18:23 1 of 2 OK created sql table model public.raw ................................... [SELECT in 6.36s]
00:18:23 2 of 2 START sql incremental model public.inc .................................. [RUN]
00:18:29 2 of 2 OK created sql incremental model public.inc ............................. [SELECT in 5.84s]
00:18:31
00:18:31 Finished running 1 table model, 1 incremental model in 0 hours 0 minutes and 22.75 seconds (22.75s).
00:18:31
00:18:31 Completed successfully
00:18:31
00:18:31 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
And inspect our inc table:
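A quick way to check the column layout yourself - a sketch assuming your models land in the public schema - is to query information_schema.columns, the same catalog dbt consults in the debug logs further down:

select column_name, data_type, character_maximum_length
from information_schema.columns
where table_schema = 'public'
  and table_name = 'inc'
order by ordinal_position;

-- column_name | data_type         | character_maximum_length
-- name        | character varying | 3
-- id          | integer           |

Note that create table as has sized the name column as varchar(3) - just wide enough for 'bob'.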
So far so good. Now let's modify the name column in our raw model to simulate what happens if the raw data changes:
-- models/raw.sql
{{ config(materialized = 'table') }}
select 'alice' as name, 2 as id
Do a subsequent run (I'm using the --debug option to get more details on the queries that dbt is sending to Redshift):
$ dbt --debug run
============================== 2022-11-17 00:26:20.539398 | 904de28b-b85b-4f86-aa2c-62b91a214ffd ==============================
00:26:20.539406 [info ] [MainThread]: Running with dbt=1.3.0
00:26:20.539842 [debug] [MainThread]: running dbt with arguments {'debug': True, 'write_json': True, 'use_colors': True, 'printer_width': 80, 'version_check': True, 'partial_parse': False, 'static_parser': True, 'profiles_dir': '/Users/jeremy/.dbt', 'send_anonymous_usage_stats': True, 'event_buffer_size': 100000, 'quiet': False, 'no_print': False, 'which': 'run', 'rpc_method': 'run', 'indirect_selection': 'eager'}
00:26:20.540205 [debug] [MainThread]: Tracking: tracking
00:26:20.571764 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10dafa820>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10dafac10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10db0d610>]}
00:26:20.595989 [debug] [MainThread]: Partial parsing not enabled
00:26:20.627926 [debug] [MainThread]: Parsing macros/timestamps.sql
00:26:20.631838 [debug] [MainThread]: Parsing macros/catalog.sql
00:26:20.648009 [debug] [MainThread]: Parsing macros/relations.sql
00:26:20.648899 [debug] [MainThread]: Parsing macros/adapters.sql
00:26:20.680501 [debug] [MainThread]: Parsing macros/materializations/snapshot_merge.sql
00:26:20.681635 [debug] [MainThread]: Parsing macros/utils/length.sql
00:26:20.682661 [debug] [MainThread]: Parsing macros/utils/dateadd.sql
00:26:20.683926 [debug] [MainThread]: Parsing macros/utils/listagg.sql
00:26:20.688099 [debug] [MainThread]: Parsing macros/utils/datediff.sql
00:26:20.689405 [debug] [MainThread]: Parsing macros/utils/cast_bool_to_text.sql
00:26:20.690429 [debug] [MainThread]: Parsing macros/utils/array_concat.sql
00:26:20.691554 [debug] [MainThread]: Parsing macros/utils/last_day.sql
00:26:20.693605 [debug] [MainThread]: Parsing macros/utils/split_part.sql
00:26:20.696057 [debug] [MainThread]: Parsing macros/utils/array_construct.sql
00:26:20.697293 [debug] [MainThread]: Parsing macros/utils/array_append.sql
00:26:20.698650 [debug] [MainThread]: Parsing macros/adapters/apply_grants.sql
00:26:20.699836 [debug] [MainThread]: Parsing macros/timestamps.sql
00:26:20.702467 [debug] [MainThread]: Parsing macros/catalog.sql
00:26:20.705349 [debug] [MainThread]: Parsing macros/relations.sql
00:26:20.707356 [debug] [MainThread]: Parsing macros/adapters.sql
00:26:20.737183 [debug] [MainThread]: Parsing macros/materializations/incremental_strategies.sql
00:26:20.738865 [debug] [MainThread]: Parsing macros/materializations/snapshot_merge.sql
00:26:20.741664 [debug] [MainThread]: Parsing macros/utils/dateadd.sql
00:26:20.742890 [debug] [MainThread]: Parsing macros/utils/listagg.sql
00:26:20.745529 [debug] [MainThread]: Parsing macros/utils/datediff.sql
00:26:20.754669 [debug] [MainThread]: Parsing macros/utils/any_value.sql
00:26:20.755658 [debug] [MainThread]: Parsing macros/utils/last_day.sql
00:26:20.757577 [debug] [MainThread]: Parsing macros/utils/split_part.sql
00:26:20.759949 [debug] [MainThread]: Parsing macros/materializations/hooks.sql
00:26:20.765705 [debug] [MainThread]: Parsing macros/materializations/configs.sql
00:26:20.768950 [debug] [MainThread]: Parsing macros/materializations/snapshots/snapshot_merge.sql
00:26:20.771133 [debug] [MainThread]: Parsing macros/materializations/snapshots/strategies.sql
00:26:20.793080 [debug] [MainThread]: Parsing macros/materializations/snapshots/helpers.sql
00:26:20.807777 [debug] [MainThread]: Parsing macros/materializations/snapshots/snapshot.sql
00:26:20.822393 [debug] [MainThread]: Parsing macros/materializations/tests/test.sql
00:26:20.829504 [debug] [MainThread]: Parsing macros/materializations/tests/helpers.sql
00:26:20.832365 [debug] [MainThread]: Parsing macros/materializations/tests/where_subquery.sql
00:26:20.835003 [debug] [MainThread]: Parsing macros/materializations/models/incremental/column_helpers.sql
00:26:20.846722 [debug] [MainThread]: Parsing macros/materializations/models/incremental/merge.sql
00:26:20.865953 [debug] [MainThread]: Parsing macros/materializations/models/incremental/is_incremental.sql
00:26:20.868576 [debug] [MainThread]: Parsing macros/materializations/models/incremental/strategies.sql
00:26:20.878361 [debug] [MainThread]: Parsing macros/materializations/models/incremental/incremental.sql
00:26:20.891092 [debug] [MainThread]: Parsing macros/materializations/models/incremental/on_schema_change.sql
00:26:20.911771 [debug] [MainThread]: Parsing macros/materializations/models/table/table.sql
00:26:20.919083 [debug] [MainThread]: Parsing macros/materializations/models/table/create_table_as.sql
00:26:20.923969 [debug] [MainThread]: Parsing macros/materializations/models/view/view.sql
00:26:20.931714 [debug] [MainThread]: Parsing macros/materializations/models/view/helpers.sql
00:26:20.933911 [debug] [MainThread]: Parsing macros/materializations/models/view/create_or_replace_view.sql
00:26:20.938585 [debug] [MainThread]: Parsing macros/materializations/models/view/create_view_as.sql
00:26:20.942010 [debug] [MainThread]: Parsing macros/materializations/seeds/seed.sql
00:26:20.951286 [debug] [MainThread]: Parsing macros/materializations/seeds/helpers.sql
00:26:20.973045 [debug] [MainThread]: Parsing macros/get_custom_name/get_custom_alias.sql
00:26:20.975625 [debug] [MainThread]: Parsing macros/get_custom_name/get_custom_schema.sql
00:26:20.980202 [debug] [MainThread]: Parsing macros/get_custom_name/get_custom_database.sql
00:26:20.982486 [debug] [MainThread]: Parsing macros/generic_test_sql/relationships.sql
00:26:20.983840 [debug] [MainThread]: Parsing macros/generic_test_sql/not_null.sql
00:26:20.985266 [debug] [MainThread]: Parsing macros/generic_test_sql/unique.sql
00:26:20.986547 [debug] [MainThread]: Parsing macros/generic_test_sql/accepted_values.sql
00:26:20.988635 [debug] [MainThread]: Parsing macros/etc/statement.sql
00:26:20.997059 [debug] [MainThread]: Parsing macros/etc/datetime.sql
00:26:21.008709 [debug] [MainThread]: Parsing macros/utils/except.sql
00:26:21.011078 [debug] [MainThread]: Parsing macros/utils/replace.sql
00:26:21.014530 [debug] [MainThread]: Parsing macros/utils/concat.sql
00:26:21.016112 [debug] [MainThread]: Parsing macros/utils/length.sql
00:26:21.017949 [debug] [MainThread]: Parsing macros/utils/dateadd.sql
00:26:21.020028 [debug] [MainThread]: Parsing macros/utils/intersect.sql
00:26:21.021627 [debug] [MainThread]: Parsing macros/utils/escape_single_quotes.sql
00:26:21.023559 [debug] [MainThread]: Parsing macros/utils/right.sql
00:26:21.027183 [debug] [MainThread]: Parsing macros/utils/listagg.sql
00:26:21.031354 [debug] [MainThread]: Parsing macros/utils/datediff.sql
00:26:21.033826 [debug] [MainThread]: Parsing macros/utils/safe_cast.sql
00:26:21.035889 [debug] [MainThread]: Parsing macros/utils/hash.sql
00:26:21.039178 [debug] [MainThread]: Parsing macros/utils/cast_bool_to_text.sql
00:26:21.048756 [debug] [MainThread]: Parsing macros/utils/any_value.sql
00:26:21.053378 [debug] [MainThread]: Parsing macros/utils/position.sql
00:26:21.060447 [debug] [MainThread]: Parsing macros/utils/literal.sql
00:26:21.064944 [debug] [MainThread]: Parsing macros/utils/data_types.sql
00:26:21.080025 [debug] [MainThread]: Parsing macros/utils/array_concat.sql
00:26:21.083907 [debug] [MainThread]: Parsing macros/utils/bool_or.sql
00:26:21.086630 [debug] [MainThread]: Parsing macros/utils/last_day.sql
00:26:21.089437 [debug] [MainThread]: Parsing macros/utils/split_part.sql
00:26:21.093952 [debug] [MainThread]: Parsing macros/utils/date_trunc.sql
00:26:21.096059 [debug] [MainThread]: Parsing macros/utils/array_construct.sql
00:26:21.098265 [debug] [MainThread]: Parsing macros/utils/array_append.sql
00:26:21.100135 [debug] [MainThread]: Parsing macros/adapters/schema.sql
00:26:21.103210 [debug] [MainThread]: Parsing macros/adapters/timestamps.sql
00:26:21.107899 [debug] [MainThread]: Parsing macros/adapters/indexes.sql
00:26:21.113081 [debug] [MainThread]: Parsing macros/adapters/relation.sql
00:26:21.131393 [debug] [MainThread]: Parsing macros/adapters/freshness.sql
00:26:21.134200 [debug] [MainThread]: Parsing macros/adapters/apply_grants.sql
00:26:21.152315 [debug] [MainThread]: Parsing macros/adapters/persist_docs.sql
00:26:21.158762 [debug] [MainThread]: Parsing macros/adapters/metadata.sql
00:26:21.168110 [debug] [MainThread]: Parsing macros/adapters/columns.sql
00:26:21.181427 [debug] [MainThread]: Parsing macros/python_model/python.sql
00:26:21.188725 [debug] [MainThread]: Parsing tests/generic/builtin.sql
00:26:21.513803 [debug] [MainThread]: 1699: static parser successfully parsed inc.sql
00:26:21.531030 [debug] [MainThread]: 1699: static parser successfully parsed raw.sql
00:26:21.596192 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e161cd0>]}
00:26:21.605955 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e110b80>]}
00:26:21.606474 [info ] [MainThread]: Found 2 models, 0 tests, 0 snapshots, 0 analyses, 327 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:26:21.606829 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e110a90>]}
00:26:21.608946 [info ] [MainThread]:
00:26:21.609840 [debug] [MainThread]: Acquiring new redshift connection "master"
00:26:21.611903 [debug] [ThreadPool]: Acquiring new redshift connection "list_dev"
00:26:21.628566 [debug] [ThreadPool]: Using redshift connection "list_dev"
00:26:21.628969 [debug] [ThreadPool]: On list_dev: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "connection_name": "list_dev"} */
select distinct nspname from pg_namespace
00:26:21.629351 [debug] [ThreadPool]: Opening a new connection, currently in state init
00:26:21.629976 [debug] [ThreadPool]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:24.087130 [debug] [ThreadPool]: SQL status: SELECT in 2.46 seconds
00:26:24.092293 [debug] [ThreadPool]: On list_dev: Close
00:26:24.096612 [debug] [ThreadPool]: Acquiring new redshift connection "list_dev_public"
00:26:24.113464 [debug] [ThreadPool]: Using redshift connection "list_dev_public"
00:26:24.114038 [debug] [ThreadPool]: On list_dev_public: BEGIN
00:26:24.114492 [debug] [ThreadPool]: Opening a new connection, currently in state closed
00:26:24.115055 [debug] [ThreadPool]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:26.501427 [debug] [ThreadPool]: SQL status: BEGIN in 2.39 seconds
00:26:26.502062 [debug] [ThreadPool]: Using redshift connection "list_dev_public"
00:26:26.502596 [debug] [ThreadPool]: On list_dev_public: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "connection_name": "list_dev_public"} */
select
'dev' as database,
tablename as name,
schemaname as schema,
'table' as type
from pg_tables
where schemaname ilike 'public'
union all
select
'dev' as database,
viewname as name,
schemaname as schema,
'view' as type
from pg_views
where schemaname ilike 'public'
00:26:26.805035 [debug] [ThreadPool]: SQL status: SELECT in 0.3 seconds
00:26:26.809048 [debug] [ThreadPool]: On list_dev_public: ROLLBACK
00:26:27.023089 [debug] [ThreadPool]: On list_dev_public: Close
00:26:27.036576 [debug] [MainThread]: Using redshift connection "master"
00:26:27.037506 [debug] [MainThread]: On master: BEGIN
00:26:27.038280 [debug] [MainThread]: Opening a new connection, currently in state init
00:26:27.039051 [debug] [MainThread]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:29.624696 [debug] [MainThread]: SQL status: BEGIN in 2.59 seconds
00:26:29.625982 [debug] [MainThread]: Using redshift connection "master"
00:26:29.627202 [debug] [MainThread]: On master: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "connection_name": "master"} */
with relation as (
select
pg_rewrite.ev_class as class,
pg_rewrite.oid as id
from pg_rewrite
),
class as (
select
oid as id,
relname as name,
relnamespace as schema,
relkind as kind
from pg_class
),
dependency as (
select distinct
pg_depend.objid as id,
pg_depend.refobjid as ref
from pg_depend
),
schema as (
select
pg_namespace.oid as id,
pg_namespace.nspname as name
from pg_namespace
where nspname != 'information_schema' and nspname not like 'pg\_%'
),
referenced as (
select
relation.id AS id,
referenced_class.name ,
referenced_class.schema ,
referenced_class.kind
from relation
join class as referenced_class on relation.class=referenced_class.id
where referenced_class.kind in ('r', 'v')
),
relationships as (
select
referenced.name as referenced_name,
referenced.schema as referenced_schema_id,
dependent_class.name as dependent_name,
dependent_class.schema as dependent_schema_id,
referenced.kind as kind
from referenced
join dependency on referenced.id=dependency.id
join class as dependent_class on dependency.ref=dependent_class.id
where
(referenced.name != dependent_class.name or
referenced.schema != dependent_class.schema)
)
select
referenced_schema.name as referenced_schema,
relationships.referenced_name as referenced_name,
dependent_schema.name as dependent_schema,
relationships.dependent_name as dependent_name
from relationships
join schema as dependent_schema on relationships.dependent_schema_id=dependent_schema.id
join schema as referenced_schema on relationships.referenced_schema_id=referenced_schema.id
group by referenced_schema, referenced_name, dependent_schema, dependent_name
order by referenced_schema, referenced_name, dependent_schema, dependent_name;
00:26:29.878221 [debug] [MainThread]: SQL status: SELECT in 0.25 seconds
00:26:29.879669 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e10a160>]}
00:26:29.880166 [debug] [MainThread]: On master: ROLLBACK
00:26:30.096306 [debug] [MainThread]: Using redshift connection "master"
00:26:30.097528 [debug] [MainThread]: On master: BEGIN
00:26:30.562005 [debug] [MainThread]: SQL status: BEGIN in 0.46 seconds
00:26:30.562565 [debug] [MainThread]: On master: COMMIT
00:26:30.562913 [debug] [MainThread]: Using redshift connection "master"
00:26:30.563298 [debug] [MainThread]: On master: COMMIT
00:26:30.875533 [debug] [MainThread]: SQL status: COMMIT in 0.31 seconds
00:26:30.876454 [debug] [MainThread]: On master: Close
00:26:30.878656 [info ] [MainThread]: Concurrency: 1 threads (target='dev')
00:26:30.879584 [info ] [MainThread]:
00:26:30.886064 [debug] [Thread-1 ]: Began running node model.my_project.raw
00:26:30.886995 [info ] [Thread-1 ]: 1 of 2 START sql table model public.raw ........................................ [RUN]
00:26:30.888692 [debug] [Thread-1 ]: Acquiring new redshift connection "model.my_project.raw"
00:26:30.889634 [debug] [Thread-1 ]: Began compiling node model.my_project.raw
00:26:30.890467 [debug] [Thread-1 ]: Compiling model.my_project.raw
00:26:30.895999 [debug] [Thread-1 ]: Writing injected SQL for node "model.my_project.raw"
00:26:30.897058 [debug] [Thread-1 ]: finished collecting timing info
00:26:30.897580 [debug] [Thread-1 ]: Began executing node model.my_project.raw
00:26:30.962458 [debug] [Thread-1 ]: Writing runtime sql for node "model.my_project.raw"
00:26:30.963320 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:30.963695 [debug] [Thread-1 ]: On model.my_project.raw: BEGIN
00:26:30.964047 [debug] [Thread-1 ]: Opening a new connection, currently in state closed
00:26:30.964411 [debug] [Thread-1 ]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:32.976972 [debug] [Thread-1 ]: SQL status: BEGIN in 2.01 seconds
00:26:32.978198 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:32.979456 [debug] [Thread-1 ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
create table
"dev"."public"."raw__dbt_tmp"
as (
select 'alice' as name, 1 as id
);
00:26:33.279851 [debug] [Thread-1 ]: SQL status: SELECT in 0.3 seconds
00:26:33.297419 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:33.297963 [debug] [Thread-1 ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
alter table "dev"."public"."raw" rename to "raw__dbt_backup"
00:26:33.511988 [debug] [Thread-1 ]: SQL status: ALTER TABLE in 0.21 seconds
00:26:33.520670 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:33.521339 [debug] [Thread-1 ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
alter table "dev"."public"."raw__dbt_tmp" rename to "raw"
00:26:33.736326 [debug] [Thread-1 ]: SQL status: ALTER TABLE in 0.21 seconds
00:26:33.781937 [debug] [Thread-1 ]: On model.my_project.raw: COMMIT
00:26:33.782285 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:33.782648 [debug] [Thread-1 ]: On model.my_project.raw: COMMIT
00:26:34.783040 [debug] [Thread-1 ]: SQL status: COMMIT in 1.0 seconds
00:26:34.784965 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:34.785757 [debug] [Thread-1 ]: On model.my_project.raw: BEGIN
00:26:35.054507 [debug] [Thread-1 ]: SQL status: BEGIN in 0.27 seconds
00:26:35.068253 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:35.068974 [debug] [Thread-1 ]: On model.my_project.raw: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.raw"} */
drop table if exists "dev"."public"."raw__dbt_backup" cascade
00:26:35.367397 [debug] [Thread-1 ]: SQL status: DROP TABLE in 0.3 seconds
00:26:35.369224 [debug] [Thread-1 ]: On model.my_project.raw: COMMIT
00:26:35.369763 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:35.370338 [debug] [Thread-1 ]: On model.my_project.raw: COMMIT
00:26:36.405331 [debug] [Thread-1 ]: SQL status: COMMIT in 1.03 seconds
00:26:36.406672 [debug] [Thread-1 ]: Using redshift connection "model.my_project.raw"
00:26:36.407717 [debug] [Thread-1 ]: On model.my_project.raw: BEGIN
00:26:36.620654 [debug] [Thread-1 ]: SQL status: BEGIN in 0.21 seconds
00:26:36.622881 [debug] [Thread-1 ]: finished collecting timing info
00:26:36.623774 [debug] [Thread-1 ]: On model.my_project.raw: ROLLBACK
00:26:36.839603 [debug] [Thread-1 ]: On model.my_project.raw: Close
00:26:36.842470 [debug] [Thread-1 ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e235640>]}
00:26:36.843979 [info ] [Thread-1 ]: 1 of 2 OK created sql table model public.raw ................................... [SELECT in 5.95s]
00:26:36.845229 [debug] [Thread-1 ]: Finished running node model.my_project.raw
00:26:36.847088 [debug] [Thread-1 ]: Began running node model.my_project.inc
00:26:36.848344 [info ] [Thread-1 ]: 2 of 2 START sql incremental model public.inc .................................. [RUN]
00:26:36.850312 [debug] [Thread-1 ]: Acquiring new redshift connection "model.my_project.inc"
00:26:36.851086 [debug] [Thread-1 ]: Began compiling node model.my_project.inc
00:26:36.851778 [debug] [Thread-1 ]: Compiling model.my_project.inc
00:26:36.858675 [debug] [Thread-1 ]: Writing injected SQL for node "model.my_project.inc"
00:26:36.860003 [debug] [Thread-1 ]: finished collecting timing info
00:26:36.861023 [debug] [Thread-1 ]: Began executing node model.my_project.inc
00:26:36.909616 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:36.909986 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
create temporary table
"inc__dbt_tmp132636896757"
as (
select name, id from "dev"."public"."raw"
);
00:26:36.910329 [debug] [Thread-1 ]: Opening a new connection, currently in state closed
00:26:36.910703 [debug] [Thread-1 ]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:41.828800 [debug] [Thread-1 ]: SQL status: SELECT in 4.92 seconds
00:26:41.849444 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:41.849912 [debug] [Thread-1 ]: On model.my_project.inc: BEGIN
00:26:42.062472 [debug] [Thread-1 ]: SQL status: BEGIN in 0.21 seconds
00:26:42.062855 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:42.063228 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
with bound_views as (
select
ordinal_position,
table_schema,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from information_schema."columns"
where table_name = 'inc__dbt_tmp132636896757'
),
unbound_views as (
select
ordinal_position,
view_schema,
col_name,
case
when col_type ilike 'character varying%' then
'character varying'
when col_type ilike 'numeric%' then 'numeric'
else col_type
end as col_type,
case
when col_type like 'character%'
then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
else null
end as character_maximum_length,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from pg_get_late_binding_view_cols()
cols(view_schema name, view_name name, col_name name,
col_type varchar, ordinal_position int)
where view_name = 'inc__dbt_tmp132636896757'
),
external_views as (
select
columnnum,
schemaname,
columnname,
case
when external_type ilike 'character varying%' or external_type ilike 'varchar%'
then 'character varying'
when external_type ilike 'numeric%' then 'numeric'
else external_type
end as external_type,
case
when external_type like 'character%' or external_type like 'varchar%'
then nullif(
REGEXP_SUBSTR(external_type, '[0-9]+'),
'')::int
else null
end as character_maximum_length,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from
pg_catalog.svv_external_columns
where
schemaname = 'None'
and tablename = 'inc__dbt_tmp132636896757'
),
unioned as (
select * from bound_views
union all
select * from unbound_views
union all
select * from external_views
)
select
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from unioned
order by ordinal_position
00:26:42.413009 [debug] [Thread-1 ]: SQL status: SELECT in 0.35 seconds
00:26:42.429898 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:42.430458 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
with bound_views as (
select
ordinal_position,
table_schema,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from information_schema."columns"
where table_name = 'inc'
),
unbound_views as (
select
ordinal_position,
view_schema,
col_name,
case
when col_type ilike 'character varying%' then
'character varying'
when col_type ilike 'numeric%' then 'numeric'
else col_type
end as col_type,
case
when col_type like 'character%'
then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
else null
end as character_maximum_length,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from pg_get_late_binding_view_cols()
cols(view_schema name, view_name name, col_name name,
col_type varchar, ordinal_position int)
where view_name = 'inc'
),
external_views as (
select
columnnum,
schemaname,
columnname,
case
when external_type ilike 'character varying%' or external_type ilike 'varchar%'
then 'character varying'
when external_type ilike 'numeric%' then 'numeric'
else external_type
end as external_type,
case
when external_type like 'character%' or external_type like 'varchar%'
then nullif(
REGEXP_SUBSTR(external_type, '[0-9]+'),
'')::int
else null
end as character_maximum_length,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from
pg_catalog.svv_external_columns
where
schemaname = 'public'
and tablename = 'inc'
),
unioned as (
select * from bound_views
union all
select * from unbound_views
union all
select * from external_views
)
select
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from unioned
where table_schema = 'public'
order by ordinal_position
00:26:42.770678 [debug] [Thread-1 ]: SQL status: SELECT in 0.34 seconds
00:26:42.775414 [debug] [Thread-1 ]: Changing col type from character varying(3) to character varying(5) in table _ReferenceKey(database='dev', schema='public', identifier='inc')
00:26:42.793359 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:42.793861 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
alter table "dev"."public"."inc" add column "name__dbt_alter" character varying(5);
update "dev"."public"."inc" set "name__dbt_alter" = "name";
alter table "dev"."public"."inc" drop column "name" cascade;
alter table "dev"."public"."inc" rename column "name__dbt_alter" to "name"
00:26:46.648717 [debug] [Thread-1 ]: SQL status: ALTER TABLE in 3.85 seconds
00:26:46.668355 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:46.669062 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
with bound_views as (
select
ordinal_position,
table_schema,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from information_schema."columns"
where table_name = 'inc'
),
unbound_views as (
select
ordinal_position,
view_schema,
col_name,
case
when col_type ilike 'character varying%' then
'character varying'
when col_type ilike 'numeric%' then 'numeric'
else col_type
end as col_type,
case
when col_type like 'character%'
then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
else null
end as character_maximum_length,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from pg_get_late_binding_view_cols()
cols(view_schema name, view_name name, col_name name,
col_type varchar, ordinal_position int)
where view_name = 'inc'
),
external_views as (
select
columnnum,
schemaname,
columnname,
case
when external_type ilike 'character varying%' or external_type ilike 'varchar%'
then 'character varying'
when external_type ilike 'numeric%' then 'numeric'
else external_type
end as external_type,
case
when external_type like 'character%' or external_type like 'varchar%'
then nullif(
REGEXP_SUBSTR(external_type, '[0-9]+'),
'')::int
else null
end as character_maximum_length,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from
pg_catalog.svv_external_columns
where
schemaname = 'public'
and tablename = 'inc'
),
unioned as (
select * from bound_views
union all
select * from unbound_views
union all
select * from external_views
)
select
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from unioned
where table_schema = 'public'
order by ordinal_position
00:26:47.004986 [debug] [Thread-1 ]: SQL status: SELECT in 0.33 seconds
00:26:47.035324 [debug] [Thread-1 ]: Writing runtime sql for node "model.my_project.inc"
00:26:47.036249 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:47.036650 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
insert into "dev"."public"."inc" ("id", "name")
(
select "id", "name"
from "inc__dbt_tmp132636896757"
)
00:26:49.975173 [debug] [Thread-1 ]: SQL status: INSERT 0 1 in 2.94 seconds
00:26:49.981351 [debug] [Thread-1 ]: On model.my_project.inc: COMMIT
00:26:49.982150 [debug] [Thread-1 ]: Using redshift connection "model.my_project.inc"
00:26:49.982880 [debug] [Thread-1 ]: On model.my_project.inc: COMMIT
00:26:51.588664 [debug] [Thread-1 ]: SQL status: COMMIT in 1.61 seconds
00:26:51.589522 [debug] [Thread-1 ]: finished collecting timing info
00:26:51.589963 [debug] [Thread-1 ]: On model.my_project.inc: Close
00:26:51.591156 [debug] [Thread-1 ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '904de28b-b85b-4f86-aa2c-62b91a214ffd', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e28ea00>]}
00:26:51.591834 [info ] [Thread-1 ]: 2 of 2 OK created sql incremental model public.inc ............................. [INSERT 0 1 in 14.74s]
00:26:51.592322 [debug] [Thread-1 ]: Finished running node model.my_project.inc
00:26:51.593827 [debug] [MainThread]: Acquiring new redshift connection "master"
00:26:51.594190 [debug] [MainThread]: Using redshift connection "master"
00:26:51.594550 [debug] [MainThread]: On master: BEGIN
00:26:51.595022 [debug] [MainThread]: Opening a new connection, currently in state closed
00:26:51.595509 [debug] [MainThread]: Redshift adapter: Connecting to Redshift using 'database' credentials
00:26:53.790565 [debug] [MainThread]: SQL status: BEGIN in 2.2 seconds
00:26:53.790984 [debug] [MainThread]: On master: COMMIT
00:26:53.791302 [debug] [MainThread]: Using redshift connection "master"
00:26:53.791660 [debug] [MainThread]: On master: COMMIT
00:26:54.005397 [debug] [MainThread]: SQL status: COMMIT in 0.21 seconds
00:26:54.006332 [debug] [MainThread]: On master: Close
00:26:54.008557 [info ] [MainThread]:
00:26:54.009560 [info ] [MainThread]: Finished running 1 table model, 1 incremental model in 0 hours 0 minutes and 32.40 seconds (32.40s).
00:26:54.010287 [debug] [MainThread]: Connection 'master' was properly closed.
00:26:54.010903 [debug] [MainThread]: Connection 'model.my_project.inc' was properly closed.
00:26:54.025459 [info ] [MainThread]:
00:26:54.026064 [info ] [MainThread]: Completed successfully
00:26:54.026641 [info ] [MainThread]:
00:26:54.027174 [info ] [MainThread]: Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
00:26:54.027876 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10db35e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e161f10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10e11bc70>]}
00:26:54.028978 [debug] [MainThread]: Flushing usage events
And then inspect the inc table again:
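Running the same information_schema.columns query as before (again, a sketch - exact output formatting will vary):

select column_name, data_type, character_maximum_length
from information_schema.columns
where table_schema = 'public'
  and table_name = 'inc'
order by ordinal_position;

-- column_name | data_type         | character_maximum_length
-- id          | integer           |
-- name        | character varying | 5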
We see that name and id have swapped places. If we look at our debug logs above, we notice:
00:26:42.793861 [debug] [Thread-1 ]: On model.my_project.inc: /* {"app": "dbt", "dbt_version": "1.3.0", "profile_name": "redshift", "target_name": "dev", "node_id": "model.my_project.inc"} */
alter table "dev"."public"."inc" add column "name__dbt_alter" character varying(5);
update "dev"."public"."inc" set "name__dbt_alter" = "name";
alter table "dev"."public"."inc" drop column "name" cascade;
alter table "dev"."public"."inc" rename column "name__dbt_alter" to "name"
dbt has expanded our varchar(3) to varchar(5) in order to fit 'alice' into the name column. The sequence of adding a new column, copying the values over, dropping the old column, and renaming the new column to the old name results in the name column now appearing after the id column.
There are a couple of things we can do here. One option is to put a late binding view on top of the incremental model to pin the column order:
-- models/inc_vw.sql
{{ config(materialized = 'view', bind=False) }}
select name, id from {{ ref('inc') }}
This view will then always have the right column ordering. However, you may not want an additional object in your database, and we still run into errors if, say, the name column was used as a sortkey or distkey1, since the alter table "dev"."public"."inc" drop column "name" cascade; statement will error if the column is a sort or dist key. To account for this, you will need to add a pre-hook to your model that removes the column as a key on the table, as sketched below.
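Here is a minimal sketch of what that pre-hook could look like, assuming name is the table's compound sortkey. alter table ... alter sortkey none is Redshift's DDL for removing a compound sortkey; the is_incremental() guard (with a select 1 no-op fallback) is my own assumption to keep the hook from firing before the table exists - verify both against your own table definition:

-- models/inc.sql
{{ config(
    materialized = 'incremental',
    on_schema_change = 'fail',
    pre_hook = "{% if is_incremental() %} alter table {{ this }} alter sortkey none {% else %} select 1 {% endif %}"
) }}
select name, id from {{ ref('raw') }}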
dbt ships with a default alter_column_type macro which is invoked when the above scenario happens, so another option is to simply modify its behaviour - macros defined in your own project take precedence over dbt's built-in macros of the same name. Let's add this macro to our macros/ folder:
{% macro alter_column_type(relation, column_name, new_column_type) -%}
{# When the var is set, raise a compiler error instead of silently altering the column #}
{% if var('error_on_alter_column_type', False) %}
{% set msg -%}
Error in relation '{{ relation }}'. Column '{{ column_name }}' type changed to {{ new_column_type }}.
{%- endset %}
{% do exceptions.raise_compiler_error(msg) %}
{% else %}
{# Otherwise, replicate the default add/copy/drop/rename behaviour #}
{%- set tmp_column = column_name + "__dbt_alter" -%}
{% call statement('alter_column_type') %}
alter table {{ relation }} add column {{ adapter.quote(tmp_column) }} {{ new_column_type }};
update {{ relation }} set {{ adapter.quote(tmp_column) }} = {{ adapter.quote(column_name) }};
alter table {{ relation }} drop column {{ adapter.quote(column_name) }} cascade;
alter table {{ relation }} rename column {{ adapter.quote(tmp_column) }} to {{ adapter.quote(column_name) }}
{% endcall %}
{% endif %}
{% endmacro %}
I'm using a var to control the behaviour, so let's add that to our dbt_project.yml file:
# dbt_project.yml
...
vars:
error_on_alter_column_type: True
Modify our raw model by increasing the size of the name value again:
-- models/raw.sql
{{ config(materialized = 'table') }}
select 'aliceandbob' as name, 2 as id
And then do a run:
$ dbt run
00:54:24 Running with dbt=1.3.0
00:54:26 Found 2 models, 0 tests, 0 snapshots, 0 analyses, 328 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
00:54:26
00:54:35 Concurrency: 1 threads (target='dev')
00:54:35
00:54:35 1 of 2 START sql table model public.raw ........................................ [RUN]
00:54:42 1 of 2 OK created sql table model public.raw ................................... [SELECT in 6.61s]
00:54:42 2 of 2 START sql incremental model public.inc .................................. [RUN]
00:54:46 2 of 2 ERROR creating sql incremental model public.inc ......................... [ERROR in 3.94s]
00:54:48
00:54:48 Finished running 1 table model, 1 incremental model in 0 hours 0 minutes and 22.43 seconds (22.43s).
00:54:48
00:54:48 Completed with 1 error and 0 warnings:
00:54:48
00:54:48 Compilation Error in macro alter_column_type (macros/alter_column_type.sql)
00:54:48 Error in relation '"dev"."public"."inc"'. Column 'name' type changed to character varying(11).
00:54:48
00:54:48 > in macro alter_column_type (macros/alter_column_type.sql)
00:54:48 > called by macro materialization_incremental_default (macros/materializations/models/incremental/incremental.sql)
00:54:48 > called by macro alter_column_type (macros/alter_column_type.sql)
00:54:48
00:54:48 Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2
In this manner, the inc incremental model is never updated and an error is returned whenever we hit the alter_column_type macro. This behaviour is controllable by setting the var error_on_alter_column_type in the dbt_project.yml itself, like we did above, or via a runtime command2:
dbt run --vars 'error_on_alter_column_type: true'
dbt run --vars 'error_on_alter_column_type: false'