@swhite-dbt
swhite-dbt / README.md
Last active September 30, 2025 14:22
Debugging dbt incremental models

Note: dbt's responsibility is to generate the same DDL/DML every time for the same dbt SQL/Jinja. dbt is not responsible for making sure your data is unique, for the shape of your data, and so on. You are responsible for that yourself.

At a high level, what we're trying to do here is:

  1. At the start of the run, back up the relevant resources so that we have the data as it was at the time of the run.
  2. If the model does not run successfully, indicating the ALTER TABLE statement has occurred, leave the backups created in (1) in place so that we can come back the next day and review the data before and after the run. At this point we can also back up other related resources at the end of the run, so that we have an exact copy of those tables as of the run in case they change before we get a chance to review the backups.
  3. If the model runs successfully, everything is good and we can drop the backups created in (1).
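
The control flow of the three steps above can be sketched outside dbt. The following is a minimal illustration only, not dbt code: it uses Python's built-in `sqlite3` as a stand-in for the warehouse, and the function name `run_with_backups`, the `__backup_start` / `__backup_end` suffixes, and the `run_model` callback are all hypothetical names chosen for this sketch. It assumes an autocommit connection, so that statements which succeeded before a failure stay applied, as they would on a warehouse that commits each statement.

```python
import sqlite3


def run_with_backups(conn, tables, run_model):
    """Back up `tables`, call `run_model(conn)`, then keep or drop the backups.

    Returns True if the model step succeeded, False otherwise.
    All names here are illustrative; this is not a dbt API.
    """
    # (1) Snapshot every relevant table before the run.
    for t in tables:
        conn.execute(f"DROP TABLE IF EXISTS {t}__backup_start")
        conn.execute(f"CREATE TABLE {t}__backup_start AS SELECT * FROM {t}")

    try:
        run_model(conn)
    except sqlite3.Error:
        # (2) Failure: keep the start-of-run backups and also snapshot the
        # end-of-run state, so both can be compared later.
        for t in tables:
            conn.execute(f"DROP TABLE IF EXISTS {t}__backup_end")
            conn.execute(f"CREATE TABLE {t}__backup_end AS SELECT * FROM {t}")
        return False

    # (3) Success: the backups are no longer needed.
    for t in tables:
        conn.execute(f"DROP TABLE {t}__backup_start")
    return True


if __name__ == "__main__":
    # isolation_level=None puts sqlite3 in autocommit mode (an assumption
    # made to mimic a warehouse that commits each statement).
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.execute("INSERT INTO orders VALUES (1, 10)")

    def failing_model(c):
        c.execute("INSERT INTO orders VALUES (2, 20)")
        c.execute("INSERT INTO orders VALUES (2, 20)")  # duplicate key fails

    print(run_with_backups(conn, ["orders"], failing_model))
```

After a failed run, `orders__backup_start` holds the table as it was before the run and `orders__backup_end` holds it as the failed run left it, so the two can be diffed at leisure.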

To do this, the approach

Debugging dbt tests

Note: dbt's responsibility is to generate the same DDL/DML every time for the same dbt SQL/Jinja. dbt is not responsible for making sure your data is unique, for the shape of your data, and so on. You are responsible for that yourself.

At a high level, what we're trying to do here is:

  1. At the start of the run, back up the relevant resources so that we have the data as it was at the time of the run.
  2. Run the specified test and check whether it passes or fails.
  3. If it passes, everything is good, so we can drop the backups created in (1).
  4. If it fails, leave the backups created in (1) in place so that we can come back the next day and review which rows caused the test to fail. At this point we can also back up other related resources at the end of the run, so that we have an exact copy of those tables as of the run in case they change before we get a chance to review the backups.
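
The four steps above can be sketched outside dbt. This is a minimal illustration only, not dbt code: it uses Python's built-in `sqlite3` as a stand-in for the warehouse and models a test the way dbt data tests are usually written, as a query that selects failing rows and passes when it returns none. The function name `debug_test` and the `__backup_start` / `__backup_end` suffixes are hypothetical names chosen for this sketch.

```python
import sqlite3


def debug_test(conn, tables, test_sql):
    """Back up `tables`, run `test_sql`, then keep or drop the backups.

    `test_sql` is expected to select the rows that violate the test.
    Returns the failing rows (an empty list means the test passed).
    All names here are illustrative; this is not a dbt API.
    """
    # (1) Snapshot every relevant table before the test runs.
    for t in tables:
        conn.execute(f"DROP TABLE IF EXISTS {t}__backup_start")
        conn.execute(f"CREATE TABLE {t}__backup_start AS SELECT * FROM {t}")

    # (2) Run the test: any returned row is a failure.
    failing = conn.execute(test_sql).fetchall()

    if not failing:
        # (3) Pass: the backups are no longer needed.
        for t in tables:
            conn.execute(f"DROP TABLE {t}__backup_start")
        return []

    # (4) Fail: keep the start backups and snapshot the end state as well.
    for t in tables:
        conn.execute(f"DROP TABLE IF EXISTS {t}__backup_end")
        conn.execute(f"CREATE TABLE {t}__backup_end AS SELECT * FROM {t}")
    return failing


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "a"), (2, "b"), (2, "c")])
    # A uniqueness check on `id`, written as a failing-rows query.
    unique_id = ("SELECT id, COUNT(*) FROM customers "
                 "GROUP BY id HAVING COUNT(*) > 1")
    print(debug_test(conn, ["customers"], unique_id))
```

Returning the failing rows alongside the retained backups means the review the next day can start from the exact rows that broke the test rather than re-running it against data that may have moved on.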