dbt and Sinter have the ability to run regular Redshift maintenance jobs. It's great to set these up early on in a project so that things stay clean as the project grows, and implementing these jobs in Sinter allows the same easy transparency and notifications as with your other dbt jobs.
This document will go through the specific steps necessary to configure vacuum and analyze jobs in the current version of dbt and Sinter. In the future, there will likely be a more idiomatically consistent way to express this logic using native dbt operations. Currently, this does work even if it is not elegant.
macros/redshift_maintenance.sql
These macros will get called in a special operation model and are responsible for actually writing the SQL for the maintenance commands. While it is possible to write much more specific vacuum
and analyze
commands, often it is enough to simply execute the their default behaviors.
{% macro vacuum(run) %}
{% if run == true %}
vacuum
{% else %}
select 1 as test
{% endif %}
{% endmacro %}
{% macro analyze(run) %}
{% if run == true %}
analyze
{% else %}
select 1 as test
{% endif %}
{% endmacro %}
models/admin/warehouse_operation.sql
This model is called specifically to run maintenance jobs. While it will always be run with your project on a standard dbt run
, it won't actually perform maintenance tasks unless the appropriate variable is passed in from the command line. As a convenience, it will log all runs of itself and track whether or not that run caused it to actually run the maintenance tasks.
{{
config({
"materialized" : 'incremental',
"sql_where" : 'TRUE',
"post-hook" : [
after_commit("{{ vacuum( var('maintenance', false) ) }}"),
after_commit("{{ analyze( var('maintenance', false) ) }}")
]
})
}}
select
current_timestamp as run_at,
{{var('maintenance', false)}} as maintenance_jobs_run
dbt run --models warehouse_operation --vars '{ "maintenance": true }'
This command can be run locally or in Sinter.
Your mileage may vary, but here are some good default recommendations for implementation:
- Schedule these jobs to run once a week on weekends.
- Make sure your Sinter schedule doesn't cause other jobs to be run at the same time as this maintenance job.
- If you're running maintenance tasks on an existing cluster for the first time, babysit them manually. Frequently, the first time these tasks are run on a cluster they take significantly longer because the cluster requires more work to be done. Once you're in a known good state, weekly jobs can be automated.