
@sgoley
Forked from jthandy/vacuum-analyze-sinter.md
Last active August 24, 2020 13:17
Running vacuum and analyze on Redshift via dbtCloud

dbt and dbtCloud have the ability to run regular Redshift maintenance jobs. It's great to set these up early on in a project so that things stay clean as the project grows, and implementing these jobs in dbtCloud allows the same easy transparency and notifications as with your other dbt jobs.

This document will go through the specific steps necessary to configure vacuum and analyze jobs in the current version of dbt and dbtCloud. In the future, there will likely be a more idiomatically consistent way to express this logic using native dbt operations. Currently, this does work even if it is not elegant.

Step 1: Create the macros

macros/redshift_maintenance.sql

These macros are called from a dedicated operation model and are responsible for writing the actual SQL for the maintenance commands. While it is possible to write much more specific vacuum and analyze commands, it is often enough to simply execute their default behaviors.

```sql
{% macro vacuum(run) %}
    {% if run == true %}
        vacuum
    {% else %}
        select 1 as test
    {% endif %}
{% endmacro %}

{% macro analyze(run) %}
    {% if run == true %}
        analyze
    {% else %}
        select 1 as test
    {% endif %}
{% endmacro %}
```
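If the defaults aren't enough, the same pattern extends to table-specific commands. The macro below is a hypothetical sketch (the `vacuum_table` name and the 75 percent threshold are illustrative, not part of the original project) using Redshift's DELETE ONLY mode to reclaim space for a single table:

```sql
{% macro vacuum_table(run, table_name) %}
    {% if run == true %}
        {# reclaim deleted rows for one table, skipping the sort phase #}
        vacuum delete only {{ table_name }} to 75 percent
    {% else %}
        select 1 as test
    {% endif %}
{% endmacro %}
```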

Step 2: Create the model

models/admin/warehouse_operation.sql

This model exists specifically to run maintenance jobs. While it is included in every standard dbt run of your project, it won't actually perform maintenance unless the appropriate variable is passed in from the command line. As a convenience, it logs every run of itself and records whether that run actually triggered the maintenance tasks.

```sql
{{
    config({
        "materialized" : 'incremental',
        "sql_where" : 'TRUE',
        "post-hook" : [
          after_commit("{{ vacuum( var('maintenance', false) ) }}"),
          after_commit("{{ analyze( var('maintenance', false) ) }}")
         ]
    })
}}

select
    current_timestamp as run_at,
    {{ var('maintenance', false) }} as maintenance_jobs_run
```
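Because the model is incremental, each run appends a row, so the table doubles as an audit log. A query along these lines (the unqualified table reference is an assumption; adjust the schema to match your project) shows when maintenance last actually ran:

```sql
select run_at, maintenance_jobs_run
from warehouse_operation
where maintenance_jobs_run = true
order by run_at desc
limit 1;
```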

Step 3: Run the project

```shell
dbt run --models warehouse_operation --vars '{ "maintenance": true }'
```

This command can be run locally or in dbtCloud.
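On a regular scheduled run that omits the variable, `maintenance` falls back to its default of `false`, so the post-hooks compile to the harmless `select 1 as test` and only the log row is written:

```shell
# normal run: logs a row with maintenance_jobs_run = false, skips vacuum/analyze
dbt run --models warehouse_operation
```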

Recommendations

Your mileage may vary, but here are some good default recommendations for implementation:

  • Schedule these jobs to run once a week on weekends.
  • Make sure your dbtCloud schedule doesn't cause other jobs to be run at the same time as this maintenance job.
  • If you're running maintenance tasks on an existing cluster for the first time, babysit them manually. The first run of these tasks on a cluster frequently takes significantly longer because the cluster has accumulated a backlog of work. Once you're in a known good state, the weekly jobs can be automated.
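When babysitting that first run, Redshift's system views can help. For example, `svv_vacuum_progress` (a standard Redshift system view) reports which table is currently being vacuumed and an estimate of the time remaining:

```sql
select table_name, status, time_remaining_estimate
from svv_vacuum_progress;
```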