When you run `dbt compile`, dbt does the following at a very high level:
1. Read in your dbt_project.yml, rendering most fields with jinja (hooks and query comments are deferred until later, when more information is available). Any projects in your modules directory are also read in and rendered; consider them part of "your project" in the steps below.
2. Read in your profiles.yml, rendering everything with jinja.
3. Find all the relevant files (.sql, .yml), as defined in your dbt_project.yml, and read them in.
4. Render each sql file with jinja, primarily to collect calls to `ref`, `source`, and `config`. The string result of this rendering is then discarded. The model's materialization type is finalized here, as are any other relevant model-level configuration items (database, schema, and alias are easy examples).
5. Build a dependency graph using the `ref` information, and use the command-line arguments to decide which nodes to iterate over.
6. For each selected node, in "graph order":
   a) "Compile the node" by rendering the jinja and collecting the resulting sql into a string. This is what's written to `target/compiled`. Unlike in step 4, the result of rendering here is stored.
   b) If you're running `dbt run`, render another jinja document, the materialization, with the sql generated in the previous step available as "sql".
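The difference between the parse pass (collect `ref` calls, discard the text) and the compile pass (keep the rendered SQL) can be illustrated with a toy model in plain Python. This is only a sketch under loud assumptions: a regular expression stands in for real jinja rendering, the model names and the `analytics` schema are made up, and none of these functions are dbt's actual internals.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project: model name -> raw SQL containing jinja-style ref() calls.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "orders_enriched": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}

# Stand-in for jinja rendering: only recognizes {{ ref('name') }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")


def parse(models):
    """Parse pass: record each model's ref() targets; no SQL text is kept."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def graph_order(dependencies, selected):
    """Order the selected nodes so that parents always come before children."""
    order = TopologicalSorter(dependencies).static_order()
    return [node for node in order if node in selected]


def compile_node(name, models, schema="analytics"):
    """Compile pass: replace each ref() with a relation name and keep the SQL."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", models[name])


dependencies = parse(MODELS)
ordered = graph_order(dependencies, selected=set(MODELS))
compiled = {name: compile_node(name, MODELS) for name in ordered}
```

In this sketch `compiled` plays the role of the files under `target/compiled`, and `selected` stands in for whatever the command-line arguments picked.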
on-run-start and on-run-end hooks are rendered with jinja and run before and after step 6, as appropriate. Model hooks are rendered and run before and after step 6b, as appropriate. Neither has a real "parse" phase; they basically go straight to compiling.
Ephemeral models are treated specially. During compilation, dbt converts them into a partial CTE (including a name), and their execution is skipped. Also during compilation, models that depend on ephemeral models look up those results and use that information to add the appropriate `with` statements, creating a CTE for each ephemeral dependency.
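As a rough illustration of that CTE injection, here is a minimal Python sketch. The `__cte__` prefix, the function names, and the example SQL are all hypothetical choices for this sketch rather than dbt's real naming or code, and it assumes the child SQL does not already begin with its own `with` clause.

```python
CTE_PREFIX = "__cte__"  # hypothetical naming scheme, chosen for this sketch


def as_partial_cte(name, compiled_sql):
    """Turn an ephemeral model's compiled SQL into a named CTE fragment."""
    return f"{CTE_PREFIX}{name} as (\n{compiled_sql}\n)"


def inject_ctes(child_sql, ephemeral_parents):
    """Prepend a `with` clause built from the child's ephemeral parents.

    `ephemeral_parents` maps model name -> compiled SQL. Assumes the child
    SQL has no leading `with` clause of its own.
    """
    if not ephemeral_parents:
        return child_sql
    fragments = [as_partial_cte(n, sql) for n, sql in ephemeral_parents.items()]
    return "with " + ",\n".join(fragments) + "\n" + child_sql


# The ephemeral parent is never executed; only its SQL is reused.
parent_sql = "select id, amount from raw.payments"
child_sql = f"select sum(amount) as total from {CTE_PREFIX}payments"
final_sql = inject_ctes(child_sql, {"payments": parent_sql})
```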
When dbt parses schema.yml files and finds tests, it creates some jinja that calls the relevant macro (the test name with `test_` prefixed). This includes a `ref` to any referenced model. After that, dbt treats schema tests and data tests the same: it handles them normally, going through the parsing/compiling steps and then executing the resulting SQL. The result is expected to have exactly one row with exactly one column, which should be the number of rows that failed the test.
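The "one row, one column, number of failures" contract can be sketched with Python's built-in `sqlite3` module. The `orders` table, the test SQL, and the `run_test` helper are invented for illustration; this is not how dbt itself executes tests, only a small model of the expected result shape.

```python
import sqlite3


def run_test(connection, test_sql):
    """Execute a compiled test query and return the number of failing rows."""
    rows = connection.execute(test_sql).fetchall()
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("a test query must return exactly one row with one column")
    return int(rows[0][0])


connection = sqlite3.connect(":memory:")
connection.execute("create table orders (id integer, customer_id integer)")
connection.executemany(
    "insert into orders values (?, ?)", [(1, 10), (2, None), (3, 11)]
)

# Hypothetical compiled output of a not-null style test on orders.customer_id.
not_null_sql = "select count(*) from orders where customer_id is null"
failures = run_test(connection, not_null_sql)
status = "PASS" if failures == 0 else f"FAIL {failures}"
```

A query that returns more than one row or column is rejected here, which mirrors the expectation described above.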
Almost all of dbt's concurrency happens in step 6. The on-run-start and on-run-end hooks are deliberately not concurrent, as they can very reasonably have dependencies.
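One way to picture this split is serial hooks around a concurrent, graph-ordered middle section. The following is a sketch only, using Python's standard `graphlib` and `concurrent.futures`; the `run_project` function, its parameters, and the node names are hypothetical and do not reflect dbt's real runner.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_project(dependencies, run_node, start_hooks, end_hooks, threads=4):
    """Run hooks serially and nodes concurrently; return the completion log."""
    log = []

    # Start hooks run one at a time, in order, since they may depend on each other.
    for hook in start_hooks:
        log.append(hook)

    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        pending = {}
        while sorter.is_active():
            # Submit every node whose parents have all finished.
            for node in sorter.get_ready():
                pending[pool.submit(run_node, node)] = node
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                node = pending.pop(future)
                future.result()  # re-raise any error from the worker
                log.append(node)
                sorter.done(node)  # unblocks this node's children

    # End hooks are also serial.
    for hook in end_hooks:
        log.append(hook)
    return log


# Hypothetical graph: b and c depend on a; d depends on b and c.
graph = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
log = run_project(graph, lambda node: None, ["on-run-start"], ["on-run-end"])
```

Here `b` and `c` may run at the same time, but neither starts before `a` finishes, and the hooks never overlap with anything.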
In your model, you may need to behave slightly differently when parsing versus compiling: for example, you might have a log statement that should run at runtime but makes no sense at parse time. The `execute` value is provided for that purpose: it's `False` during parsing and `True` during compilation. You can use it to wrap your statements, like so:
{% if execute %}
{{ log('executing the model', info=False) }}
{% endif %}
Note that when `execute=False`, things can be a little funky! Because dbt doesn't yet know what your final database/schema/identifier values are, it fills them with dummy values that could be completely wrong.
Definitely don't use `execute` to change the shape of the graph between parsing and runtime by choosing `ref` targets based on its value: dbt will do bad stuff, like `drop .. cascade` relations that have dependencies, and break those dependent models.
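To see why this goes wrong, consider a toy Python model of a model body whose `ref` target depends on `execute`. The function and model names are hypothetical; the point is only that the graph is built from what is seen at parse time, so an edge that appears only at runtime is invisible to it.

```python
def model_refs(execute):
    """Hypothetical model body that (wrongly) picks its ref target from `execute`."""
    target = "orders_full" if execute else "orders_sample"
    return {target}


# The dependency graph is built from the parse-time view (execute=False)...
parse_time_parents = model_refs(execute=False)
# ...but the SQL that actually runs uses the runtime view (execute=True).
run_time_parents = model_refs(execute=True)

# Parents the graph never learned about: no ordering or dependency tracking
# exists for them in this sketch.
untracked_parents = run_time_parents - parse_time_parents
```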
There's no way for dbt to skip files in your models directory during parsing. Because you can enable/disable models with `config(enabled=...)`, dbt can't tell whether a model is enabled until it has been parsed. Instead, to avoid parsing a bad file, you can rename the file so it doesn't have a `.sql` extension, or you can move it out of your models directory.
The cache is populated in `run`, `seed`, `snapshot`, and `test` just before the on-run-start hooks run. It enumerates all the databases and schemas referenced by your project, and collects information about what currently exists and its table type. This information is relevant for materializations, which might have to decide to drop a view and create a table in its place for an `incremental` model that used to be a `view`, for example.
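A minimal Python sketch of that idea follows. The `Relation` class, the `populate_cache` and `plan_incremental` helpers, and the returned action strings are all hypothetical and exist only to illustrate how cached relation types could drive a materialization's decision; dbt's real cache and materializations are more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    database: str
    schema: str
    identifier: str
    type: str  # "table" or "view"


def populate_cache(existing_relations, project_schemas):
    """Index existing relations in the (database, schema) pairs the project uses."""
    cache = {}
    for rel in existing_relations:
        if (rel.database, rel.schema) in project_schemas:
            cache[(rel.database, rel.schema, rel.identifier)] = rel
    return cache


def plan_incremental(cache, database, schema, identifier):
    """Decide, from the cache alone, what an incremental build has to do."""
    existing = cache.get((database, schema, identifier))
    if existing is None:
        return ["create table"]
    if existing.type == "view":
        # The model used to be a view: replace it with a table.
        return ["drop view", "create table"]
    return ["insert new rows"]


# Hypothetical warehouse contents.
warehouse = [
    Relation("prod", "analytics", "orders", "view"),
    Relation("prod", "analytics", "customers", "table"),
    Relation("prod", "scratch", "tmp", "table"),  # schema not used by the project
]
cache = populate_cache(warehouse, project_schemas={("prod", "analytics")})
plan = plan_incremental(cache, "prod", "analytics", "orders")
```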