@The-Edgar
Last active October 26, 2020 11:11
Show and Tell! ML/AI Project Lightning Talks - Edgar's talk

Implementing a Scalable Analytics Engineering Workflow with DBT (Data Build Tool)

Presenter

Edgar Rootalu

Data Science Consultant and Founder of Analytics Engineering consultancy bravetech.io

Data Scientist at The Future Society

Context

Thousands of companies now use a public cloud data warehouse such as BigQuery, Redshift or Snowflake. Others run an on-premise solution.

They also use business intelligence for reporting.

In more technology-focused companies, data scientists use this data for predictive modeling, for example to estimate:

  • user lifetime value and churn
  • product demand
  • predictive maintenance
  • fraud detection
  • cross-selling and upselling

In mid-sized and larger companies (>50 employees), teams of analysts are creating data models for each purpose.

Data models accumulate and, at some point, start breaking.

Description

Practices that have been standard in software engineering for a decade or more need to reach analysts.

In order to build scalable analytics workflows, data models need to be:

  • dependency-aware and documented
  • version controlled
  • automatically tested
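
To make the first and last of these properties concrete, here is a minimal Python sketch. It is not DBT's own API: DBT models are written in SQL and reference each other with ref(), and its tests are declared in YAML. The model names, the MODELS mapping and the helper functions below are all hypothetical, chosen only to illustrate the idea of building models in dependency order and running simple data tests on their output.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: each name maps to the set of models it depends on.
MODELS = {
    "stg_orders": set(),
    "stg_customers": set(),
    "customer_orders": {"stg_orders", "stg_customers"},
    "customer_ltv": {"customer_orders"},
}


def build_order(models):
    """Return model names ordered so each one comes after its dependencies."""
    return list(TopologicalSorter(models).static_order())


def not_null(rows, column):
    """Data test: return the rows where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Data test: return the set of non-None values seen more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # missing values are the job of not_null
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


if __name__ == "__main__":
    print(build_order(MODELS))
    sample = [{"id": 1}, {"id": 1}, {"id": None}]
    print(not_null(sample, "id"), unique(sample, "id"))
```

A test passes when it returns nothing. Keeping both the dependency mapping and the tests in files under version control is what lets a change to one model be reviewed, rebuilt in the right order, and checked automatically.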

Code

Open-source example of how GitLab has implemented DBT: https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt
