The-Edgar/Scalable Analytics Engineering.md

## Scalable Analytics Engineering.md

      
Implementing a Scalable Analytics Engineering Workflow with DBT (Data Build Tool)

Presenter

Edgar Rootalu
Data Science Consultant and founder of the analytics engineering consultancy bravetech.io
Data Scientist at The Future Society

Context

Thousands of companies now run a cloud data warehouse such as BigQuery, Redshift, or Snowflake. Others use an on-premises solution.
They also use business intelligence tools for reporting.
In more technology-focused companies, data scientists use this data for predictive modeling, for example to estimate:

user lifetime value and churn
product demand
predictive maintenance
fraud detection
cross-selling and upselling

In mid-sized and larger companies (>50 employees), teams of analysts create separate data models for each of these purposes.
The data models accumulate and, at some point, start breaking.

Description

Practices that have been standard in software engineering for a decade or more need to reach analysts as well.
To build scalable analytics workflows, data models need to be:

dependency-aware and documented
version-controlled
automatically tested
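
To make "dependency-aware" and "automatically tested" concrete, here is a minimal Python sketch. It is not how DBT is implemented; it only imitates the idea of models referencing each other through a `ref('...')` marker, being built in dependency order, and being checked by a simple not-null/unique test. The model names, queries, and helper functions (`build_order`, `run_models`, `failing_rows`) are hypothetical, and an in-memory SQLite database stands in for the warehouse. It assumes Python 3.9+ for `graphlib`.

```python
import re
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical models (name -> SQL). The {{ ref('...') }} markers imitate
# DBT-style references; all names and data are invented for illustration.
MODELS = {
    "top_users": "select user_id from {{ ref('user_revenue') }} where revenue >= 40",
    "user_revenue": "select user_id, sum(amount) as revenue "
                    "from {{ ref('stg_orders') }} group by user_id",
    "stg_orders": "select 1 as order_id, 10 as user_id, 25.0 as amount "
                  "union all select 2, 10, 15.0 "
                  "union all select 3, 11, 5.0",
}

REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")


def build_order(models):
    """Return model names so each model comes after the models it references."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def run_models(conn, models):
    """Materialize every model as a table, in dependency order."""
    for name in build_order(models):
        # Replace each ref marker with the plain table name it points to.
        sql = REF_PATTERN.sub(lambda m: m.group(1), models[name])
        conn.execute(f"create table {name} as {sql}")


def failing_rows(conn, table, column):
    """Count null values plus duplicated values in a column (0 means the test passes)."""
    nulls = conn.execute(
        f"select count(*) from {table} where {column} is null"
    ).fetchone()[0]
    dupes = conn.execute(
        f"select count(*) from (select {column} from {table} "
        f"group by {column} having count(*) > 1)"
    ).fetchone()[0]
    return nulls + dupes


conn = sqlite3.connect(":memory:")
run_models(conn, MODELS)
order = build_order(MODELS)
print(order)
print(failing_rows(conn, "user_revenue", "user_id"))
```

Because the build order is derived from the references rather than from the order in which models are written, adding a new model does not require maintaining a run order by hand. The same check function can run after every build, which is the kind of automated test that catches a broken model before a report does.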

Code

Open-source example of how GitLab has implemented DBT:
https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt