Data Science Consultant and Founder of Analytics Engineering consultancy bravetech.io
Data Scientist at The Future Society
Thousands of companies now run their analytics on a public cloud data warehouse such as BigQuery, Redshift or Snowflake. Others use an on-premise solution.
On top of the warehouse, they use business intelligence tools for reporting.
In more technology-focused companies, data scientists also use this data for predictive modeling, for example to estimate:
- user lifetime value and churn
- product demand
- predictive maintenance
- fraud detection
- cross-selling and upselling
In mid-sized and larger companies (>50 employees), teams of analysts create separate data models for each of these purposes.
As the data models accumulate, they eventually start to break.
Practices that have been standard in software engineering for a decade or more need to reach analysts as well.
In order to build scalable analytics workflows, data models need to be:
- dependency-aware and documented
- version controlled
- automatically tested
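To make the three requirements concrete, here is a minimal sketch in plain Python. It is not dbt's API, and the model names (raw_orders, stg_orders, user_ltv and so on) are hypothetical. It assumes each data model declares its description and its upstream models in a file kept under version control, so that a build order can be derived and simple checks can run automatically on every change.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model registry: each model carries documentation ("doc")
# and the names of the models it reads from ("deps").
MODELS = {
    "raw_orders": {"doc": "Orders as loaded from the source system", "deps": []},
    "raw_users": {"doc": "Users as loaded from the source system", "deps": []},
    "stg_orders": {"doc": "Cleaned orders, one row per order", "deps": ["raw_orders"]},
    "user_ltv": {"doc": "Lifetime value per user", "deps": ["stg_orders", "raw_users"]},
}


def build_order(models):
    """Return model names so that every model comes after its dependencies."""
    graph = {name: set(spec["deps"]) for name, spec in models.items()}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


def undocumented(models):
    """Return the names of models whose description is empty."""
    return sorted(name for name, spec in models.items() if not spec["doc"].strip())


def missing_dependencies(models):
    """Return dependency names that do not refer to any declared model."""
    return sorted(
        {dep for spec in models.values() for dep in spec["deps"] if dep not in models}
    )


if __name__ == "__main__":
    # Checks like these could run in continuous integration on every commit.
    assert undocumented(MODELS) == []
    assert missing_dependencies(MODELS) == []
    print(build_order(MODELS))
```

Because the registry is ordinary text, every change to a model, its documentation or its dependencies shows up in the version control history, and the checks fail before a broken model reaches a report.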
For a real-world reference, GitLab has open-sourced its own dbt implementation: https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt