@swyxio
Last active October 20, 2022 18:01
prompts used for Airbyte Data Nets article https://airbyte.com/blog/data-nets
1. Introduction
2. What are Data Nets?
3. Data Nets vs. Data Mesh
4. Data Nets vs. Data Contract
5. When do you need a Data Net?
6. What does a Data Net architecture look like?
7. Main technological and cloud data warehousing trends
8. Organizational and socio-technical adjustments
9. Core principles of the Data Net
---
CONCEPT 1: DATA ORCHESTRATION
Data Orchestration models dependencies between different tasks in heterogeneous environments end-to-end. It handles integrations with legacy systems, cloud-based tools, data lakes, and data warehouses. It invokes computation, such as wrangling your business logic in SQL and Python and applying ML models, at the right time, based on a time-based trigger or custom-defined logic.

Data consumers, such as data analysts and business users, care mostly about the production of data assets. Data engineers, on the other hand, have historically focused on modeling the dependencies between tasks (instead of data assets) with an orchestrator tool. How can we reconcile both worlds?

This article reviews open-source data orchestration tools (Airflow, Prefect, Dagster) and discusses how they introduce data assets as first-class objects. We also cover why a declarative approach with higher-level abstractions helps with faster development cycles, stability, and a better understanding of what's going on pre-runtime. We explore five different abstractions (jobs, tasks, resources, triggers, and data products) and ask whether it all helps to build a Data Mesh. What sets an orchestrator apart is that it lets you see when things are happening (monitoring with rich metadata), what is going wrong, and how to repair bad state with integrated features such as backfills.
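To make the idea of assets as first-class objects concrete, here is a minimal sketch in Dagster (one of the tools reviewed here); the asset names and bodies are invented for illustration:

```python
# Minimal sketch: declaring data assets (rather than tasks) in Dagster.
# Asset names and bodies are illustrative, not from the article.
from dagster import asset

@asset
def raw_orders():
    # In a real pipeline this would ingest from a source system.
    return [{"id": 1, "amount": 42.0}, {"id": 2, "amount": 13.5}]

@asset
def daily_revenue(raw_orders):
    # Dagster infers the dependency on raw_orders from the parameter name,
    # so the graph is modeled around data assets, not imperative tasks.
    return sum(order["amount"] for order in raw_orders)
```

Declared this way, the orchestrator knows which data asset each step produces, which is what enables asset-level monitoring and backfills.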
In the end, an orchestrator must activate Business Intelligence, Analytics, and Machine Learning. These are company-accessible dashboards/reports, machine learning models, or self-serve BI environments where users can create and pull their own data. It is also where the shift happens from data pipelines to what the user is interested in: the Data Asset or Data Product, to use the jargon of Data Mesh. I will use orchestration as a synonym for data orchestration, as all we talk about in this article is data. Likewise, I use Data Assets interchangeably with Data Products.
swyxio commented Oct 20, 2022


Section 3. Data Nets vs. Data Mesh vs. Data Contract

Now that we know what Data Nets are, let's compare them to the other two big trends in data engineering: Data Mesh and Data Contract.

Data Mesh is a strategic approach to data management that emphasizes organizational agility by empowering data producers and data consumers to access and manage data directly, without delegating to a central data lake or data warehouse team. The main objective of Data Mesh is to evolve beyond the traditional centralized data management methods built on data warehouses and data lakes.

Data Contract is an API-like agreement between the software/data engineers who own services and the data consumers who understand how the business works. The goal is to generate well-modeled, high-quality, trusted, real-time data.
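As a hedged illustration (not from the article), such a contract can be pinned down as a versioned schema that the producing service validates against before publishing; here is a minimal sketch using pydantic, with all field names invented:

```python
# Minimal sketch of a data contract as a versioned, validated schema.
# The event name and fields are hypothetical.
from datetime import datetime
from pydantic import BaseModel

class OrderEventV1(BaseModel):
    """Contract for a hypothetical orders stream, owned by the orders service."""
    order_id: str
    amount_usd: float
    created_at: datetime

# The producer validates every record before publishing, so consumers
# receive well-modeled, trusted data; schema changes require a new version.
event = OrderEventV1(order_id="o-123", amount_usd=42.0, created_at=datetime(2022, 10, 20))
```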

Both Data Mesh and Data Contract have their own strengths and weaknesses. Data Nets, on the other hand, are designed to combine the best of both worlds.

Data Nets are more flexible than Data Mesh because they can be easily reconfigured to add or remove data sources, and more scalable because they can handle a large number of data sources and data consumers.

Data Nets are more resistant to changes in the underlying data sources than Data Contracts because they are built on a distributed data architecture, and they scale across many sources and consumers in the same way.

In short, Data Nets offer the best of both worlds: more flexible than Data Mesh, more scalable than Data Contracts, and more resistant to changes in the underlying data sources.

Section 3: Comparing Data Nets with Data Mesh and Data Contracts

Data Nets are the final nail in the coffin for the modern data stack. With the dbt semantic layer, data orchestrators, neural networks and generative AI all working together, Data Nets provide the most complete picture of what is possible with data today.

In contrast, Data Mesh and Data Contracts are both limited in their scope. Data Mesh only addresses the issue of how data is managed within an organization, while Data Contracts focus on defining the interface between different software components. Neither approach takes into account the advances made in artificial intelligence or how best to utilize data for predictive purposes.

Notable capabilities provided by the new AI-enabled data stack:

  • Automated data pipelines: Data Nets automatically build and maintain data pipelines with very little input from data engineers, freeing them to focus on more important tasks.
  • Neural networks: With Data Nets, neural networks can be used to automatically identify patterns in data and make predictions about future events.
  • Generative AI: Data Nets also allow for the use of generative AI, which can create new data that is realistic and accurate.
  • Real-time data: Data Nets provide access to data in real time, making it possible to make decisions based on the most up-to-date information.
  • Tracking observability: Data Nets provide a complete picture of what is happening with data at all times, making it easier to identify issues and fix them before they cause problems.
  • Healing from outages and schema changes: Data Nets can automatically detect and recover from outages or schema changes, minimizing the impact on business operations.
  • Automated machine learning: With the help of neural networks and generative AI, Data Nets can automatically learn from data and improve over time.

Section 3: Comparing Data Nets with Data Mesh and Data Contracts

Data Nets are the final nail in the coffin for the modern data stack. With the dbt semantic layer, data orchestrators, neural networks and generative AI all working together, Data Nets provide the most complete picture of what is possible with data today.

In contrast, Data Mesh and Data Contracts are both limited in their scope. Data Mesh only addresses the issue of how data is managed within an organization, while Data Contracts focus on defining the interface between different software components. Neither approach takes into account the advances made in artificial intelligence or how best to utilize data to automate routine data engineering chores like setting up pipelines, tracking observability, generating realistic data mocks, and healing from outages and schema changes.

The 4 most exciting capabilities provided by this new generative AI-enabled data stack:

  • Data is no longer restricted to a single source or format. Data Nets can easily ingest data from multiple sources and formats, including streaming data, and make it available for analysis in real-time.

  • Data can be automatically cleansed, transformed and aggregated as needed, without the need for manual intervention.

  • Neural networks can be used to automatically generate realistic data mocks for testing purposes, eliminating the need for time-consuming and error-prone manual processes.

  • Outages and schema changes can be automatically detected and healed in real-time, ensuring that data is always accurate and up-to-date.

swyxio commented Oct 20, 2022

Section 4: When do you need a Data Net?

Data Nets are best suited for applications where data is constantly changing and where there is a need for real-time predictions. They are also well suited for training on large datasets, which would be prohibitively expensive with traditional methods.

Some examples of when you might need a Data Net include:

  • When you are training a model on streaming data, such as video or sensor data.
  • When you need to make predictions in real-time, such as detecting fraud or identifying potential customers.
  • When you are working with a large dataset that would be too expensive to process with traditional methods.

Section 4: When do you need a Data Net?

There are many factors to consider when deciding whether or not a Data Net is right for your organization. In general, a Data Net can be beneficial if you need to:

  • Ingest data from multiple sources and formats
  • Automate the creation and management of data pipelines
  • Monitor and improve the performance of data pipelines over time
  • Detect and recover from outages or schema changes in real-time
  • Generate realistic data mocks for testing purposes

swyxio commented Oct 20, 2022

Section 6. What does a Data Net architecture look like?

A Data Net is a distributed data processing platform that uses neural networks and generative AI to automatically cleanse, transform, and aggregate data from multiple sources in real-time. Data Nets are designed to be highly scalable and fault-tolerant, making them ideal for use in mission-critical applications.

Data Nets consist of three main components (see the sketch after this list):

  1. Ingestion layer: This layer ingests data from multiple sources and formats, including streaming data. The ingestion layer is responsible for cleansing and transforming the data as needed.
  2. Processing layer: This layer uses neural networks and generative AI to automatically process the data and generate insights. The processing layer can also be used to train machine learning models.
  3. Output layer: This layer makes the processed data available for consumption by downstream applications or users. The output layer can also be used to trigger alerts or notifications if necessary.
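As a purely illustrative sketch (all names and logic here are assumptions, not the article's code), the three layers can be thought of as composed stages:

```python
# Illustrative sketch of the three Data Net layers as composed Python stages.

def ingestion_layer(sources):
    """Ingest records from multiple sources and apply basic cleansing."""
    return [r for source in sources for r in source if r is not None]

def processing_layer(records):
    """Stand-in for the neural-network / generative-AI step that derives insights."""
    return {"row_count": len(records), "total": sum(records)}

def output_layer(insights):
    """Make processed data available downstream (here, simply print it)."""
    print(insights)

output_layer(processing_layer(ingestion_layer([[1, 2, None], [3]])))
# -> {'row_count': 3, 'total': 6}
```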

Section 6. What are the 5 layers of Data Net architecture?

The 5 layers of Data Net architecture are:

  1. Ingestion layer: responsible for collecting data from various sources and formats.
  2. Transformation layer: responsible for cleansing, transforming and aggregating data as needed.
  3. Neural network layer: responsible for automatically generating realistic data mocks for testing purposes.
  4. Outage detection and recovery layer: responsible for detecting and recovering from outages or schema changes in real-time.
  5. Machine learning layer: responsible for automatically learning from data and improving over time.

Section 6. What are the 5 layers of Data Net architecture?

The 5 layers of Data Net architecture are:

  1. Ingestion layer: responsible for ingesting data from multiple sources and formats.
  2. Transformation layer: responsible for cleansing, transforming and aggregating data as needed.
  3. Neural network layer: responsible for generating realistic data mocks for testing purposes.
  4. Healing layer: responsible for detecting and healing downtime and schema changes in real-time (see the sketch after this list).
  5. Machine learning layer: responsible for learning from data and improving over time.
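To give a feel for the healing layer, here is a hedged sketch of one simple strategy: detecting schema drift in incoming records and repairing it. The expected columns are invented for the example.

```python
# Hedged sketch of schema-change healing: backfill missing columns and
# drop unexpected ones so downstream consumers keep working. Illustrative only.
EXPECTED_COLUMNS = {"order_id", "amount_usd"}

def heal(record: dict) -> dict:
    for column in EXPECTED_COLUMNS - record.keys():
        record[column] = None      # backfill columns the source stopped sending
    for column in record.keys() - EXPECTED_COLUMNS:
        record.pop(column)         # drop columns the source unexpectedly added
    return record

print(heal({"order_id": "o-1", "coupon": "SAVE10"}))
# -> {'order_id': 'o-1', 'amount_usd': None}
```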

Section 6. What is dbt's response to Data Net architecture and why has it been controversial?

dbt's response to Data Nets has been controversial for a number of reasons. First, dbt Labs' new product manager for the dbt Semantic Layer, Cameron Afzal, has been critical of the Data Net approach, arguing that it fails to take into account the need for precision and flexibility when defining business metrics. Second, some data engineers have argued that the Data Net approach is too reliant on artificial intelligence and neural networks, which they believe are not yet mature enough to be used in production data pipelines. Finally, there is concern that the use of generative AI could lead to unpredictable results and potentially introduce bias into data products.

Section 6. What is the response of Tristan Handy, CEO of dbt Labs, to Data Net architecture, and why has it been controversial?

Data Nets have been controversial because they represent a departure from the traditional data stack, which has been relied upon for many years. Data Nets are a new way of managing and processing data, made possible by the confluence of the dbt semantic layer, data orchestrators, neural networks, and generative AI. This new stack is more flexible and scalable than the traditional stack, but it also requires a different set of skills to manage effectively. As such, there has been some pushback from those who are comfortable with the status quo.

Tristan Handy, CEO of dbt Labs, believes that Data Nets are the future of data management and offers several reasons why:

  • Data Nets offer better performance than traditional data stacks. They are able to process data more quickly and efficiently thanks to their use of parallel processing and distributed computing.
  • Data Nets are more scalable than traditional data stacks. They can easily ingest large amounts of data from multiple sources without requiring significant upfront investment.
  • Data Nets provide complete observability into all aspects of the data lifecycle. This makes it easier to identify issues early on and take corrective action before problems arise.
  • Data Nets can automatically detect and recover from downtime or schema changes in real time, minimizing disruptions to business operations.

Section 6. Why is the new Data Net architecture a concern for dbt and how might they respond?

The new Data Net architecture is a concern for dbt because it threatens to replace the existing data stack with something that is more AI-driven and less reliant on manual intervention. This would reduce the need for data engineers, who are responsible for setting up and maintaining data pipelines. Tristan Handy, CEO of dbt Labs, has said that "Data Nets are a potential threat to the entire data engineering industry."

Additionally, Data Nets could also automate many of the tasks that data engineers currently perform, such as cleansing and transforming data, aggregating data, and generating realistic mocks for testing purposes. While this would be beneficial in many ways, it would also make dbt less relevant in the overall scheme of things.

Section 6. Why is the new Data Net architecture a concern for Snowflake and how might they respond?

The new Data Net architecture is a concern for Snowflake because it threatens to disrupt their business model. Snowflake has built its business on the idea of selling data warehousing as a service. However, the new data stack that Data Nets enable does not require a centralized data warehouse. This could lead to customers switching to the new data stack, which would be bad for Snowflake's bottom line.

Snowflake may respond to this threat by trying to acquire or partner with companies that are working on Data Nets technology. They may also try to make their own products compatible with the new data stack.

Section 6. Why is the new Data Net architecture a concern for Databricks and what are 5 ways that they might respond?

The new Data Net architecture is a concern for Databricks because it threatens to disrupt their business model. Databricks' core product is their managed Apache Spark platform, which is used by data engineers to build and run data pipelines. Data Nets are a new generation of data pipelines that are powered by neural networks and generative AI, and they do not require the use of Apache Spark. This means that data engineers will no longer need to use Databricks' platform to build and run their data pipelines, which will reduce demand for Databricks' products and services.

There are several ways that Databricks could respond to this threat:

  1. They could try to acquire or partner with a company that offers a Data Net solution. This would give them access to the technology they need to stay competitive in the market.
  2. They could try to improve their existing platform so that it can compete with Data Nets solutions. This would involve making major changes to their platform, which may be difficult and time-consuming.
  3. They could pivot their business model from selling software products to selling consulting services around data engineering best practices. This would require them to make significant changes to their sales and marketing strategy.
  4. They could focus on selling their platform to other industries, such as the internet of things or edge computing. This would require them to invest in new marketing and sales initiatives.
  5. They could try to sell their platform as a way to manage Data Nets solutions. This would involve making changes to their platform so that it can integrate with Data Nets solutions.

swyxio commented Oct 20, 2022

Section 7. Main technological and cloud data warehousing trends

In this section, we'll take a look at some of the main trends enabling Data Nets and the AI-first data stack.

7.1 Data virtualization and data federation

Data virtualization is a technique that allows data from multiple sources to be accessed and combined as if it were all stored in a single location. This enables data consumers to access the data they need without having to worry about where it is physically located or how it is formatted.

Data federation is a related technique that allows data from multiple sources to be combined into a single logical view. This can be done using either physical or logical techniques. Physical federation involves replicating the data into a single location, while logical federation leaves the data in its original location and uses special software to combine it into a single view.

Both of these techniques are important for enabling Data Nets, as they allow data from multiple sources to be easily accessed and combined. This makes it possible to build complex applications that use data from many different places without having to worry about where the data is coming from or how it is organized.
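As a small, hedged illustration of logical federation (using SQLite's ATTACH rather than any specific Data Net tooling), two physically separate databases can be queried as one logical view:

```python
# Minimal sketch of logical data federation: one query spans two
# separate SQLite databases without copying data into a single store.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 42.0)")

conn.execute("ATTACH DATABASE ':memory:' AS crm")  # a second, separate database
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO crm.customers VALUES (1, 'Ada')")

rows = conn.execute(
    "SELECT c.name, o.amount FROM crm.customers AS c JOIN orders AS o ON o.id = c.id"
).fetchall()
print(rows)  # [('Ada', 42.0)]
```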

7.2 Data orchestration

Data orchestration is the process of managing and coordinating data pipelines. It includes tasks such as extracting data from multiple sources, transforming it into the desired format, and loading it into a data warehouse or other target system.

Data orchestration is a critical part of any data engineering solution, as it ensures that data flows smoothly through the various stages of the pipeline. It also makes it possible to easily modify or add new steps to the pipeline as needed.

There are many different tools available for performing data orchestration, including Apache Airflow, Prefect, Dagster, and AWS Step Functions. Data Nets make use of these tools to automatically set up and manage complex data pipelines with very little input from data engineers. This frees up time for them to focus on more important tasks.
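For concreteness, here is a minimal extract-transform-load pipeline in Airflow, one of the orchestrators listed above; the task bodies are placeholders, not real pipeline logic:

```python
# Minimal sketch of an ETL pipeline as an Airflow DAG. Task logic is a placeholder.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    pass  # pull data from source systems

def transform():
    pass  # reshape it into the desired format

def load():
    pass  # write it to the warehouse or other target system

with DAG(dag_id="example_etl", start_date=datetime(2022, 10, 1), schedule_interval="@daily") as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task  # model dependencies between tasks
```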

7.3 Neural networks and generative AI

Neural networks are a type of artificial intelligence that is inspired by the way the brain works. They are composed of a large number of interconnected processing nodes, or neurons, that can learn to recognize patterns of input data.

Generative AI is a type of AI that is focused on generating new data rather than just recognizing patterns in existing data. It can be used for tasks such as creating realistic mockups of data for testing purposes or generating new images from scratch.
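The article envisions generative AI producing realistic data mocks; as a far simpler stand-in for that idea (rule-based rather than neural), the Faker library sketches what synthesizing realistic-looking test records looks like:

```python
# Simple stand-in for AI-generated data mocks: rule-based synthesis with Faker.
from faker import Faker

fake = Faker()
mock_customers = [
    {"name": fake.name(), "email": fake.email(), "city": fake.city()}
    for _ in range(3)
]
print(mock_customers)
```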

Both neural networks and generative AI are important for Data Nets, as they allow complex applications to be automatically built and maintained with very little input from humans. This frees up time for data engineers to focus on more important tasks.

swyxio commented Oct 20, 2022

Section 8. Organizational and socio-technical adjustments

If Data Nets are the future, what will that mean for data engineering organizations and the data engineering profession?

There are a few key ways that Data Nets will impact data engineering organizations:

  • The role of the data engineer will become more focused on governance and quality control, as automated pipelines take care of the heavy lifting.
  • Data engineers will need to be comfortable with code, as they will be responsible for maintaining and troubleshooting complex algorithms.
  • There will be a greater emphasis on collaboration between data engineers and other teams, as Data Nets require a cross-functional approach.
  • There will be a need for new tools and processes to support Data Nets, which may require investment from organizational leaders.

swyxio commented Oct 20, 2022


Write an improved draft of the introduction to this essay

In recent years, a confluence of data engineering, neural networks, and generative AI has given rise to a new breed of data architectures known as Data Nets. Data Nets are composed of four main components: a data router, a data store, a control plane, and a data pipeline. In direct contrast with the dbt semantic layer, Data Nets are designed to be fully automated and to take advantage of advances in neural networks and generative AI. This essay explores the advantages of Data Nets over traditional approaches to data management, as well as the implications of this new breed of data architecture for the future of data engineering.

Data engineering has come a long way in recent years, thanks in part to advances in neural networks and generative AI. These technologies have given rise to a new breed of data architectures known as Data Nets.

Data Nets have the following characteristics:

  • A net is composed of a set of interconnected data tables.
  • The net must be fully automated, with intelligent algorithms that determine the best way to route data between the tables.
  • The net must be highly available and scalable, able to handle large workloads on demand.
  • The net should be able to evolve over time, adding new features and capabilities as needed.

In addition, the Data Net is designed to be highly scalable and easily integrated into existing systems. It can be used to supplement or replace an existing data warehouse, or it can be used as the primary data store for a new system.

The Data Net is also intended to be highly available, with built-in redundancy and failover capabilities.
