With increasing scale comes the need for strategies and tooling that keep our services available and observable. Below are some important tools and concepts for achieving these goals; most of them are multifunctional.
- Metrics are numeric representations of data measured over intervals of time.
- Logs are timestamped records of discrete events; they provide valuable context about when a specific event occurred.
- Traces represent the causally related events of a single request as it moves through a distributed system.
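To make the three signal types concrete, here is a minimal Python sketch. The event shapes and field names are illustrative assumptions, not any specific vendor's schema:

```python
import json
import time
import uuid

def emit_metric(name, value, tags=None):
    """Metric: a numeric value sampled at a point in time."""
    return {"name": name, "value": value, "ts": time.time(), "tags": tags or {}}

def emit_log(message, level="INFO", **context):
    """Log: a timestamped record of a discrete event, with context."""
    return {"ts": time.time(), "level": level, "message": message, **context}

def emit_span(operation, trace_id, parent_id=None):
    """Trace span: one causally related unit of work within a request."""
    return {
        "trace_id": trace_id,    # shared by every span in the same request
        "span_id": uuid.uuid4().hex,
        "parent_id": parent_id,  # links this span to the span that caused it
        "operation": operation,
        "start_ts": time.time(),
    }

trace_id = uuid.uuid4().hex
root = emit_span("GET /checkout", trace_id)
child = emit_span("db.query", trace_id, parent_id=root["span_id"])

print(json.dumps(emit_metric("http.requests", 1, {"route": "/checkout"})))
print(json.dumps(emit_log("checkout started", user_id=42)))
```

The key difference shows up in the span: metrics and logs stand alone, while spans carry a shared `trace_id` and a `parent_id` that reconstruct the causal chain across services.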
- Elastic Beats: lightweight data shippers that send logs and metrics from edge machines to Elasticsearch or Logstash.
- Fluentd: an open-source data collector that unifies log collection and routing across sources and destinations.
- Soda Core: an open-source CLI and Python library for data quality and reliability checks (see "Introducing Soda Core: The New Way for Data Reliability").
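Soda Core checks are written in SodaCL, a YAML-based DSL. A minimal sketch of a checks file, assuming a table named `dim_customer` with an `email` column (both names are made up for illustration):

```yaml
# checks.yml — illustrative SodaCL checks; table and column names are assumptions
checks for dim_customer:
  - row_count > 0                 # table must not be empty
  - missing_count(email) = 0      # no NULL emails
  - duplicate_count(email) = 0    # emails must be unique
```

Run against a configured data source, failed checks surface as alerts, which is what makes this useful for data reliability rather than one-off validation.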
- Delta Lake (medallion) architecture ➡️ bronze (raw) ⇒ silver (cleaned/treated) ⇒ gold (final, business-ready)
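The bronze ⇒ silver ⇒ gold flow can be sketched in plain Python. This is illustrative only; a real Delta Lake pipeline would use Spark with Delta tables rather than lists of dicts, and the record fields here are assumptions:

```python
def to_bronze(raw_records):
    """Bronze: land the raw data as-is, only tagging it with its source."""
    return [{**r, "_source": "orders_api"} for r in raw_records]

def to_silver(bronze):
    """Silver: clean and conform — drop unusable rows, normalize types."""
    silver = []
    for r in bronze:
        if r.get("amount") is None:  # reject records we cannot use
            continue
        silver.append({
            "order_id": r["order_id"],
            "amount": float(r["amount"]),
            "country": r.get("country", "unknown").upper(),
        })
    return silver

def to_gold(silver):
    """Gold: aggregate into a business-ready view (revenue per country)."""
    gold = {}
    for r in silver:
        gold[r["country"]] = gold.get(r["country"], 0.0) + r["amount"]
    return gold

raw = [
    {"order_id": 1, "amount": "10.5", "country": "br"},
    {"order_id": 2, "amount": None, "country": "us"},
    {"order_id": 3, "amount": "4.5", "country": "br"},
]

print(to_gold(to_silver(to_bronze(raw))))  # → {'BR': 15.0}
```

Each layer has one job: bronze preserves the raw input for replay, silver enforces quality, and gold serves consumers, so a bug in cleaning logic never forces re-ingestion.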
- Data Mesh concept: decentralized, domain-oriented data ownership, with data treated as a product.
- Data scraping vs. data crawling: "Data crawling means dealing with large data sets where you develop your crawlers (or bots) which crawl to the deepest of the web pages. Data scraping, on the other hand, refers to retrieving information from any source (not necessarily the web)." — Trifacta
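A minimal scraping sketch using only the standard library: extracting structured data from an HTML document already in hand (the HTML string and the `price` class are made up for illustration). A crawler, by contrast, would also discover and follow links to fetch new pages:

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collects the text of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

html = '<div><span class="price">$10</span><span class="price">$25</span></div>'
scraper = PriceScraper()
scraper.feed(html)
print(scraper.prices)  # → ['$10', '$25']
```

Scraping is this extraction step; crawling is the surrounding loop that queues URLs, fetches them, and feeds each page into a parser like this one.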
- Architecture Decision Records (ADR): https://github.com/joelparkerhenderson/architecture-decision-record