@liopic
Last active June 13, 2020 10:16

DataEngConf 2018 Barcelona

Data Lake

  • Health metrics (performance, bottlenecks)
  • Metadata: table sizes, numeric columns (min, max, ...), string columns (distinct, most common values)
  • Costs: which models are expensive, and why do they need to be real time?
  • Dashboard for monitoring
  • Query annotations
  • Data validation + anomalies
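
The metadata and anomaly bullets above can be illustrated with a small sketch. It assumes a table held in memory as a list of row dicts; the helpers `profile_table`, `profile_column` and `is_anomalous`, the sample rows and the 3-sigma threshold are all hypothetical and were not presented in the talk. A real data lake would compute these statistics with its own query engine.

```python
import statistics
from collections import Counter
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarise one column: min/max/mean for numbers, distinct/most common for strings."""
    non_null = [v for v in values if v is not None]
    profile: dict[str, Any] = {"count": len(values), "nulls": len(values) - len(non_null)}
    # bool is a subclass of int, so exclude it explicitly from numeric columns.
    is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null)
    if non_null and is_numeric:
        profile["min"] = min(non_null)
        profile["max"] = max(non_null)
        profile["mean"] = statistics.mean(non_null)
    elif non_null and all(isinstance(v, str) for v in non_null):
        counts = Counter(non_null)
        profile["distinct"] = len(counts)
        profile["most_common"] = counts.most_common(3)
    return profile


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column of a table given as row dicts; missing keys count as nulls."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


def is_anomalous(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than z_threshold standard deviations from a non-empty history."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold


rows = [
    {"city": "Barcelona", "fare": 12.5},
    {"city": "Madrid", "fare": 8.0},
    {"city": "Barcelona", "fare": None},
]
print(profile_table(rows))
print(is_anomalous([100, 102, 98, 101, 99], 150))  # daily row counts vs. latest load: True
```

The z-score check is just one simple choice for "anomalies"; seasonal data would need a baseline that accounts for day-of-week or similar effects.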

Mid-Sized Scenario

  • Yara Fertilizers at.farm
  • Ourworldindata.org

Culture

  • Correct ratio of data engineers to data scientists (2:1)
  • DataEng democratizes data
  • tiny.dbi.io/detbook
  • Does Dataflow simplify things? Easier, but not simple

Presto

  • Jupiter: AAS with branches (no data stored)
  • Vars as ENV

Feature Platform

  • Bigtable R/W speed
  • Redis secondary

Dataland

  • Nowcasting vs. forecasting
  • word2vec with journeys
  • Share of time in use: taxis 30% vs. Cabify 55%
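
One way to read "word2vec with journeys": treat each journey as a sentence and each visited zone as a word, so zones that appear in similar journeys end up with similar embeddings. Under that assumption, the sketch below only builds the skip-gram (centre, context) training pairs that word2vec learns from; the zone names and `skipgram_pairs` are hypothetical. In practice the journey lists would be passed to a word2vec implementation (for example gensim's `Word2Vec`) instead of being paired by hand.

```python
from typing import Iterator


def skipgram_pairs(journey: list[str], window: int = 2) -> Iterator[tuple[str, str]]:
    """Yield (centre, context) pairs from one journey, as skip-gram does with a sentence."""
    for i, centre in enumerate(journey):
        lo = max(0, i - window)
        hi = min(len(journey), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a zone is not its own context
                yield centre, journey[j]


# Hypothetical journeys, each a sequence of zones.
journeys = [
    ["airport", "eixample", "gracia"],
    ["sants", "eixample", "born"],
]
pairs = [pair for journey in journeys for pair in skipgram_pairs(journey, window=1)]
print(pairs)
```

With `window=1` only adjacent zones become context, so "eixample" is linked to both "airport" and "born" through the two journeys, which is what lets the embedding relate zones that share neighbours.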

Event Driven

  • Tracking + domain events
  • Realtime patterns: event + wait time = action
  • Flink + Airflow
  • Precomputed tables
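
The "event + wait time = action" pattern can be sketched as a timer keyed by entity: a trigger event starts a wait, a cancelling event clears it, and if the wait expires first the action fires. This is a minimal in-memory sketch assuming explicit timestamps; the class `WaitThenAct` and the ride events are hypothetical. In Flink the same idea is usually expressed with keyed state and timers in a `KeyedProcessFunction`.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class WaitThenAct:
    """Fire an action for a key when a trigger is not cancelled within wait_s seconds."""

    wait_s: float
    _deadlines: list[tuple[float, str]] = field(default_factory=list, init=False, repr=False)
    _pending: dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def on_trigger(self, key: str, ts: float) -> None:
        # A new trigger for the same key replaces the previous deadline.
        deadline = ts + self.wait_s
        self._pending[key] = deadline
        heapq.heappush(self._deadlines, (deadline, key))

    def on_cancel(self, key: str) -> None:
        self._pending.pop(key, None)

    def advance(self, now: float) -> list[str]:
        """Return keys whose wait expired by `now` without a cancel event."""
        fired = []
        while self._deadlines and self._deadlines[0][0] <= now:
            deadline, key = heapq.heappop(self._deadlines)
            # Skip stale heap entries: cancelled, or superseded by a later trigger.
            if self._pending.get(key) == deadline:
                del self._pending[key]
                fired.append(key)
        return fired


# Hypothetical: "ride requested" with no "driver assigned" within 60 s -> notify.
waiter = WaitThenAct(wait_s=60)
waiter.on_trigger("ride-1", ts=0)
waiter.on_trigger("ride-2", ts=10)
waiter.on_cancel("ride-2")  # driver assigned in time
print(waiter.advance(now=75))  # ['ride-1']
```

Cancelled or re-triggered keys leave stale entries in the heap; checking them against `_pending` when they surface avoids an expensive removal from the middle of the heap.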