
@lsdr
Created June 14, 2019 19:27
Data Engineering Howto

How To Become a Data Engineer

Useful articles

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
  • Apache Spark is a unified analytics engine for large-scale data processing
  • Apache Kafka is a distributed streaming platform
  • Luigi is a Python package that helps you build complex pipelines of batch jobs
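Airflow and Luigi both treat a pipeline as a set of tasks with dependencies, where each task may run only after the tasks it depends on have finished. The sketch below is not the API of either tool. It is a minimal standard-library illustration of that ordering idea. It uses hypothetical task names (`extract`, `transform`, `load`, `report`), plain Python functions as tasks, and `graphlib.TopologicalSorter`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical batch tasks (not Airflow or Luigi APIs): each receives a dict
# holding the outputs of the tasks it depends on.
def extract(inputs):
    return [3, 1, 2]

def transform(inputs):
    return sorted(inputs["extract"])

def load(inputs):
    return {"rows_loaded": len(inputs["transform"])}

def report(inputs):
    return f"loaded {inputs['load']['rows_loaded']} rows"

TASKS = {"extract": extract, "transform": transform, "load": load, "report": report}

# Each task maps to the set of tasks it depends on (its upstream tasks).
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_pipeline(tasks, dependencies):
    """Run each task after all of its upstream tasks and return every task's output."""
    results = {}
    # static_order() yields every dependency before its dependents and
    # raises graphlib.CycleError if the dependency graph contains a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results

if __name__ == "__main__":
    outputs = run_pipeline(TASKS, DEPENDENCIES)
    print(outputs["report"])  # loaded 3 rows
```

Real workflow tools build on this ordering step. Airflow, for example, adds time-based scheduling, retries, and monitoring, so the sketch only shows the dependency-resolution part of the picture.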

Cloud Platforms

Other
