1️⃣ The FinnHub Streaming Data Pipeline
- https://github.com/RSKriegs/finnhub-streaming-data-pipeline
- 💬 The project is a streaming data pipeline built on real-time trading data from the Finnhub.io API/WebSocket.
- 💻 Kafka, Spark, Cassandra, Kubernetes, Grafana
2️⃣ Streamify
- https://github.com/ankurchavda/streamify
- 💬 The project streams events generated by a fake music streaming service (like Spotify) and builds a data pipeline that consumes the real-time data.
- 💻 Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP
3️⃣ Reddit ETL Pipeline
- https://github.com/ABZ-Aaron/Reddit-API-Pipeline
- 💬 A data pipeline that extracts Reddit data from r/dataengineering and provides a Google Data Studio report.
- 💻 AWS S3/Redshift, dbt, Airflow, Docker, Terraform
4️⃣ Audiophile End-To-End ELT Pipeline
- https://github.com/ris-tlp/audiophile-e2e-pipeline
- 💬 A pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and prepares it for a Metabase dashboard.
- 💻 AWS S3, Redshift, RDS, dbt, Airflow
5️⃣ Surfline Dashboard
- https://github.com/andrem8/surf_dash
- 💬 The pipeline collects data from the Surfline API and exports a CSV file to S3. The most recent file in S3 is then downloaded and ingested into the Postgres data warehouse. The end result is a dashboard that visualizes the data.
- 💻 AWS S3, Airflow, Pandas, Postgres, Plotly
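
The "most recent file in S3" step from the Surfline project can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `latest_key` helper and the sample keys are hypothetical, and the listing is assumed to have the same shape as the `Contents` entries of an S3 object listing (a `Key` and a `LastModified` timestamp).

```python
from datetime import datetime, timezone


def latest_key(objects):
    """Return the key of the most recently modified object, or None if the listing is empty."""
    if not objects:
        return None
    # Pick the entry with the newest LastModified timestamp.
    return max(objects, key=lambda obj: obj["LastModified"])["Key"]


# Hypothetical listing; real entries would come from an S3 list call.
listing = [
    {"Key": "surf/2022-07-01.csv", "LastModified": datetime(2022, 7, 1, tzinfo=timezone.utc)},
    {"Key": "surf/2022-07-03.csv", "LastModified": datetime(2022, 7, 3, tzinfo=timezone.utc)},
    {"Key": "surf/2022-07-02.csv", "LastModified": datetime(2022, 7, 2, tzinfo=timezone.utc)},
]

print(latest_key(listing))
```

Selecting by timestamp rather than by file name avoids depending on a naming convention, though either approach could work if the keys are date-stamped.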