Rupesh Tiwari rupeshtiwari

rupeshtiwari / 01_usecase Scalable Triumph: Migrating FinTrust Bank's Data Infrastructure to the Cloud.md
Last active April 26, 2024 14:59
Use cases for Data Analytics Customer Story, GCP, AWS, customer story, use cases, real world, usecase

Data Warehouse Migration Story for FinTrust Bank

Framework steps and details:

Situation:
  1. FinTrust Bank, with annual revenue of $12 billion, faced a high-stakes challenge when its existing systems could not handle more than 500 million transactions per day during a critical testing phase with a key e-commerce client.
  2. This client was projected to increase the bank's annual revenue by 15% ($1.8 billion).
  3. The key stakeholders were the client's CIO, CTO, and CSO, which underlined the strategic importance of the project.
Task:
  The urgent task was to stabilize and scale the bank's data processing capabilities, both to retain the e-commerce client and to lay a foundation for scalable, compliant growth in high-volume transaction environments.
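To put the Situation figures in engineering terms, the short Python sketch below converts the daily transaction volume into an average per-second rate and checks the revenue arithmetic. The 3x peak-to-average ratio is a hypothetical parameter for illustration, not a figure from the story; real traffic is rarely uniform across a day, so actual capacity planning would use measured peaks.

```python
# Back-of-the-envelope sizing for the FinTrust Bank figures.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tps(transactions_per_day: int) -> float:
    """Average transactions per second if the load were spread evenly over a day."""
    return transactions_per_day / SECONDS_PER_DAY


def peak_tps(transactions_per_day: int, peak_to_average: float) -> float:
    """Peak rate under an ASSUMED (hypothetical) peak-to-average ratio."""
    return average_tps(transactions_per_day) * peak_to_average


def revenue_uplift(annual_revenue: int, percent: int) -> int:
    """Revenue increase in dollars for a whole-number percentage uplift."""
    return annual_revenue * percent // 100


daily_transactions = 500_000_000
print(f"average: {average_tps(daily_transactions):,.0f} TPS")  # average: 5,787 TPS
print(f"peak (3x, hypothetical): {peak_tps(daily_transactions, 3):,.0f} TPS")  # peak (3x, hypothetical): 17,361 TPS
print(f"15% of $12B: ${revenue_uplift(12_000_000_000, 15):,}")  # 15% of $12B: $1,800,000,000
```

Even the evenly spread average is several thousand transactions per second, before any allowance for daily peaks.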
rupeshtiwari / 00_Data Architecture README.md
Last active April 25, 2024 17:17
All Apache Data Processing Frameworks and Tools
rupeshtiwari / 00_GCP Data Services.md
Last active April 25, 2024 14:57
GCP Data Services, GCP services, Google services, Google

Here's the table, sorted chronologically by the release date of each Google Cloud service:

Google Cloud Service | Release Date | Based on / Open-Source Inspiration | Open-Source Start Date | Notes
Google BigQuery | 2010 | Dremel (internal Google technology) | N/A | BigQuery is inspired by Dremel but is not directly based on open-source technology.
Google Cloud Dataflow | 2014 | Apache Beam | 2016 (as Apache Beam) | Initially developed by Google as Google Dataflow, then donated to the Apache Software Foundation as Apache Beam.
Google Cloud Composer | 2018 | Apache Airflow | 2015 | Developed at Airbnb and later open-sourced as Apache Airflow, which Google adopted for Cloud Composer.
Google Cloud Data Fusion | 2019 | CDAP (Cask Data Application Platform) | 2011 |
rupeshtiwari / Data Analytics Late Arrival Handling.md
Last active April 25, 2024 14:21
Data Analytics Late Arrival Handling

Techniques for handling late-arriving data

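Watermarks and allowed lateness are both vital techniques for managing late data in stream processing systems, but they serve different purposes. A watermark estimates how far event time has progressed, so the system knows when a window's data is probably complete; allowed lateness sets how long a window's state is kept after the watermark has passed its end, so that stragglers can still update the result. The two are often used together to balance result completeness against latency and resource use. Here's an in-depth look at when and why you might choose each technique, or both together, along with real-world industry examples.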

Watermarks

Purpose: Watermarks are primarily used to handle out-of-order data. They provide a way to estimate the "completeness" of data up to a certain point in time, based on event timestamps.
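One common heuristic for computing such an estimate is bounded out-of-orderness (Flink, for example, offers it as `WatermarkStrategy.forBoundedOutOfOrderness`): the watermark trails the largest event timestamp seen so far by a fixed delay. The Python sketch below assumes integer timestamps and a single constant `max_delay`; the class and function names are hypothetical, and managed systems such as Dataflow typically derive watermarks from source-specific signals rather than one constant.

```python
from typing import List, Optional


class BoundedOutOfOrdernessWatermark:
    """Heuristic watermark: assumes no event arrives more than `max_delay`
    time units behind the largest event timestamp seen so far."""

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay
        self.max_event_time: Optional[int] = None

    def observe(self, event_time: int) -> None:
        # Only the largest event timestamp moves the watermark forward.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    def current(self) -> Optional[int]:
        # Estimated event-time progress: events older than this are not expected anymore.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_delay

    def is_late(self, event_time: int) -> bool:
        wm = self.current()
        return wm is not None and event_time < wm


def late_events(events: List[int], max_delay: int) -> List[int]:
    """Return the event timestamps that arrived behind the watermark."""
    tracker = BoundedOutOfOrdernessWatermark(max_delay)
    late: List[int] = []
    for t in events:
        if tracker.is_late(t):
            late.append(t)
        tracker.observe(t)
    return late


# Arrival order differs from event-time order: 95 and 98 arrive out of order.
print(late_events([100, 95, 104, 98], max_delay=5))  # [98]
```

Event 95 is out of order but still within the 5-unit bound (the watermark is 95 when it arrives), so it counts as on time; 98 arrives after 104 has pushed the watermark to 99, so it is late. A larger `max_delay` tolerates more disorder at the cost of declaring completeness later.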

When to Use: Use watermarks when:

  • You expect data to arrive out of order.
  • You need a mechanism to know when to close a window and process its data.
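The sketch below shows how a watermark decides when to close a window, and how allowed lateness extends it: event-time tumbling windows emit an on-time result once the watermark passes the window's end, emit a late update if a straggler arrives within the allowed lateness, and drop events that arrive after that. It is a single-threaded toy assuming integer timestamps, a bounded-out-of-orderness watermark, and summation as the aggregation; `run_windows` and its parameters are hypothetical, loosely modeled on the on-time/late firing behaviour of Apache Beam and Flink rather than on either API.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple


def run_windows(
    events: List[Tuple[int, int]],  # (event_time, value) in arrival order
    window_size: int,
    max_delay: int,  # watermark = max event time seen - max_delay
    allowed_lateness: int,  # how long window state is kept after the watermark passes its end
) -> Tuple[List[Tuple[int, int, str]], List[Tuple[int, int]]]:
    sums: DefaultDict[int, int] = defaultdict(int)  # window start -> running sum
    fired: Set[int] = set()  # windows that already emitted a result
    emitted: List[Tuple[int, int, str]] = []  # (window start, sum, "on_time" | "late")
    dropped: List[Tuple[int, int]] = []
    max_seen: Optional[int] = None

    for event_time, value in events:
        start = event_time - event_time % window_size
        end = start + window_size
        watermark = None if max_seen is None else max_seen - max_delay

        # Past end + allowed lateness: the window's state is gone, so drop the event.
        if watermark is not None and end + allowed_lateness <= watermark:
            dropped.append((event_time, value))
            continue

        sums[start] += value
        if watermark is not None and end <= watermark:
            # Watermark already passed this window: emit a late (refined) result.
            emitted.append((start, sums[start], "late"))
            fired.add(start)

        # Advance the watermark.
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - max_delay

        # Fire every window whose end the watermark has now passed.
        for s in sorted(sums):
            if s not in fired and s + window_size <= watermark:
                emitted.append((s, sums[s], "on_time"))
                fired.add(s)

        # Garbage-collect windows that are beyond the allowed lateness.
        for s in [s for s in sums if s + window_size + allowed_lateness <= watermark]:
            del sums[s]
            fired.discard(s)

    return emitted, dropped


events = [(1, 1), (7, 2), (16, 3), (4, 5), (31, 1), (2, 100)]
emitted, dropped = run_windows(events, window_size=10, max_delay=5, allowed_lateness=10)
print(emitted)  # [(0, 3, 'on_time'), (0, 8, 'late'), (10, 3, 'on_time')]
print(dropped)  # [(2, 100)]
```

With these inputs, window [0, 10) fires on time with sum 3, is refined to 8 when the straggler at t=4 arrives within the allowed lateness, and the event at t=2 is dropped because by then the watermark (26) has passed the window end plus allowed lateness (20). Setting `allowed_lateness=0` turns the watermark into a hard cutoff.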
rupeshtiwari / 01_GCP for Data Analytics Customer Engineer.md
Last active April 23, 2024 20:40
GCP for Data Analytics, Google Data Analytics, Customer Engineer, GCP
rupeshtiwari / 00_README.md
Last active April 23, 2024 20:40
Kubernetes from Basics to Guru
rupeshtiwari / 00_README.md
Last active April 23, 2024 19:06
Learning Apache Spark notes

Comprehensive Overview of Hadoop Ecosystem Components with Cloud Service Equivalents

Here's a concise table summarizing the key Hadoop ecosystem components along with their cloud service equivalents:

Component | Purpose | Created by | Language Support | Limitations | Alternatives | Fit | GCP Service | AWS Service | Azure Service
Apache Hive | SQL-like data querying in Hadoop. | Facebook | HiveQL | High latency for some queries. | P
rupeshtiwari / Overview of Open-Source Projects Related to Google's Technologies.md
Last active April 21, 2024 15:50
Overview of Open-Source Projects Related to Google's Technologies