Rupesh Tiwari rupeshtiwari

@rupeshtiwari
rupeshtiwari / 00_GCP Data Services.md
Last active April 25, 2024 00:44
GCP Data Services

Here's the table sorted chronologically based on the release date of each Google Cloud service:

| Google Cloud Service | Release Date | Based on/Open-source Inspiration | Open-source Start Date | Notes |
| --- | --- | --- | --- | --- |
| Google BigQuery | 2010 | Dremel (Internal Google Tech) | N/A | BigQuery is inspired by Dremel but is not directly based on open-source technology. |
| Google Cloud Dataflow | 2014 | Apache Beam | 2016 (as Apache Beam) | Initially developed by Google as Google Dataflow, then donated to the Apache Software Foundation as Apache Beam. |
| Google Cloud Composer | 2018 | Apache Airflow | 2015 | Developed by Airbnb and later open-sourced as Apache Airflow, which Google adopted for Cloud Composer. |
| Google Data Fusion | 2019 | CDAP (Cask Data Application Platform) | 2011 | |
@rupeshtiwari
rupeshtiwari / 01_Scalable Triumph: Migrating FinTrust Bank's Data Infrastructure to the Cloud.md
Last active April 25, 2024 04:05
Use cases for Data Analytics Customer Story, GCP, AWS, customer story, use cases, real world

Data Warehouse Migration Story for FinTrust Bank

| Framework Step | Details |
| --- | --- |
| Situation | 1. FinTrust Bank, with an annual revenue of $12 billion, was facing a high-stakes challenge when its existing systems couldn't handle over 500 million transactions per month during a critical testing phase with a key e-commerce client.<br>2. This client was projected to increase annual revenues by 15% ($1.8 billion).<br>3. Key stakeholders involved were the client's CIO, CTO, and CSO, highlighting the strategic importance of the project. |
| Task | The urgent task was to stabilize and scale the bank’s data processing capabilities to not only retain the e-commerce client but also to set a foundation for scalable, compliant growth suitable for high-volume transaction environments. |
@rupeshtiwari
rupeshtiwari / 00_README.md
Last active April 23, 2024 20:40
Kubernetes from Basics to Guru
@rupeshtiwari
rupeshtiwari / 00_Data Architecture README.md
Last active April 25, 2024 00:45
All Apache Data Processing Frameworks and Tools

Comprehensive Overview of Hadoop Ecosystem Components with Cloud Service Equivalents

Here's a concise table summarizing the key Hadoop ecosystem components along with their cloud service equivalents:

Component Purpose Created by Language Support Limitations Alternatives Fit GCP Service AWS Service Azure Service
Apache Hive SQL-like data querying in Hadoop. Facebook HiveQL High latency for some queries. P
@rupeshtiwari
rupeshtiwari / 00_README.md
Last active April 23, 2024 19:06
Learning Apache Spark notes
@rupeshtiwari
rupeshtiwari / 01_GCP for Data Analytics Customer Engineer.md
Last active April 23, 2024 20:40
GCP for Data Analytics, Google Data Analytics, Customer engineer, gcp
@rupeshtiwari
rupeshtiwari / Overview of Open-Source Projects Related to Google's Technologies.md
Last active April 21, 2024 15:50
Overview of Open-Source Projects Related to Google's Technologies
@rupeshtiwari
rupeshtiwari / Elasticsearch to OpenSearch migration.md
Last active April 19, 2024 14:35
Elasticsearch to OpenSearch migration

1.5 to 6.8

flowchart TD
    ES1_5("1. Elasticsearch 1.5\nUse migration plugin and snapshot") -->|Snapshot & restore| ES2_3("2. Elasticsearch 2.3\nRestore snapshot from 1.5\nReview