GCP Cloud Architect - Part 3
Case Studies
Refreshed Nov 9th 2018; JencoMart completely dropped
Overview - 3 case studies; 40-50% on the exam; Question on one side, Case study on other side
Layout of Case study - 1. Company Overview 2. Solution concept (current goal) 3. Existing Technical Env 4. Requirements (Tech/Business) 5. Executive statement
Mountkirk Games
Dress4Win
TerramEarth
Mountkirk Games
https://cloud.google.com/solutions/mobile/mobile-gaming-analysis-telemetry
Business Requirements:
Single Global HTTP LB
Pub/Sub, Datastore, BigQuery, Cloud Storage
Dataflow
Monitor with stackdriver
Multi-regional GCE backends; Multi-region Datastore
Technical Requirements:
Game Backend Platform
----------------------
Autoscaling Managed IG
Cloud Datastore
BigQuery
Cloud Dataflow
Managed IG - Custom Images
Game Analytics Platform
-----------------------
Autoscaling services
Connect services with Pub/Sub, process with Dataflow
Dataflow accounts for late/out of order data
BigQuery
Upload to storage; Process via Dataflow
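A rough sketch of the Pub/Sub -> Dataflow -> BigQuery ingest path using the Google-provided Pub/Sub-to-BigQuery Dataflow template; project, topic, dataset, and table names are placeholders, not part of the case study:
  # Create the ingest topic and destination dataset (placeholder names)
  gcloud pubsub topics create game-telemetry
  bq mk --dataset my-project:game_analytics
  # Launch a streaming Dataflow job from the provided template
  gcloud dataflow jobs run telemetry-to-bq \
      --gcs-location=gs://dataflow-templates/latest/PubSub_to_BigQuery \
      --region=us-central1 \
      --parameters=inputTopic=projects/my-project/topics/game-telemetry,outputTableSpec=my-project:game_analytics.events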
Dress4Win
Executive Priorities:
Scale
Contain costs - improve TCO
Too many resources sitting idle
Solution concept:
Move Dev/Test Env to Google Cloud (separate projects for different envs)
DR Site - Hybrid cloud/On-Premise Env; Connect over VPN
Business Requirements:
Create an equivalent setup in the cloud (lift and shift)
Principle of least privilege; separate test/dev env
Automate infra creation; gcloud/SDK; rapid deployment with Deployment Manager (see sketch below)
Stackdriver - monitor infra with stackdriver monitoring; notified of errors with stackdriver logging; troubleshoot with Debug/Error Reporting
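A minimal Deployment Manager sketch for the "automate infra creation" requirement; deployment and file names are illustrative:
  # config.yaml would declare resources (e.g. a compute.v1.instance with zone, machineType, disks, networkInterfaces)
  gcloud deployment-manager deployments create dev-env --config=config.yaml
  gcloud deployment-manager deployments update dev-env --config=config.yaml
  gcloud deployment-manager deployments delete dev-env   # tear down idle dev/test resources to contain costs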
Technical Requirements:
Best practices for migration; move data first, then applications
Deployment Manager, other IAC products
CI/CD pipeline; Jenkins etc
Failover - MySQL replicating to CloudSQL; on-premise/cloud app servers - DNS cutover
All data encrypted by default; customer supplied encryption
VPN
Databases: MySQL - Cloud SQL (native MySQL support; 10TB size limit; single region)
Migration - create a replica server managed by Cloud SQL; once the replica is synced: 1. Update app to point at the replica 2. Promote replica to a standalone instance (see sketch after this list)
Redis 3 - 1. Run a Redis server on Compute Engine 2. Use the new Memorystore managed Redis service
40 web app servers - Managed IG with autoscaling; use custom machine types
20 Apache Hadoop servers - Cloud Dataproc
3 RabbitMQ - 1. Pub/Sub 2. Deploy the same env on a CE instance group
iSCSI and Fibre Channel SAN - Persistent Disk in place of the SAN cluster
NAS - Cloud Storage
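Hedged command sketch for the mappings above (instance, template, and group names are placeholders; sizes are illustrative):
  # MySQL: once the Cloud SQL-managed replica is in sync, promote it to a standalone instance
  gcloud sql instances promote-replica d4w-mysql-replica
  # Redis: managed Memorystore instance
  gcloud redis instances create d4w-cache --size=5 --region=us-central1
  # Web tier: custom machine type template + autoscaled managed instance group
  gcloud compute instance-templates create web-tmpl --custom-cpu=4 --custom-memory=16GB
  gcloud compute instance-groups managed create web-mig --template=web-tmpl --size=10 --zone=us-central1-a
  gcloud compute instance-groups managed set-autoscaling web-mig --zone=us-central1-a \
      --max-num-replicas=40 --target-cpu-utilization=0.6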
TerramEarth
Heavy equipment, mining, agriculture
500 dealers all over the world
mission = make customers more productive
Current Setup
Collect analytics on vehicles
Increase efficiency
Predict breakdown and pre-stage replacement parts
20 million vehicles - each collect 120 fields per second
Data stored locally, then uploaded(batch) when at dealer
200,000 use cellular connection - Always streaming data; 9TB per day total upload
Problem to solve
Turnaround time is 4 weeks - needs to be 1 week
Management priority - Business agility
Data Ingest - Data Warehouse - Analytics
GCP Approach 1 - increase cellular connectivity to a higher %; migrate FTP batch upload to streaming upload
Cloud IoT Core -> Cloud Pub/Sub -> Cloud Dataflow -> BigQuery -> Cloud Datalab or Datastudio
-> Cloud Functions
BigQuery -> Cloud ML -> Cloud Dataflow
GCP Approach 2 - 100% via local service center servers (batch)
Transfer via API -> Cloud Storage regional bucket -> Cloud Dataflow -> BigQuery -> Insights
BigQuery -> Cloud ML -> Cloud Dataflow -> Cloud Storage
Custom API for dealers and partners; App Engine + Cloud Endpoints
Data transfer to GCP - IoT Core (see sketch below)
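Hedged sketches of both ingest paths (bucket, dataset, topic, and registry names are placeholders):
  # Approach 2 (batch): dealer/service-center servers upload files to a regional bucket, then load into BigQuery
  gsutil mb -l us-central1 gs://te-vehicle-telemetry
  gsutil -m cp /var/telemetry/*.csv gs://te-vehicle-telemetry/2019-01-20/
  bq load --autodetect --source_format=CSV te_analytics.telemetry "gs://te-vehicle-telemetry/2019-01-20/*.csv"
  # Approach 1 (streaming): cellular-connected vehicles publish through IoT Core into Pub/Sub
  gcloud pubsub topics create te-telemetry
  gcloud iot registries create te-vehicles --region=us-central1 --event-notification-config=topic=te-telemetry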
Planning Your Cloud Transition
Making the Case for the Cloud and GCP
Why move to cloud? 1. Cost 2. Future-proof infra 3. Scale to meet demand 4. Greater business agility 5. Managed services 6. Global reach 7. Security at scale
Cost Optimization - Sustained use discounts; Custom machine types (0.9-6.5GB RAM per vCPU); Rightsizing recommendations (based on 8 days of usage);
Preemptible VMs (fault-tolerant, batch processing); Coldline storage (archive/DR, millisecond access); Committed use discounts (see sketch below)
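A one-line illustration of combining those options (names and sizes are placeholders; custom RAM must stay within 0.9-6.5GB per vCPU):
  gcloud compute instances create batch-worker-1 --preemptible \
      --custom-cpu=8 --custom-memory=16GB --zone=us-central1-a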
Architecting Cloud Apps:
App design requirements - 5 principles: HA, Scalability, Security, DR, Cost
Migrating to Google Cloud
Planning a Successful Cloud Migration
Assess, Pilot, Move Data, Move Applications, Cloudify & Optimize
Assess - 3 categories: 1. Easy to move 2. Hard to move 3. Can't move; Evaluation criteria: criticality of app, compliance, license, ROI; consider app dependencies
Pilot - POC/test run; non-critical or easily duplicated services; small steps at first; considerations - licensing, rollback plan, process changes
Start mapping roles: projects, separation of duties, test/prod env, VPCs
Move Data - data before apps; evaluate storage options; transfer methods (gsutil, Transfer Appliance, batch upload, Storage Transfer Service, mysqldump - see sketch after this list)
Move Apps - self-service or partner-assisted; lift & shift recommended; free VM import options via CloudEndure; hybrid; backup as migration
Optimize - cloud makeover; retool processes and apps with modern GCP tools - 1. Offload static assets to CS 2. Enable autoscaling 3. Enhance redundancy across zones
4. Enhanced monitoring with stackdriver 5. Managed services 6. Decouple stateful storage from the app
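A hedged "move data before apps" sketch for a MySQL workload (instance, bucket, and database names are placeholders):
  mysqldump --databases shop --single-transaction --set-gtid-purged=OFF > shop.sql
  gsutil cp shop.sql gs://migration-staging/shop.sql
  gcloud sql import sql prod-mysql gs://migration-staging/shop.sql --database=shop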
Storage transfer service
1. Import online data (AWS S3, HTTP/HTTPS location, another CS bucket) into CS 2. Import from an online data source to a data sink (CS bucket)
Transfer operation is configured through a transfer job; requires the Owner or Editor project IAM role plus access to source and sink
gsutil (on-premise) vs Storage Transfer Service (CSP sources - GCS, AWS, HTTP)
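Newer gcloud releases also expose the Storage Transfer Service on the command line; the sketch below is an assumption about that interface (bucket names and the credentials flag are placeholders, and exact flags may differ by release):
  # One-off transfer from an S3 source into a GCS sink
  gcloud transfer jobs create s3://legacy-backups gs://d4w-backups --source-creds-file=aws-creds.json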
Migrating Applications
Migrating from on-premises = migrating servers; app migration = server migration; map to GCP services
Before moving a server: 1. Create a project 2. Determine network config (VPC) - firewall, region, subnets 3. Determine IAM roles
Lift and Shift (GCE public image, Import direct image)
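Hedged sketch of that pre-move checklist (project, network, range, and group names are placeholders):
  gcloud projects create d4w-prod --organization=123456789
  gcloud compute networks create d4w-vpc --subnet-mode=custom --project=d4w-prod
  gcloud compute networks subnets create web-subnet --network=d4w-vpc \
      --region=us-central1 --range=10.10.0.0/24 --project=d4w-prod
  gcloud compute firewall-rules create allow-internal --network=d4w-vpc \
      --allow=tcp,udp,icmp --source-ranges=10.10.0.0/24 --project=d4w-prod
  gcloud projects add-iam-policy-binding d4w-prod \
      --member=group:ops-team@example.com --role=roles/compute.instanceAdmin.v1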
Data Migration Best Practices
Cloud - Storage Transfer, On-premise - gsutil, slow network - "mail it in"
Make Data transfer easier - 1.Decrease data size 2.Increase network bandwidth (Direct peering/Cloud Interconnect)
gsutil - Multithreaded, parallel upload, resumable
Physical media - Google Transfer Appliance; 20TB or more data
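Example gsutil usage for the options above (paths and bucket are placeholders):
  # Multithreaded, recursive, resumable copy
  gsutil -m cp -r /data/archive gs://migration-staging/archive
  # Parallel composite upload for large single files
  gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile.tar gs://migration-staging/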
Mapping storage solutions
Unstructured data - CS
Relational data - Cloud SQL or Spanner
Non-relational data - BigTable and Datastore
Big Data Analysis - BigQuery
In-memory database - MemoryStore
Other - Persistent disk
Cloud Solution Infrastructure
Preemptible VMs - rendering, media transcoding, big data analytics
Best practices - use smaller machine types; run at off-peak times; preserve disk on machine termination; use shutdown scripts
Create and terminate machines to save costs, but preserve disk state: --no-auto-delete --disk example-disk (see sketch below)
Managed IG with PVMs keeps recreating instances every minute: check the health check / firewall configuration
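A hedged sketch of those PVM practices (instance and disk names are placeholders):
  # Preemptible instance with a shutdown script to checkpoint work before preemption
  gcloud compute instances create render-node-1 --preemptible --zone=us-central1-a \
      --metadata-from-file=shutdown-script=checkpoint.sh
  # Keep the data disk when the instance is deleted/recreated
  gcloud compute instances set-disk-auto-delete render-node-1 --zone=us-central1-a \
      --disk=render-scratch --no-auto-delete
  gcloud compute instances delete render-node-1 --zone=us-central1-a --keep-disks=data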
Backup and Disaster Recovery - backup individual instances (snapshots); database backup (use a cron job to back up data to CS or persistent disk);
CS backup/rollback (object versioning + lifecycle mgmt); distributed computing app rollback (rolling update for managed IG / version control, split traffic for GAE);
Scheduled automated backup (cron jobs, apply to snapshots, database backup)
Rollback plan for a managed IG serving a website - 100's of instances - object versioning on static data in CS; rolling updates; NOT snapshots
Backup critical database with zero downtime & minimal resource usage - scheduled cron job; back up database data to another location (CS, persistent disk)
App Engine, need to push a risky update to the live env - versioning/split traffic, canary update (see sketch below)
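Hedged sketches for the scenarios above (disk, group, template, and version names are placeholders):
  # Scheduled snapshot of a database disk (run from cron)
  gcloud compute disks snapshot db-disk --zone=us-central1-a --snapshot-names=db-disk-$(date +%Y%m%d)
  # Canary rolling update of a managed instance group
  gcloud compute instance-groups managed rolling-action start-update web-mig --zone=us-central1-a \
      --version=template=web-tmpl --canary-version=template=web-tmpl-v2,target-size=10%
  # App Engine: deploy without promoting, then split a slice of traffic to the new version
  gcloud app deploy --version=v2 --no-promote
  gcloud app services set-traffic default --splits=v1=0.9,v2=0.1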
Security
Methods of Security GCP Environment
Exam scenarios -
1. IG VMs keep restarting every minute - failing health check / configure firewall to allow health-check traffic to the IG VMs from the LB IP ranges
2. On-premise network access to the proper network resources - restrict ingress firewall access to the on-premise network IP range
3. Failover from an on-premise LB-hosted app to a GCP-hosted IG - consider security & compliance; allow firewall access at the IG from the outside source
4. External SSH access disabled, but ops team needs to remotely manage VMs - give the ops team access to Cloud Shell; not the same scenario as removing external IPs
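Hedged firewall sketches for scenarios 1 and 2 (network name, ports, and the on-premise range are placeholders; 130.211.0.0/22 and 35.191.0.0/16 are the documented LB health-check source ranges):
  gcloud compute firewall-rules create allow-lb-health-checks --network=d4w-vpc \
      --allow=tcp:80 --source-ranges=130.211.0.0/22,35.191.0.0/16
  gcloud compute firewall-rules create allow-onprem-only --network=d4w-vpc \
      --allow=tcp:443 --source-ranges=203.0.113.0/24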
Legal Compliance and Audits
Designing for LC & Audits - considerations include legislation, audits, certification
Audit, auditor, access logs, compliance, think Stackdriver logging
Billing data exported directly to CS/BigQuery
Automating/Exporting Logging data for audits - analysis(BigQuery), access to external parties(CS);
Analyze PCI data - PCI DSS securely handle credit card info; stream to BigQuery for analysis
Send Log data to BigQuery for analysis - Data travels from squid proxy to Stackdriver logging/monitoring; Export from stackdriver logging to BigQuery
Securely migrating database data - migrate database to Datastore; app authentication with OAuth 2.0 -> export database info to CS -> import into Datastore
If migrating via app/API = authenticate with OAuth 2.0 using a service account and export to GCS
If exporting as a simple copy = gsutil copy to GCS
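Hedged export sketches for the audit scenarios above (project, dataset, and bucket names are placeholders; the sink's writer identity still needs access to each destination):
  # Analysis: route audit logs from stackdriver logging into BigQuery
  gcloud logging sinks create audit-to-bq \
      bigquery.googleapis.com/projects/my-project/datasets/audit_logs \
      --log-filter='logName:"cloudaudit.googleapis.com"'
  # External auditors: archive the same logs to a Cloud Storage bucket
  gcloud logging sinks create audit-to-gcs storage.googleapis.com/my-audit-archive \
      --log-filter='logName:"cloudaudit.googleapis.com"'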
Development Practices
SDLC, CI/CD, Blue/Green model for deployment, Application microservices
SDLC - produces s/w with high quality, lowest cost, shortest time; plan to develop, alter, maintain, and replace software systems; separate envs; separate projects
CI/CD - CI: integrate code into the main branch of a shared repo early and often; minimize cost of integration; CD: focus on automating the software delivery process
Automatically deploy each build that passes the full lifecycle; GCP Container Builder (now Cloud Build), Jenkins, Spinnaker
Blue/Green Deployment
Scenario: reduce unplanned rollbacks due to errors - best practices? Use a blue/green deployment model and break the monolith into microservices (see sketch below)
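A hedged App Engine blue/green sketch (version names are placeholders; a MIG- or GKE-based swap follows the same idea):
  gcloud app deploy app.yaml --version=green --no-promote   # deploy green alongside blue, no traffic yet
  gcloud app services set-traffic default --splits=green=1  # cut over after verification
  gcloud app services set-traffic default --splits=blue=1   # instant rollback path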
Application Error Examples
Java digest error - Re-sign JAR file
News mobile app caching under load, needs to prevent caching - overwrite Datastore entries; set app to work from a single instance; modify API to prevent caching (set the HTTP cache flag to -1)
Data Flow Lifecycle
Data Flow - Putting the Pieces Together
Managing Data's life cycle - Bigdata Focus, 4 stages: Ingest, Store, Process and Analyze, Explore and Visualize
Ingest - GAE, GCE, GKE, Cloud Pub/Sub, Stackdriver Logging, Cloud Transfer Service, Transfer Appliance
Store - Cloud Storage, Cloud SQL, Cloud Datastore, Cloud BigTable, BigQuery, CloudSpanner, CS for Firebase, Cloud Firestore
Process & Analyze - Cloud Dataflow, Dataproc, BigQuery, ML, Vision API, Speech API, Translate API, Natural Language API, Dataprep, Video Intelligence API
Explore & Visualize - Datalab, Datastudio, Google Sheets
Structured -> Transactional(CloudSQL, CloudSpanner), Analytical(BigQuery)
Semi-structured -> Fully indexed(Cloud Datastore), Row Key(Cloud BigTable)
Unstructured -> Cloud Storage
Cloud Dataproc - Existing Hadoop/Spark App; ML/DS Ecosystem; Tunable cluster parameters
Cloud Dataflow - New data processing pipelines, Unified streaming & batch, Fully managed
Cloud Dataprep - UI-Driven preparation, Scaled on-demand, Fully managed
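A hedged Dataproc sketch for lifting an existing Spark job onto an ephemeral cluster (cluster, class, and jar names are placeholders):
  gcloud dataproc clusters create ephemeral-etl --region=us-central1 \
      --num-workers=2 --worker-machine-type=n1-standard-4
  gcloud dataproc jobs submit spark --cluster=ephemeral-etl --region=us-central1 \
      --class=org.example.ETLJob --jars=gs://my-jobs/etl.jar
  gcloud dataproc clusters delete ephemeral-etl --region=us-central1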
Data Flow Hands-On and Reference Material
gcp.solutions