@j-mprabhakaran
Created January 16, 2019 12:50
Google Certified Cloud Architect Part 1 Notes from Linux Academy
Google Certified Professional Cloud Architect - Part 1
======================================================
GCP Overview
Google's suite of cloud computing services; runs on the same infrastructure and network as Google's own products
Compute -> App Engine, Container Engine, Compute Engine
Storage -> Bigtable, Cloud Storage, Cloud SQL, Cloud Datastore
Big Data -> BigQuery, Pub/Sub, Dataflow, Dataproc, Datalab
Machine Learning -> Vision API, Machine Learning, Speech API, Translation API
https://cloud.google.com/pricing
Per second pricing for instances;Private Global Fiber network;Live migration of VMs;Better performance;Industry leading security;access to innovative resources(Big data,ML)
Datacenter Infrastructure
https://cloud.google.com/about/locations
Datacenter -> 9 Regions and 27 Zones -> 17 Regions and 52 Zones
Backbone -> High speed private fiber network
Point of presence -> 90+ edge points in 33 countries
Commitment to social responsibility
Network Infra - Three elements 1) Core Data centers 2) Edge Points of presence(PoPs) 3) Edge Nodes
Edge Nodes - Tier of Google's infra close to end users; Youtube and Playstore cached in Edge nodes
GCP Free Trial Limitations:
Compute Engine - 8 cores running at once, 100GB SSD, 2TB Persistent standard disk space
Always Free Tier limits - Only for US regions; 5GB Cloud Storage per region; 1 f1-micro per month(US-region), 30GB HDD/5GB per month snapshot
https://cloud.google.com/free/docs/always-free-usage-limits
Organization:
Cloud Resource Hierarchy
Hierarchy Overview: 1) Organization (N/A for individual accounts) 2) Projects 3) Resources
Projects - Core Org component of GCP; Control access to resources; Creating, enabling, and using all Cloud Platform Services
Projects attributes -> Project Name, Project ID(Unique Application ID), Project Number (used by service accounts)
IAM:
Importance of resource access and management - who, can do what, on which resource -> to provide granular access, prevent unwanted access to other resources; least privilege
Cloud IAM - Members are granted permissions and roles to GCP services using the principle of least privilege.
Member - Person or Service Account -> People (Google account, Google group, G Suite domain, Cloud Identity domain) -> Service Account (Application access)
Service Account - special type of Google account; identity for carrying out server-to-server interactions in a project; identified with email project_id@developer.gserviceaccount.com
Role - Collection of permissions to give access to a given resource; Permissions represented in form <service>.<resource>.<verb> Ex: compute.instances.delete
Users are assigned roles (not directly assigned permissions); 1) primitive 2) predefined
Primitive - Historically available GCP roles; Applied to project level; Broad roles (Viewer/Editor/Owner)
Primitive roles suitable for small teams, only broader permissions for project
Predefined or Curated - Granular access, granted at resource level. Ex: App Engine Admin, multiple predefined roles given to individual resources
https://cloud.google.com/iam/docs/understanding-roles#predefined_roles
IAM policy - Collection of roles; full list of roles granted to a member or resource
Policy Hierarchy - Org->Project->Resources-parent/child format; each child has one parent; children inherit parent roles, parent policies overrule restrictive child policies
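Sketch of the IAM notes above: roles as collections of <service>.<resource>.<verb> permissions, and children inheriting roles granted on a parent. This is a simplified in-memory model, not the Cloud IAM API. The role contents, member emails, and resource paths are made up for illustration.

```python
# Hypothetical, simplified model of Cloud IAM roles and policy inheritance.
# Role contents are illustrative, not the real role definitions.
ROLES = {
    "roles/compute.viewer": {"compute.instances.get", "compute.instances.list"},
    "roles/compute.instanceAdmin": {
        "compute.instances.get",
        "compute.instances.list",
        "compute.instances.delete",
    },
}

# Policy attached at each level of the hierarchy: member -> set of roles.
POLICIES = {
    "org": {"alice@example.com": {"roles/compute.viewer"}},
    "org/project-a": {"bob@example.com": {"roles/compute.instanceAdmin"}},
    "org/project-a/instance-1": {},
}


def ancestors(resource):
    """Yield the resource itself, then each parent up to the organization."""
    parts = resource.split("/")
    for end in range(len(parts), 0, -1):
        yield "/".join(parts[:end])


def effective_permissions(member, resource):
    """Union of permissions from roles granted on the resource or any parent."""
    perms = set()
    for node in ancestors(resource):
        for role in POLICIES.get(node, {}).get(member, set()):
            perms |= ROLES[role]
    return perms


def is_allowed(member, permission, resource):
    """True if the member holds the permission on the resource."""
    return permission in effective_permissions(member, resource)
```

alice gets read access on instance-1 through the org-level grant; bob can delete instances only inside project-a, not at the org level.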
Interacting with Google Cloud Platform
1.Cloud Console
2.Google Cloud SDK - CLI for managing resources and applications; gcloud (many common GCP tasks), gsutil (cloud storage), bq(work with data in BigQuery)
3.RESTful API
https://cloud.google.com/sdk
Cloud Shell - interactive web based shell environment accessed from web console; includes temp compute engine VM, 5GB persistent storage, pre-installed SDK and other tools, built-in authorization; 1 hour timeout for inactivity
RESTful API - programmatic access to GCP resources; uses JSON; uses OAuth 2.0 for authentication and authorization; enabled via GCP console; APIs have a daily quota; can experiment with API Explorer
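Sketch of what a REST call looks like: JSON over HTTPS with an OAuth 2.0 bearer token in the Authorization header. The example builds (but does not send) a Compute Engine "list instances" request. The project name and token are placeholders, and the response parsing assumes the list response carries an "items" array of objects with a "name" field.

```python
import json
import urllib.request


def build_list_instances_request(project, zone, access_token):
    """Build a GET request for the instances in one zone (not sent here)."""
    url = (
        "https://compute.googleapis.com/compute/v1/"
        "projects/{}/zones/{}/instances".format(project, zone)
    )
    headers = {
        "Authorization": "Bearer {}".format(access_token),  # OAuth 2.0 token
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def instance_names(response_body):
    """Extract instance names from a JSON list response."""
    data = json.loads(response_body)
    return [item["name"] for item in data.get("items", [])]


# Placeholder values; a real call needs a valid token for the project.
req = build_list_instances_request("my-project", "us-central1-a", "TOKEN")
# To send: body = urllib.request.urlopen(req).read()
```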
gcloud [GROUP] [GROUP] [COMMAND] - arguments
Ex: gcloud compute instances create instance-1 --zone us-central1-a
gcloud config set project [PROJECT_ID]
https://cloud.google.com/sdk/gcloud/reference
gcloud config list
gcloud projects list
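Sketch of the gcloud [GROUP] [GROUP] [COMMAND] layout from the notes above, assembled as an argument list in Python. The helper name gcloud_command is hypothetical; it only builds the command line and does not run it.

```python
import shlex


def gcloud_command(groups, command, *args, **flags):
    """Assemble 'gcloud GROUP... COMMAND ARGS --flag value' as an argv list."""
    argv = ["gcloud"] + list(groups) + [command] + list(args)
    for name, value in flags.items():
        # Python keyword machine_type becomes the flag --machine-type.
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


argv = gcloud_command(
    ["compute", "instances"], "create", "instance-1", zone="us-central1-a"
)
print(shlex.join(argv))
```

With the Cloud SDK installed and authenticated, the same list could be passed to subprocess.run(argv).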
API Explorer
Used for interacting with GCP via your own applications
https://developers.google.com/apis-explorer
Compute Options Comparison
Method of hosting apps on GCP - 1) Google Compute Engine, 2) Google App Engine, 3) Google Container Engine 4) Google Cloud Functions
GCP Architect: Given a set of business and technical requirements, know how to choose and implement the right tool for the task.
https://cloudplatform.googleblog.com
Compute Options Decision Tree
https://cloudplatform.googleblog.com/2017/07/choosing-the-right-compute-option-in-GCP-a-decision-tree.html
Google Compute Engine:
----------------------
CE is IaaS running virtual machines(instances) on Google Infra.
Robust Network Features: Custom Networks, Firewall rules, Regional HTTP(S) Load balancing, Network Load balancing, Subnetworks
Boot quickly, local SSD performance
Low cost+Automatic Discounts, per-second billing with a 1 minute minimum charge (older material lists minute level increments with a 10 min minimum), Automatic sustained use discounts
Global private fiber network
Extremely flexible
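Sketch of how a billing increment combines with a minimum charge, as in the Compute Engine pricing note above. The increment and minimum are parameters (an assumption for illustration) because the actual values depend on the pricing model in effect.

```python
import math


def billed_seconds(runtime_s, increment_s, minimum_s):
    """Round runtime up to the billing increment, then apply the minimum."""
    rounded = math.ceil(runtime_s / increment_s) * increment_s
    return max(rounded, minimum_s)


# Per-minute increments with a 10-minute minimum:
short_job = billed_seconds(130, increment_s=60, minimum_s=600)
# Per-second increments with a 1-minute minimum:
short_job_per_second = billed_seconds(130, increment_s=1, minimum_s=60)
```

A 130 second run is billed as 600 seconds under the first model and as 130 seconds under the second.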
Basic Instance Management:
Create, stop, start, reset and Delete instance via console
Instance Options:
gcloud and REST Reference:
gcloud compute instances create "test-2" --zone "us-central1-a" --machine-type "f1-micro"
Connect to a Linux Instance and More gcloud Commands
gcloud compute --project "compute-engine-overview" instances create "linux-instance" --zone "us-central1-a"
SSH in to the instance and do graceful shutdown via "sudo poweroff" command
IAM -> NETWORKING -> FIREWALL RULES
Connecting to Windows Instances
freerdp or Remmina.org => Free RDP Clients
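Generate the Windows username and password from the console (Set Windows password) before connecting over RDP; do not store credentials in notes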
Editing Instance specifications:
Change instance type only when the instance is stopped, cannot change zone once instance is allocated
Creating, Editing, and Manipulating Disks:
Edit disk and increase size but cannot decrease
can create additional disk, but cannot change zones after creating
Windows Instance -> Change disk size, RDP and do Extend Volume on Disk management
Linux Instance -> Change disk size, SSH, grow the file system if it was not resized automatically (e.g. with resize2fs on ext4) and run "df -h" to verify
https://cloud.google.com/compute/docs/disks/add-persistent-disk
Snapshots and Images:
Snapshots - Backup and Disaster recovery, can be taken of persistent disks even while they are attached to running instances, lower cost than custom image, differential backups
Available only in the project they are created
Images - create instances(modified root volume), available across projects, stop the instance first to create custom image
Preemptible VMs, Instance Templates, and Groups:
Preemptible VMs - short lived, low cost VM, 80% cheaper, max lifetime of 24 hrs, suitable for short term batch processing
Instance group - Group of VMs, Managed and Unmanaged, Autoscaling, Instance templates define and deploy the group, multiple instances acting as one
Cloud Launcher(Marketplace):
quickly deploy functional software packages that run on GCP, free, manage and view info with deployment manager
Networking Overview:
VPC Networks - virtual version of traditional physical network, limited to internal resources, resources in VPC get internal ip from subnet
External IP - Ephemeral or static, static IP can be reserved(Prod servers)
Firewall rules - Every VPC = Managed Firewall(Ingress/Egress traffic), can be applied to entire VPC or individual instances
Routes - mapping IP range to destination, default or custom rules
Load balancing - distribute user requests among set of instances, works with instance groups, used for autoscaling, batch, distributing traffic, fault tolerance
Cloud DNS - high performance, resilient, global DNS service that publishes your domain names to the global DNS in a cost effective way. Create managed zones (add, edit and delete DNS records)
VPN - on-premises to GCP through IPsec, private internal connection over public internet, supports gateway to gateway connections (site-to-site)
Cloud Router - dynamic routing for Google VPN, managed service for handling route
Cloud CDN - places online content closer to users for fast response time, content cached in 80+ edge cache sites around the globe.
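Sketch for the Routes note above (mapping an IP range to a destination): picking a next hop for a destination address, assuming the most specific matching range wins. The route table and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical route table: (destination range, next hop).
ROUTES = [
    ("0.0.0.0/0", "default-internet-gateway"),
    ("10.128.0.0/20", "subnet-us-central1"),
    ("10.128.5.0/24", "vpn-tunnel-1"),
]


def next_hop(dest_ip, routes=ROUTES):
    """Return the next hop of the most specific route covering dest_ip."""
    ip = ipaddress.ip_address(dest_ip)
    matches = []
    for cidr, hop in routes:
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # Longest prefix = most specific range.
    return max(matches)[1]
```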
VPC:
subnets can span multiple zones, network can span multiple regions
IP Addressing and Firewall Rules:
All instance comes with private IP based on subnet, Public IP enabled by default, (Ephemeral/Static), Instance can have one external address,
Unassigned static $.01 per hour(1 cent)
Firewall rules - protect resource from unapproved network connections, allow or deny, IP addresses, Port or protocol
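Sketch of how allow/deny rules on source IP range, protocol and port might be evaluated for incoming traffic. Assumptions not in the notes: each rule has a numeric priority (lower number is checked first) and traffic matching no rule is denied. The rules themselves are made up.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int      # lower number = evaluated first (assumption)
    action: str        # "allow" or "deny"
    source_range: str  # CIDR range of permitted/blocked sources
    protocol: str
    port: int


# Hypothetical ingress rules.
RULES = [
    Rule(1000, "allow", "0.0.0.0/0", "tcp", 22),
    Rule(900, "deny", "203.0.113.0/24", "tcp", 22),
    Rule(1000, "allow", "0.0.0.0/0", "tcp", 80),
]


def evaluate_ingress(src_ip, protocol, port, rules=RULES):
    """Return the action of the first matching rule, or 'deny' if none match."""
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (
            rule.protocol == protocol
            and rule.port == port
            and ip in ipaddress.ip_network(rule.source_range)
        ):
            return rule.action
    return "deny"
```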
Operation and Management Tools:
Google Stackdriver - provides powerful monitoring, logging and diagnostics for cloud operations, Natively monitor GCP, AWS or hybrid of both
Stackdriver components - 1.Monitoring 2.Logging 3.Error Reporting 4.Debugger 5.Trace
Deployment Manager - Infrastructure as code, uses YAML, repeatable deployment process, declarative approach, template driven
Source Repositories - Git repo hosted on GCP, built in source code editor, integrate with Stackdriver Debugger, connect to GitHub or BitBucket
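Sketch for the Deployment Manager note above: besides YAML configs, Deployment Manager accepts templates written in Python (or Jinja2) that return the resources to create. The example declares one f1-micro instance. The "zone" property, the "-vm" name suffix, and the debian-11 image family are assumptions for illustration.

```python
COMPUTE_URL = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Return the resources this template declares (one VM)."""
    project = context.env["project"]
    zone = context.properties["zone"]  # assumed to be set in the YAML config
    instance = {
        "name": context.env["name"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}projects/{}/zones/{}/machineTypes/f1-micro".format(
                COMPUTE_URL, project, zone
            ),
            "disks": [{
                "deviceName": "boot",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    # Image family is an assumption; pick a current one.
                    "sourceImage": COMPUTE_URL
                    + "projects/debian-cloud/global/images/family/debian-11",
                },
            }],
            "networkInterfaces": [{
                "network": "{}projects/{}/global/networks/default".format(
                    COMPUTE_URL, project
                ),
            }],
        },
    }
    return {"resources": [instance]}
```

The template describes the desired end state (declarative); Deployment Manager works out the API calls needed to reach it.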
Google App Engine:
PaaS - Focus on App dev, Managed Infra, Pay per use vs Pay per allocation
App Engine is GCP's tool to build modern web and mobile apps on an open cloud platform, fully managed, supports custom languages, no vendor lock-in
gcloud app deploy -
App Engine - Standard Environment and Flexible Environment
Standard Env - runs in secure, sandboxed env, cannot write to local file system or modify runtime, pricing based on instance hours
Flexible Env - Based on Compute Engine, more customization, more native support, pricing based on CPU, memory and disk usage
App Engine is regional and available in selected regions
Use cases: Nintendo and DeNA - Super Mario Run
Deploy sample python app called BookShelf - App Engine Standard Runtime Env
https://codelabs.developers.google.com/codelabs/cp100-app-engine
1. 'git' the code
2. review the code
3. install requirements
4. deploy(gcloud app deploy)
gcloud init
gcloud source repos clone default --project=bookshelf-project-228207
cd default
git push -u origin master
git pull https://github.com/GoogleCloudPlatformTraining/cp100-bookshelf
cd app-engine
pip install -r requirements.txt -t lib
gcloud app deploy
Beginning deployment of service [default]...
╔════════════════════════════════════════════════════════════╗
╠═ Uploading 235 files to Google Cloud Storage ═╣
╚════════════════════════════════════════════════════════════╝
File upload done.
Updating service [default]...done.
Setting traffic split for service [default]...done.
Deployed service [default] to [https://bookshelf-project-228207.appspot.com]
You can stream logs from the command line by running:
$ gcloud app logs tail -s default
To view your application in the web browser run:
$ gcloud app browse
App Engine Resources:
Versions
Instances
Datastore
Storage
Cloud Endpoints:
Create, deploy, protect, monitor, analyze and serve your APIs
APIs - Standardize interface for developers to build apps Ex: Google Drive API
Using Cloud Endpoints - Build your own API in App Engine Std, Expose API using RESTful interface, Oauth 2.0 authorization, Supports python and Java
Google Cloud Storage Options:
Bigtable(Analytics)(Non-relational)
Datastore(Non-relational)
Firestore
Storage(Unstructured)
SQL(Relational)
Spanner(new category)(Horizontal scalability)
Memorystore
Filestore
SQL - Consistency(Based on ACID), NoSQL - scalability/flexibility
Database Options - Closer Look:
Cloud SQL - Create instance/region/size, hosted mysql service, vertical scaling(read/write), horizontal scaling(read), Limited scalability
Datastore - Born as App Engine repo, Scale and Flexibility, Fully managed, scale from 0 to TB of data, Cost efficient, support ACID txns
Bigtable - managed NoSQL analytics(TB to PB), Hosted version of Google's own internal Bigtable technology, High vol write, millisec latency, pricier than datastore.
CloudSpanner - RDBMS with better scalability(Horizontal), Billed as best of both worlds(NewSQL)
Use cases:
CloudSQL - web framework, CMS, eCommerce
Datastore - user profiles, product catalog, game states
Bigtable - high throughput analytics, IOT, Adtech
Spanner - scale+consistency, financial service, global supply chain
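Sketch of the database comparison above as a tiny decision helper. It only encodes the rules of thumb from these notes (relational vs non-relational, horizontal scale, analytics workloads); a real choice involves more factors. The function name and parameters are hypothetical.

```python
def choose_database(relational, horizontal_scale=False, analytics=False):
    """Pick a GCP database using the rules of thumb from the notes."""
    if relational:
        # Cloud SQL has limited scalability; Spanner scales horizontally.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    # Non-relational: Bigtable for high-throughput analytics, else Datastore.
    return "Bigtable" if analytics else "Datastore"
```

Example: a CMS backend maps to choose_database(relational=True), and IoT telemetry maps to choose_database(relational=False, analytics=True).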
Cloud Storage(Unstructured Data):
Ex: pictures, videos, obj, docs, multimedia etc (BLOB), Integrates with Compute Engine, App Engine, BigQuery, CloudSQL
Unified Object storage, price competitive, pay per use, high scalable, multiple storage classes based on storage needs,
Not FS but can be setup as FS using 3rd party, data encrypted in transit and at rest
Bucket(basic container, cannot nest buckets, name shd be unique), Objects(files, 5TB per single file), Data opacity(no knowledge of structure)
Storage classes - Multi-regional(geo redundant), Regional(geographic), Nearline(low cost per GB, 30 days min, Infrequent access), Coldline (lower, 90 days, Cold data)
(all has same throughput, low latency, high durability) 99.95%/99.9%/99%/99% available
Storage cost - $26/$20/$10/$7 per TB per month (Multi-regional/Regional/Nearline/Coldline)
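Worked arithmetic for the per-TB storage prices in these notes, taken in the same class order as the availability figures. Assumes 1 TB = 1000 GB and ignores retrieval, operation, network and minimum-duration charges, so it is a rough estimate only. Prices are the ones quoted in the notes and may be out of date.

```python
# USD per TB per month, as quoted in the notes.
PRICE_PER_TB_MONTH = {
    "multi_regional": 26.0,
    "regional": 20.0,
    "nearline": 10.0,
    "coldline": 7.0,
}


def monthly_storage_cost(storage_class, size_gb):
    """Estimated at-rest storage cost in USD for one month."""
    return PRICE_PER_TB_MONTH[storage_class] * size_gb / 1000


backup_cost = monthly_storage_cost("nearline", 500)
```

500 GB of backups in Nearline comes to about $5 per month at these rates.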
gsutil mb -l us-central1 gs://bookshelf-prabha
gsutil defacl ch -u AllUsers:R gs://bookshelf-prabha
Hands-On - Cloud Storage with Third Party Application:
Cloudberry Backup - For Backup and Restore
create a bucket with nearline storage class
Use Home Edition Free
Google Container Engine
What are Containers and Kubernetes?:
Docker
Kubernetes("Helmsman")-Open source container manager
Master - Control K8s nodes
Node - Machines that performs tasks. Controlled by K8S master.
Pod - Group of 1 or more containers in a node.
Replication Controller - ensures specified no of pod replicas are running at any one time across nodes
kubectl - CLI tool for K8S
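Sketch of what the Replication Controller note above means by keeping a specified number of pod replicas running: compare desired vs actual and add or remove pods. This is a toy in-memory loop, not the Kubernetes API; the pod names are generated for illustration.

```python
import itertools

_pod_ids = itertools.count(1)


def reconcile(running_pods, desired_replicas):
    """Return a pod list adjusted to exactly desired_replicas entries."""
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        pods.append("pod-{}".format(next(_pod_ids)))  # start a replacement
    while len(pods) > desired_replicas:
        pods.pop()  # scale down
    return pods


pods = reconcile([], 3)       # initial rollout: three pods
pods.pop(0)                   # simulate one pod crashing
pods = reconcile(pods, 3)     # controller restores the count
```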
GKE:
fully managed env for deploying containerized apps, use GCE resources, Self healing, Autoscaling, Powered by K8S, Custom OS(Container Optimized OS)
K8S the hard way - https://github.com/kelseyhightower/kubernetes-the-hard-way
GKE Organization/Components - Container Cluster, K8S Master, Pods, Nodes, Replication controller, Services, Container Registry
Use case: Niantic - Pokemon Go
Reference for demo: https://codelabs.developers.google.com/codelabs/cp100-container-engine/#0
Enable scope for User Info and Cloud Platform when creating K8s Engine
cd container-engine
gcloud config set container/cluster bookshelf
docker build -t gcr.io/bookshelf-project-228207/bookshelf .
gcloud docker -- push gcr.io/bookshelf-project-228207/bookshelf
gcloud container clusters get-credentials bookshelf --zone us-central1-a --project bookshelf-project-228207
kubectl create -f bookshelf-frontend.yaml
kubectl get pods
kubectl get services
Big Data and Machine Learning
Big Data - Extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human
behavior and interactions.
GCP - serverless, managed infrastructure. both batch and stream processing. spark and hadoop in the cloud. industry leading ML capabilities
GCP Big Data Services:
GCP BigData Suite - BigQuery, Dataflow, Dataproc, Datalab, Dataprep, Pub/Sub
BigQuery - Enterprise DW, stores and queries massive datasets, SQL syntax for queries, Real-time analytics
Dataflow - Fully managed data processing service, Batch and Stream processing, open source, tight integration with GCP
Dataproc - Fully managed service for running Apache spark and Hadoop clusters, scalable clusters, billed by minute, open source and integrated
Ideal for Migrate hadoop jobs to cloud, Analyze data stored in Cloud storage, Use spark to perform data mining and analysis, Use spark ML libraries to
run classification algorithms
Datalab - Interactive tool for data exploration, analysis, visualization, and ML. open source(Jupyter), supports ML models based on Tensorflow
Pub/Sub - Messaging middleware, Apps publish and subscribe to topics, Ideal for stream processing, Decouples senders and receivers, connects other GCP services to each other.
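Sketch of the publish/subscribe pattern behind Pub/Sub: publishers send to a topic without knowing who receives, and subscribers register interest in a topic. This is an in-process toy (synchronous, no persistence or retries), not the Cloud Pub/Sub client library; the class and topic names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory topic broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda m: received.append(m.upper()))
delivered = broker.publish("orders", "new order")
```

The publisher only knows the topic name, so subscribers can be added or removed without changing the sender.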
Big Data Life cycle:
Ingest Process Store Analyze
------ ------- ----- -------
Google App Engine Cloud Dataproc BigQuery Storage BigQuery Analytics
Cloud Pub/Sub Cloud Dataflow
Cloud Monitoring Cloud Dataflow Cloud Storage Apache Hadoop
Cloud Storage Apache Spark
BigQuery Organization
Projects - same as GCP projects, can be shared
Dataset - grouping of tables, lowest level of access control
Tables - row/column structure, actual data
Jobs - queuing large requests
https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_50_billion_wikipedia_pageviews_in_5/
Google Cloud Machine Learning Platform
machines and apps can learn from and adapt to new data
creating (intelligent)apps that can see, hear and understand the world around them
Cloud ML Engine, Cloud Vision API, Cloud Speech API, Cloud Jobs API, Cloud Translation API, Cloud Natural Language API, Cloud Video Intelligence API
Built on TensorFlow, open source tool to build and run neural network models
ML Engine - managed service to create own machine learning service, ideal for custom predictive analysis. Use cases: Data security, Financial Trading,
Health care, Marketing, Fraud detection, Smart cars.
Cloud Vision API - Image recognition, detect and extract text for OCR
Cloud Natural language API - Reveal the structure and meaning of text, sentiment behind the text
Cloud Translation API - language translation and detection
Cloud Speech API - speech recognition, convert audio to text, over 110 languages supported
Cloud Video Intelligence - video analysis, makes video searchable by content