@j-mprabhakaran
Created January 20, 2019 05:44
Dataflow lifecycle
migration concerns when moving from on-premises to Google Cloud
code snippet to troubleshoot and diagnose
Part 2 - Hands-on with tools
Role of Cloud Architect
plans, designs and builds the infrastructure for an org to host their workload on GCP; able to plan to scale;
scalability and automation
The Importance of Hands-on Practice
Practice
Core Management Services
Cloud Resource Manager(Quotas, IAM, Billing)
Management Services: (IMPORTANT FOR EXAM)
Organization Node and Folders
Org -> Folders -> Projects -> Resources
Org - Highest root node for all GCP resources; Org Admin(Highest level, useful for Auditing), Org Owner(reserved for G suite super admin)
Folder - Group projects under org; share common IAM policies; Roles granted to folder
Quotas
caps on resources you can create; ex: 48 CPU per region, 5 static IP's per project; prevent unexpected spikes in usage;
3 Types - Resources per project, API rate limit requests per project, Per region
Increasing Quota caps - soft caps can be raised by request; support ticket or self service form; quota can be viewed on console; proactively request
Labels
Method of organization and segregation (projects & folders); labels are a tool for organizing GCP resources; any resource can be labeled (via console, gcloud or API)
64 labels per resource; key:value pair; Ex: Environment - env:prod, env:test; Owner or POC - owner:matt, contact:devops; Team or cost center - team:research,
team:marketing; App component - component:backend, component:frontend; Resource set - state:readyfordeletion, state:inuse
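The label rules above can be checked before applying labels in bulk. A minimal sketch, assuming the commonly documented constraints (keys and values up to 63 characters; lowercase letters, digits, underscores, dashes; keys start with a lowercase letter). It only covers the ASCII subset, and `invalid_labels` is a hypothetical helper, not part of any Google library.

```python
import re

# Assumed label rules (ASCII subset only): lowercase letters, digits,
# underscores and dashes; at most 63 characters; keys start with a letter.
KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")
MAX_LABELS = 64  # per-resource limit from the notes above


def invalid_labels(labels):
    """Return a list of problems found in a {key: value} label dict."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not KEY_RE.fullmatch(key):
            problems.append(f"bad key: {key!r}")
        if not VALUE_RE.fullmatch(value):
            problems.append(f"bad value for {key!r}: {value!r}")
    return problems


print(invalid_labels({"env": "prod", "team": "research"}))  # no problems
print(invalid_labels({"Owner": "Matt"}))  # uppercase key and value are flagged
```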
Tags(only for network/VPC resources, affect resource operations)
IAM & Admin -> Select entire Org -> Add -> Select Role -> Resource Manager, Select Folder Admin and Org Admin -> Create Folder
Viewer, Editor, Owner - primitive roles for GCP resources
Using CLI:
gcloud projects get-iam-policy pwnet-test1 --format json > iam.json
ls
nano iam.json
gcloud projects set-iam-policy pwnet-test1 iam.json
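Instead of editing iam.json by hand in nano, the same change can be scripted. A minimal sketch, assuming the policy has the usual `bindings` list of `{role, members}` entries; `add_binding`, the member addresses, and the etag placeholder are hypothetical and only for illustration.

```python
import json


def add_binding(policy, role, member):
    """Add member to role in an IAM policy dict, creating the binding if needed."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy


# Stand-in for the content of iam.json (values are placeholders).
policy = {
    "bindings": [{"role": "roles/owner", "members": ["user:admin@example.com"]}],
    "etag": "ETAG_FROM_GET_IAM_POLICY",
    "version": 1,
}

add_binding(policy, "roles/viewer", "user:bob@example.com")
print(json.dumps(policy, indent=2))
```

The etag is left untouched on purpose: it is returned by get-iam-policy and lets set-iam-policy detect concurrent changes.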
Create Custom role for additional permissions
Service accounts
Project Creator, Billing Account Creator access required for creating new projects
gcloud config list
gsutil ls gs://pwnet-bucket1
gsutil cp gs://pwnet-bucket1/* . # copy all objects to the current directory
gsutil cp file.txt gs://pwnet-bucket1 # access denied if the instance's Storage API access scope is read-only; edit the instance to grant Read Write access
IAM Best Practices
Use Principle of least privilege
Restrict service account access
Restrict Service Account Admin role
Careful with Owner role (Owner can change IAM policy)
Rotate service account keys periodically
Auditing - Cloud Audit Logs, Export Logs to cloud storage, restrict log access
Billing:
The bigger you scale, the greater the number of resources to bill and track
Billing roles defined in IAM
Org is top in Hierarchy, Billing accounts linked to projects(Required Billing Account User)
view billing info - 1.web console 2.export to cloud storage and big query 3.set budgets and alerts
find all charges that were more than 3 dollars:
SELECT product, resource_type, start_time, end_time,
cost, project_id, project_name, project_labels_key, currency, currency_conversion_rate,
usage_amount, usage_unit
FROM `cloud-training-prod-bucket.arch_infra.billing_data`
WHERE (cost > 3)
find which product had the highest total number of records:
SELECT product, COUNT(*) AS record_count
FROM `cloud-training-prod-bucket.arch_infra.billing_data`
GROUP BY product
ORDER BY record_count DESC
LIMIT 200
which product most frequently cost more than a dollar:
SELECT product, cost, COUNT(*)
FROM `cloud-training-prod-bucket.arch_infra.billing_data`
WHERE (cost > 1)
GROUP BY cost, product
LIMIT 200
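The "most frequently cost more than a dollar" question is easier to read when grouping by product alone and sorting by the count. A minimal sketch using an in-memory SQLite table as a stand-in for the BigQuery billing export; the rows are made-up sample data and `products_over` is a hypothetical helper.

```python
import sqlite3

# In-memory stand-in for the billing table; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billing_data (product TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO billing_data VALUES (?, ?)",
    [
        ("Compute Engine", 4.2),
        ("Compute Engine", 1.5),
        ("Cloud Storage", 0.3),
        ("BigQuery", 1.1),
        ("Compute Engine", 0.9),
    ],
)


def products_over(threshold):
    """Count records per product with cost above threshold, most frequent first."""
    return conn.execute(
        "SELECT product, COUNT(*) AS n FROM billing_data "
        "WHERE cost > ? GROUP BY product ORDER BY n DESC, product",
        (threshold,),
    ).fetchall()


print(products_over(1))  # [('Compute Engine', 2), ('BigQuery', 1)]
```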
Stackdriver
suite of tools for monitoring, logging, and tracking diagnostics for apps; native monitoring of both GCP and AWS; Dynamically discover all GCP resources
1.Monitoring - monitor metrics, health checks, dashboard and alerts etc
2.Logging - audit of activity
3.Error Reporting - identify and understand app errors
4.Trace - find performance bottlenecks in App Engine apps
5.Debugger - find/fix code errors in prod
Benefits - Multicloud monitoring, Identify trends and prevent problems before they occur, Centralized logging, Better signal-noise ratio, Find & fix problems faster
3rd party integrations (SRE vendors) - BMC, Splunk, PagerDuty, Tenable, HipChat, netskope
Best practice - single project for stackdriver monitoring, determine monitoring needs in advance, IAM controls
Concepts:
Pricing - Basic and Premium (separate from GCP account status); Applies only to monitoring; new accounts 30 day free trial
$8 per month per resource; 30 days log retention, 500 time series per chargeable resource, 250 metric types per project
Stackdriver agent - software installed on VMs; recommended, not required; agentless gets CPU, disk/network traffic, and uptime info; agent accesses
additional resource and application info; requires premium tier; monitors many 3rd party apps (Apache, Kafka, MySQL, Nginx, Tomcat etc)
$ curl -sSO https://dl.google.com/cloudagents/install-monitoring-agent.sh
$ sudo bash install-monitoring-agent.sh
Resources -> Metric Explorer, Cloud Storage etc
Groups -> Create Groups (select a project with group of instances)
Dashboard -> Create Dashboard
Explore resource, Alerting, uptime checks
Stackdriver logging:
Concepts - repository for log data and events; store, search, analyze, monitor and alert; collect platform, system and app logs(agent); realtime/batch
Associated by project; Log entry - record status or event; Log - named collection of log entries; retention period
Audit Log Types - 1.Admin Activity(automatically turned on, requires IAM role logging/Logs viewer or Project viewer, always enabled no charge),
2.Data Access(create modify or read user-provided data, requires IAM role logging/Private Logs viewer or Project Owner, Disabled charged on usage)
Retention - Admin activity 400 days; Data access logs 7/30 days, Non audit logs 7/30 days
Allotment - 50 GB per project / 50+14.25MB premium, overage charge $0.50 per GB per project
Exporting logging data - 1.Cloud storage 2.BigQuery 3.stream to other source(pub/sub); requires project/destination bucket; create a filter;
choose destination; filter and destination held in a sink
Best practices - search for specific values, use adv filters, use adv viewing interface
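The overage charge in the allotment note is simple arithmetic. A minimal sketch, assuming only the two figures quoted above (50 GB free per project, $0.50 per GB beyond that); `logging_overage_cost` is a hypothetical helper and current pricing may differ.

```python
FREE_GB_PER_PROJECT = 50.0   # allotment quoted in the notes
OVERAGE_USD_PER_GB = 0.50    # overage rate quoted in the notes


def logging_overage_cost(ingested_gb):
    """Return the overage charge in USD for one project's monthly log volume."""
    billable_gb = max(0.0, ingested_gb - FREE_GB_PER_PROJECT)
    return billable_gb * OVERAGE_USD_PER_GB


print(logging_overage_cost(40))  # 0.0, still inside the free allotment
print(logging_overage_cost(60))  # 5.0, 10 GB over at $0.50 per GB
```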
HandsOn:
view logs
filter(basic/advanced views)
turn on real time viewing
export logs to cloud storage/big query
enable data access logs
gcloud projects get-iam-policy pwnet-test2 --format yaml > policy.yaml
ls
nano policy.yaml # add auditConfigs:
gcloud projects set-iam-policy pwnet-test2 policy.yaml
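The `auditConfigs:` section added in the editor has a fixed shape: a list of services, each with a list of log types. A minimal sketch that adds it to a policy already loaded into a dict (from JSON or YAML); `enable_data_access_logs` is a hypothetical helper and the sample policy is a placeholder.

```python
import json

# Data Access log types that can be switched on per service.
LOG_TYPES = ["ADMIN_READ", "DATA_READ", "DATA_WRITE"]


def enable_data_access_logs(policy, service="allServices"):
    """Ensure the policy dict has an auditConfigs entry enabling all log types."""
    configs = policy.setdefault("auditConfigs", [])
    entry = next((c for c in configs if c.get("service") == service), None)
    if entry is None:
        entry = {"service": service, "auditLogConfigs": []}
        configs.append(entry)
    present = {c["logType"] for c in entry["auditLogConfigs"]}
    for log_type in LOG_TYPES:
        if log_type not in present:
            entry["auditLogConfigs"].append({"logType": log_type})
    return policy


# Placeholder policy standing in for the downloaded file.
policy = {"bindings": [], "etag": "ETAG_FROM_GET_IAM_POLICY", "version": 1}
print(json.dumps(enable_data_access_logs(policy), indent=2))
```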
Trace, Error Reporting, and Debugger Concepts
Error reporting - real time error monitoring; automatic and real time analysis; automatically enabled in App Engine;
Trace - find performance bottlenecks(latency); collect data from GAE, LB, or apps with Stackdriver Trace SDK;automatically enabled in App Engine
Debugger - Inspect app state without stopping or slowing app; doesn't require additional log statements; automatically enabled in App Engine standard
GCP Core Building Blocks
Google Cloud Storage - Unstructured data, virtually limitless size, Pay per use not allocation, primary unit is bucket, object inside bucket
Storage Class - Regional, Multi-regional, Nearline, Coldline
Changing storage class - cannot change from multi-regional to regional vice versa; gsutil to change class of existing object or move obj to another bucket
gsutil(FOR CLOUD STORAGE)
https://cloud.google.com/storage/docs/gsutil
gsutil mb -l us-central1 -c nearline gs://pwnet-test1-test
gsutil ls -l gs://pwnet-test1-test
Cloud Storage Security
Access Management principles - IAM and ACL
IAM - granted at projects, resource or bucket level; Roles - Primitive, Standard Storage roles (independently from ACLs), Legacy roles (work with ACLs)
ACLs - can be applied to buckets/objects; Objects inherit ACLs from the bucket's default object ACL
Best Practice - use IAM over ACL(enterprise grade access control, leaves audit trail); use ACL to grant access to obj without access to bucket
signed URLs - timed access to object data (temporary access without google account)
storage.cloud.google.com/bucketname
Assign IAM role to bucket
via console
gsutil iam ch user:bob@professionalwireless.net:objectCreator,objectViewer gs://pwnet-test1-test
gsutil iam ch -d user:bob@professionalwireless.net:objectCreator,objectViewer gs://pwnet-test1-test
Assign ACL role to bucket and objects
via console
gsutil acl ch -u bob@professionalwireless.net:O gs://pwnet-test1-test
gsutil acl ch -d bob@professionalwireless.net gs://pwnet-test1-test
gsutil acl ch -u bob@professionalwireless.net:O gs://pwnet-test1-test/3.png # only access to object
gsutil -m acl ch -u bob@professionalwireless.net:O gs://pwnet-test1-test/*
Mixed owner/read permissions
Storage Legacy Bucket Owner - create, upload, delete file but cannot view the contents
Storage Object Creator, Storage Object viewer - create, upload and view but cannot delete
signed URLs
APIs & Services -> Create Credentials -> Service account key -> New service account -> select name and role -> create (JSON file downloaded)
ssh session -> upload file -> mv pwnet-test1-xedefdefefe.json pwnet-cert.json
gsutil signurl -d 10m pwnet-cert.json gs://pwnet-test1-test/3.png
get the URL from the output and give it to the user who needs access to the object
Object Versioning and Lifecycle Management Concepts
Object versioning - retrieve objects that are deleted or overwritten; applied at bucket level; disabled by default; when enabled, overwritten or deleted objects are archived;
archived versions increase bucket size; archived version retains ACLs; Versioning properties - Generation (obj content change), Metageneration (obj metadata change)
Object Lifecycle management
Sets TTL on an object(to delete version/downgrade storage class); Applied to bucket level ; implemented with combination of rules, conditions, actions
Rule - Specify set of conditions in order to take action
Condition - criteria to meet before action; Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions
Actions - Delete, SetStorageClass
gsutil versioning help
gsutil versioning get gs://pwnet-test1-test
gsutil versioning set on gs://pwnet-test1-test
gsutil ls -a gs://pwnet-test1-test
gsutil lifecycle get gs://pwnet-test1-test > policy.json
edit the file to change the rule
gsutil lifecycle set policy.json gs://pwnet-test1-test
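The policy.json edited above is a list of rules, each pairing one action with its conditions. A minimal sketch that generates such a file's content, assuming the JSON field names used by the Cloud Storage lifecycle configuration (`age`, `matchesStorageClass`, `isLive`, `numNewerVersions`); `lifecycle_policy` and the chosen thresholds are hypothetical.

```python
import json


def lifecycle_policy(nearline_after_days, newer_versions):
    """Build a two-rule lifecycle config: downgrade old objects, prune old versions."""
    return {
        "rule": [
            {
                # Move REGIONAL objects to NEARLINE once they reach the given age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["REGIONAL"],
                },
            },
            {
                # Delete archived (non-live) versions that have at least
                # this many newer versions.
                "action": {"type": "Delete"},
                "condition": {"isLive": False, "numNewerVersions": newer_versions},
            },
        ]
    }


print(json.dumps(lifecycle_policy(30, 3), indent=2))
```

The printed JSON can be saved as policy.json and applied with the `gsutil lifecycle set` command shown above.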
Bucket and Object Command Line A-Z
gsutil ls -al gs://pwnet-test1-test #gets metageneration
gsutil -m rewrite -s NEARLINE gs://pwnet-test1-test/* # set off versioning before, to move diff storage class
gsutil acl ch -u AllUsers:R gs://pwnet-test1-test/file.txt # shows as public link on console
Interconnecting Networks
Worldwide private network; communication between regions and on-premises never touches public internet; networking handled differently than others.
SDN - traditional network(manage network hardware, high mgmt overhead req) SDN(Everything is virtualized)
single global/cross region VPC; global internal DNS/load balancing/firewalls/routes; global public DNS; Rapid scaling with global LB(Layer 7/HTTP);
Subnets within VPC group resources by region/zone; IP range between subnets dynamically expandable.
Extend Google Private Network to On-premises - VPN, Cloud Interconnect, Direct Peering
Connecting your Network to Google
1. Dedicated Interconnect - Physically connect on-premise network to GCP VPC via Google Edge location; Useful for Hybrid env, High bandwidth traffic;
Must be at supported peering location; can be direct with Google or ISP; $1700 per 10Gbps link, up to 80 Gbps total; Reduced egress fees
Use Cases - On-premise data processing, low latency needs,
2. Peering - connect business directly to google; 70+ location in 33 countries for Direct peering; Exchange BGP routes; Direct and Carrier Peering;
Does not connect to internet; Also saves on egress fees; 10 Gbps per link (direct), variable for carrier; Use case Ex: Private API access
3. Cloud VPN - Site to site VPN connection over IPSec; connect internal network to GCP over encrypted tunnel over public internet; Up to 1.5 Gbps per tunnel;
Can use multiple tunnels for increased performance; Static and dynamic routes(using Cloud Router); Supports IKEv1 and IKEv2 using shared secret;
connect on-premises to GCP or connect two different VPCs on GCP; No site to client option available.
CloudVPN
connect on-premise network to GCP VPC; IPSec connection over VPN over public internet; traffic encrypted by one gateway, decrypted by other gateway.
99.9% SLA, Site-to-site only; Up to 1.5Gbps per tunnel, can have multiple tunnels; Static and Dynamic routes
Use case - Connect to on-premises or connect 2 different VPC network on GCP
Requirement - VPN Gateway on both ends(peer), Peer Gateway must have static IP; Non conflicting CIDR range/subnet with rest of network
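The non-conflicting CIDR requirement can be checked before building the tunnel. A minimal sketch using Python's standard `ipaddress` module; `conflicting_ranges` is a hypothetical helper and the ranges are example values, not taken from the exercise.

```python
import ipaddress


def conflicting_ranges(on_prem_cidrs, vpc_cidrs):
    """Return (on_prem, vpc) pairs of CIDR ranges that overlap."""
    conflicts = []
    for on_prem in on_prem_cidrs:
        for vpc in vpc_cidrs:
            if ipaddress.ip_network(on_prem).overlaps(ipaddress.ip_network(vpc)):
                conflicts.append((on_prem, vpc))
    return conflicts


# Example ranges: one VPC subnet sits inside the on-premises /16.
print(conflicting_ranges(["192.168.0.0/16"], ["10.128.0.0/20", "192.168.10.0/24"]))
# [('192.168.0.0/16', '192.168.10.0/24')]
```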
Cloud Router - Static vs Dynamic routing; Static: create routing table for existing and new routes, can't re-route if link fails; Dynamic: networks
automatically discover topology changes via BGP; can re-route if link fails
To use Dynamic routing, change dynamic routing mode to Global on VPC network.
Google ASN(65000-65001) and BGP address(169.254.0.1-169.254.0.2) required
Tunnel IP is static IP of other VPN Gateway
Add BGP session for Dynamic Routing
gsutil cp gs://gcp-course-exercise-scripts/vpn-exercise-script.sh .
bash vpn-exercise-script.sh
Virtual Networks
VPC Concepts
subnets are region bound and can span multiple zones
isolated per project; but can share between projects with Shared VPC
Quotas - Hardcap of 7000 VMs in a VPC; IPv4 unicast traffic only; Most other quotas can be increased by request
Network Tags - primary method of segmenting network traffic access; apply to firewall and network routes; individual instances are tagged
Firewall - single firewall for entire VPC; manage both ingress and egress traffic; Deny all Ingress, Allow all egress; Conditions - source/target, port, protocols, Tags
create firewall rules - ssh-icmp-instance2(Tag: restrict-access); internal-allow-all; ssh-allow; ping-allow
Firewall Rules via Command Line - vnc-desktop
firewall rule for port 5901
vnc-allow; Target tag vnc-server; tcp:5901 # get the command line and paste in CLI
gcloud compute firewall-rules create vnc-allow ......
gcloud compute instances add-tags vnc-desktop --tags vnc-server
Routes - software based, not limited by hardware; routes traffic leaving VMs; special case for advanced routing Many-to-one route, Proxy server;
Routes+firewall rules combine to determine traffic access
Shared VPC Concepts
share VPC across projects within Org(Cross Project Networking)
Host project - project hosting the shared VPC; Service project - project with permission to shared VPC; Standalone project - project not using shared VPC;
Shared VPC admin - IAM role for admin of shared VPC; Service project admin - project admin of shared VPC service project
Use cases - Separation of projects for access control/billing, but need access to same VPC environment; 2 tier web service; Hybrid cloud scenario
IAM roles - Org Admin, Shared VPC admin(Org level role), Network user-compute.networkUser(Project level role)
Compute Shared VPC admin role required to the user to enable Shared VPC;
gsutil cp gs://gcp-course-exercise-scripts/firewall-exercise-script.sh .
Compute Engine Deep Dive
GCE, GKE, GAE all run on VMs; Single VM, Force multipliers, Automation, Autoscaling, Managed Instance Groups, Load Balancer, Custom Image, Disk manipulation,
Metadata, Startup/Shutdown scripts,Snapshots, Persistent Disks, gcloud commands
Disk concepts - Single root disk for OS; Persistent(most common, default, Not Directly attached) or Local SSD (Directly attached) or Cloud Storage Buckets
Persistent - 64 TB in total, Scope of access zone, no RAID config necessary
Local SSD - cannot be boot device, encrypted (Google supplied keys only), 375GB in size (can attach up to 8), must create on instance creation
Cloud Storage Bucket - Not a root disk, Encrypted, Lower performance
Disks are zone bound
gcloud compute disks create disk03 --size 50GB --type pd-standard
sudo lsblk
sudo growpart /dev/sda 1
sudo resize2fs /dev/sda1
Images Concepts
Images - create new instances, configure instance templates; access across projects
Snapshots - periodic incremental backup of existing disk/instance, access only from within same project
Images created from Persistent disk, another image in same project, image shared from another project, compressed image from cloud storage
Image families(group related images together); Deprecating images - transition user away from older unsupported version in manageable way, Deprecation
states: Deprecated, Obsolete, Deleted, Active(command only)
Sharing and moving images - Require Compute Engine Image User role to host project; For managed instance group, service account must be granted role
Export image to cloud storage - export image as a tar.gz to cloud storage(only linux); share with Image User role is preferable
Hands On - Custom Images
gcloud compute images describe-from-family webserver
gcloud compute images deprecate webserver-base --state ACTIVE
Snapshot Concepts
For windows use VSS snapshots; run fstrim before taking snapshot(linux)
gcloud compute disks snapshot website --zone us-central1-a --snapshot-names=website-backup-2
gcloud compute snapshots list
gcloud compute snapshots describe website-backup
Startup and Shutdown scripts
ease management of a large number of VMs; easily and programmatically customize VM; key component to instance group and scaling capabilities
always run as root/administrator; Input methods - Direct (script field in instance properties), Link to script on Cloud Storage
Shutdown scripts - great for managed instance group/autoscaler; Ex: copy processed data to cloud storage, backup logs etc; Good to pair with preemptible
Metadata server - Built into GCP; Manage config and env variables programmatically; Default and custom values; Key/value pair
Metadata -> startup-script-url gs://pwnet-bucket1/startup_script.sh
Elastic Cloud Infrastructure: Scaling and Automation
Load Balancing and Instance Groups
Force Multipliers - Automation and Scaling - Scalable, Automatic
Repeatable, documented, scalable, necessary for large architecture, reduce complexity
Load Balancer, Instance Group, Autoscaling
Load Balancer - distributes user network requests among a pool of instances; single frontend point of access; SDN; Global or regional in scope;
traffic subject to firewalls; Types - Global External LB(HTTP/s, SSL/TCP Proxy), Regional External LB(Network TCP/UDP), Regional Internal LB
HTTP(S) LB - Manages HTTP(s) requests; Global scope, IPv4 and IPv6, Distribute traffic by location or content requested; Paired with IG for backend;
Native support for websocket protocol
Network LB - non HTTP(s); balance requests by IP protocol data; Forwarding rules, Target pool
Network Internal LB - private LB; used with multi-tier app; affects cloud router dynamic routing
Instance Group and Autoscaling
IG - group of instances; Manages as a group not one at a time; Managed IG and Unmanaged IG;
Features - Autoscale, work with LB, Health check-ASG; Require Instance templates(Define Group config, Global); From template create Managed IG
Networking - subject to firewall rules for allowed traffic; essential for LB; LB->Backend Service->Backend->IG ;
Health checks - Auto healing; Managed IG only; if instance or service fails, delete and recreate identical
Updating Managed IG - Managed Instance Group updater;
Autoscaling - automatically scales IG; Managed IG only; Set by autoscaling policy; Set metric and threshold; set min and max instance count
AS based on CPU usage, HTTP load balancing usage, Stackdriver monitoring metric, Multiple metrics
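The metric/threshold/min/max policy above can be pictured as target tracking: size the group so that average utilization lands near the target, then clamp to the allowed range. A minimal sketch of that idea, not Google's actual autoscaler algorithm (which also considers stabilization and cool-down periods); `recommended_size` is a hypothetical helper and utilization is given in whole percent to avoid floating-point rounding.

```python
def recommended_size(current_size, current_util_pct, target_util_pct, min_size, max_size):
    """Suggest an instance count that brings average utilization to the target."""
    if target_util_pct <= 0:
        raise ValueError("target utilization must be positive")
    total_load = current_size * current_util_pct
    # Integer ceiling division: instances needed to carry total_load at target.
    wanted = -(-total_load // target_util_pct)
    return max(min_size, min(max_size, wanted))


print(recommended_size(4, 90, 60, 2, 10))  # 6: scale out from 4 instances
print(recommended_size(4, 90, 60, 2, 5))   # 5: capped by the max instance count
print(recommended_size(4, 10, 60, 2, 10))  # 2: held at the min instance count
```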
simulate Load Testing via instance
ab -n 500000 -c 1000 http://xx.xxx.xxx.xxx/
Cloud Deployment Manager Concepts
Infra deployment service; Automates creation/deployment of GCP resources(configuration files and templates); Standardize and repeatable;
Used by Cloud Launcher to create, easy one click deployments
How it works - Deploy with command line only; IaC; calls on API resources; Configuration file YAML format; contain resource section followed by list
of resources; Resource components - Name, Type, Properties; Templates - config file contains templates; Python or JINJA2 format; reusable
Manifest - Read only output of final config; Includes config Yaml, imported templates, expanded resource list; use for troubleshooting
vm.yaml
API call needs project name
gcloud deployment-manager deployments create test-deployment --config vm.yaml
gcloud deployment-manager deployments delete test-deployment
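A Python template for Deployment Manager is a module exposing `GenerateConfig(context)` that returns the resources list (Name, Type, Properties, as described above). A minimal sketch for a single VM; the `zone` and `sourceImage` property names are assumptions a config file would have to supply, and `FakeContext` is a hypothetical stand-in used only to preview the output locally.

```python
COMPUTE_URL = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Return one f1-micro VM resource built from the deployment context."""
    project_url = COMPUTE_URL + "projects/" + context.env["project"]
    zone = context.properties["zone"]
    instance = {
        "name": context.env["deployment"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": project_url + "/zones/" + zone + "/machineTypes/f1-micro",
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                # sourceImage is passed in as a property (assumed name).
                "initializeParams": {"sourceImage": context.properties["sourceImage"]},
            }],
            "networkInterfaces": [{"network": project_url + "/global/networks/default"}],
        },
    }
    return {"resources": [instance]}


class FakeContext:
    """Local stand-in for the context object Deployment Manager passes in."""

    def __init__(self, env, properties):
        self.env = env
        self.properties = properties


ctx = FakeContext(
    {"project": "my-project", "deployment": "test-deployment"},
    {"zone": "us-central1-a", "sourceImage": "IMAGE_URL_PLACEHOLDER"},
)
print(GenerateConfig(ctx))
```

This is also why the API call needs the project name: machine type and network are referenced by full project-scoped URLs.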
GKE/GAE Exam Perspective
Infrastructure, how to build, how to manage, best practices
GKE/GAE are managed Infra, developer/code focused; High level understanding of GKE/GAE; when to choose one of them over other options;
Managing app engine versions, resizing k8s engine cluster
Containers
Container Resources
Container Builder, Container Registry, GKE
Container/Kubernetes Engine Cluster
gcloud container clusters create bookshelf --zone us-central1-a --machine-type f1-micro --num-nodes 3
gcloud command to input for changing the size of node pool
gcloud container clusters resize bookshelf --size 5
gcloud command to change machine type without stopping cluster
migrate the instances to different node
gcloud container clusters delete bookshelf
App Engine Resources and Management
Cloud Source Repository - Private Git repo hosted on GCP; Use with stackdriver to debug info alongside your code; connect to github/bitbucket;
source code browser
GAE Management - Cloud shell(preview in local env without deploying); versions+split traffic(Rollout update slowly);
Firewall rules act differently(Default allow all, control access from IP ranges, cannot filter traffic type, Block malicious IP);
Best Practices - Break app into microservices; Rollout update slowly with split traffic; Use blue-green deployment model
Go to App Engine directory, create sandbox env using "dev_appserver.py ./app.yaml"
Build and Deploy a Scalable Company Website
Deploy a Cloud Network Monitoring Service to Monitor On-Premises Network