@szinck1
Last active November 2, 2022 19:52
Amazon Solutions Architect Associate

Exam Notes

S3

  • Stores objects (files) in "buckets" (directories)

  • Buckets must have globally unique name

  • Buckets are created at the region level

  • S3 exposes HTTP and HTTPS endpoints

  • Naming convention

    • No uppercase
    • No underscores
    • 3-63 characters
    • Not an IP address
    • Must start with a lowercase letter or number
  • Objects

    • Files/objects have a key
    • Key is the FULL path s3://my-bucket/afolder/my_file.txt
    • The key is composed of prefix + object name
    • s3://my-bucket/prefix/object_name
    • Each object has metadata (key/value pairs - system or user metadata)
    • Each object can be tagged with a key/value pair - up to 10
    • Each object has a Version ID if versioning is turned on
  • Object Values

    • Max object size is 5TB
    • Uploads > 5GB must use multi-part upload
  • There is no concept of a directory within buckets; just very long key names that contain slashes
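The naming rules above can be sketched as a quick validator (a hypothetical helper for illustration, not an AWS API):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the rules listed above: 3-63 chars,
    lowercase only, no underscores, must start with a lowercase letter
    or number, and must not look like an IP address."""
    if not 3 <= len(name) <= 63:
        return False
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]*", name):
        return False
    # Reject names formatted like an IPv4 address (e.g. 192.168.1.1)
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return True

print(is_valid_bucket_name("my-bucket"))    # True
print(is_valid_bucket_name("My_Bucket"))    # False
print(is_valid_bucket_name("192.168.1.1"))  # False
```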

S3 Versioning

  • Stores all versions of an object (including all writes and even if you delete an object)
  • Once enabled, Versioning cannot be disabled, only suspended
  • Easy to rollback to previous versions
  • Any file not versioned prior to enabling versioning will have a version of "null"
  • Integrates with Lifecycle rules
  • Versioning's MFA Delete capability can be used to provide an additional layer of security -- important for exam.

S3 Encryption

  • Four methods of encryption objects:
    • SSE-S3: Uses keys managed by AWS
      • Object is encrypted server side
      • AES-256 encryption type
      • Must set header: "x-amz-server-side-encryption": "AES256"
    • SSE-KMS: Uses AWS KMS to manage encryption keys
      • Advantages include user control and audit trail
      • User maintains control of rotation policy for keys
      • Object is encrypted server side
      • Must set header: "x-amz-server-side-encryption": "aws:kms"
    • SSE-C: Manage your own keys
      • Server Side Encryption using keys managed by customer outside AWS
      • S3 does not store the custom provided key
      • HTTPS is required because key is transferred per request
      • Key must be provided in HTTP headers for every request
      • Not available via the Console, must be used via cli or programmatically
    • Client Side Encryption
      • Object is encrypted prior to transfer
      • Client library such as Amazon S3 Encryption Client can be used
      • Client must decrypt when receiving from S3
      • Customer manages key + encryption
  • Default encryption options can be set or objects can be encrypted per upload
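The headers above can be collected into a small reference table; the SSE-C helper sketches the standard customer-key headers (the key values themselves are caller-supplied):

```python
# The server-side encryption header each method requires, per the notes above.
SSE_HEADERS = {
    "SSE-S3":  {"x-amz-server-side-encryption": "AES256"},
    "SSE-KMS": {"x-amz-server-side-encryption": "aws:kms"},
}

def sse_c_headers(b64_key: str, b64_key_md5: str) -> dict:
    """SSE-C sends the customer-managed key with every request (HTTPS
    required); S3 never stores the key."""
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": b64_key,
        "x-amz-server-side-encryption-customer-key-MD5": b64_key_md5,
    }
```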

S3 Security

  • User based
    • IAM Policies - defines which API should be allowed per user from IAM Console
  • Resource based
    • Bucket Policies - bucket-wide rules from the S3 console, allows cross-account access to S3 buckets
    • Object Access Control List (ACL) - fine grained permissions
    • Bucket Access Control List - less common method
  • IAM principal can access an S3 object if:
    • IAM permissions allow it OR the resource policy allows it
    • AND there is no explicit DENY
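The evaluation rule above can be written out directly; a minimal sketch:

```python
def s3_access_allowed(iam_allows: bool, resource_policy_allows: bool,
                      explicit_deny: bool) -> bool:
    """Access is granted if IAM permissions OR the resource policy
    allow it, AND there is no explicit DENY."""
    return (iam_allows or resource_policy_allows) and not explicit_deny

print(s3_access_allowed(True, False, False))   # True
print(s3_access_allowed(False, True, False))   # True
print(s3_access_allowed(True, True, True))     # False - explicit DENY wins
```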

S3 Websites

  • Can host static websites; url will be bucketname.s3-region.amazonaws.com
  • Not enabled by default

CORS Overview

  • Origin is a scheme (protocol), host (domain) and port
  • i.e. https://example.com
  • Cross-Origin Resource Sharing - getting resources from another Origin
  • Same origin: example.com/app1 and example.com/app2
  • Different origin: example.com & other.example.com
  • Requests won't be fulfilled unless the other origin allows for the request, using CORS Headers (i.e. Access-Control-Allow-Origin)
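Since an origin is scheme + host + port, the same-origin check above can be sketched with the standard library:

```python
from urllib.parse import urlsplit

def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin when scheme, host, and port all match."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)

print(same_origin("https://example.com/app1", "https://example.com/app2"))  # True
print(same_origin("https://example.com", "https://other.example.com"))      # False
```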

S3 CORS

  • If client does a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
  • Can allow for a specific origin or for * (all origins)

S3 Consistency Model

  • Read after write consistency for PUTS of new objects
    • As soon as the object is written (PUT 200), a GET can get it
  • Eventual Consistency for DELETES and PUTS of existing objects
    • If reading an object after updating it, we might get an older version. i.e. PUT 200 (write1) -> PUT 200 (write2) -> GET 200 (might be write1)
    • Might still be able to retrieve a deleted item for a short time (DELETE 200 -> GET 200)
  • Not possible to request 'strong consistency' for S3

S3 MFA DELETE

  • Can only be enabled by root account
  • Can only be enabled via CLI
  • Versioning must be enabled on the bucket
  • MFA is required to:
    • permanently delete an object version
    • suspend versioning on the bucket
  • MFA is not needed to:
    • enable versioning
    • list deleted versions

S3 Access Logs

  • If enabled, AWS will automatically add the "S3 log delivery group" to the bucket ACL with Object Write and Bucket ACL Read
  • Logs get/put/read etc access to S3 buckets

S3 Storage Classes

  • Standard - General Purpose
    • High durability (11 9s) of objects across multiple AZ
    • With 10 million objects, expect to incur loss of a single object once every 10,000 years
    • Sustain 2 concurrent facility failures
    • Use Cases
      • Big Data analytics, gaming apps, content distribution...
  • Standard-Infrequent Access (IA)
    • Suitable for data that is accessed less frequently but requires rapid access when needed
    • Low cost compared to S3 Standard
    • Use Cases
      • Backups, DR...
  • One-Zone IA
    • Same as IA but stored in single AZ
    • Use Cases
      • Data that can be recreated, secondary backups...
  • Intelligent Tiering
    • Same performance as S3 Standard
    • Additional small fee
    • Moves objects between 2 access tiers based on changing access patterns
  • Amazon Glacier
    • Low cost object storage
    • Meant for archives/backups where data needs to be retained for years
    • Min storage duration of 90 days
    • Alternative to on-prem tape storage
    • Each item in Glacier is called an Archive (up to 40TB)
    • Archives are stored in Vaults
    • Three retrieval options
      • Expedited - 1-5 min
      • Standard - 3-5 hours
      • Bulk - 5-12 hours
  • Amazon Glacier Deep Archive
    • Cheaper than Glacier
    • Min storage duration of 180 days
    • Retrieval options
      • Standard - 12 hours
      • Bulk - 48 hours

S3 Moving between storage classes

  • Can transition objects between storage classes
  • This can be done automatically using Lifecycle Rules

S3 Lifecycle Rules

  • Transition Actions define when objects are transitioned to another storage class
    • Move objects to Standard IA 60 days after creation
    • Move to Glacier after 6 months
  • Expiration Actions - configure objects to expire/delete after given time. Examples...
    • Access logs set to be deleted after 1 year
    • Can be used to delete old versions of files
    • Delete incomplete multi-part uploads
  • Rules can be created for a certain prefix (s3://mybucket/mp3/*)
  • Rules can be created for certain object tags
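The rules above map onto the lifecycle-configuration shape S3 accepts. A sketch of such a configuration (the rule ID, prefix, and day counts are hypothetical examples):

```python
# One lifecycle rule combining transition and expiration actions.
lifecycle_config = {
    "Rules": [{
        "ID": "archive-mp3",
        "Filter": {"Prefix": "mp3/"},      # applies only under this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": 60, "StorageClass": "STANDARD_IA"},  # 60 days after creation
            {"Days": 180, "StorageClass": "GLACIER"},     # ~6 months
        ],
        "Expiration": {"Days": 365},       # delete after 1 year
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
    }]
}
print(lifecycle_config["Rules"][0]["ID"])  # archive-mp3
```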

Lifecycle Management, S3 IA and Glacier

  • What is Lifecycle Management? Automates moving your objects between different storage tiers
  • Can be used in conjunction with versioning
  • Can be applied to current versions and previous versions

S3 Object Lock and Glacier Vault Lock

  • S3 Object Lock
    • Write Once Read Many (WORM) mode
    • Block an object version delete for specific amount of time - no deletion or modification
    • Object locks can be on individual objects or applied across the bucket as a whole
    • Object locks come in two modes: Governance Mode and Compliance Mode. In Governance Mode, users can’t overwrite or delete an object version or alter its lock settings unless they have special permission. In Compliance Mode, a protected object version can’t be overwritten or deleted by any user, even the root user.
  • Glacier Vault Lock
    • WORM Model
    • Lock the policy for future edits (can no longer be changed)
    • Helpful for compliance and data retention
    • S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual Glacier vaults with a Vault Lock policy. You can specify controls such as WORM in a Vault Lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.

S3 Prefix

Example:

  • mybucket/folder1/subfolder1/myfile.jpg -> /folder1/subfolder1
  • mybucket/folder2/subfolder1/myfile.jpg -> /folder2/subfolder1
  • mybucket/folder3/myfile.jpg -> /folder3
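The examples above can be reproduced with a small helper (a hypothetical function, not part of any AWS SDK):

```python
def s3_prefix(key: str) -> str:
    """Everything before the final object name is the prefix - there are
    no real directories in S3, just slashes inside long key names."""
    head, _, _ = key.rpartition("/")
    return "/" + head if head else ""

print(s3_prefix("folder1/subfolder1/myfile.jpg"))  # /folder1/subfolder1
print(s3_prefix("folder3/myfile.jpg"))             # /folder3
```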

S3 Performance Optimization

  • Multipart Uploads
    • Recommended for files >100MB
    • Required for files >5GB
    • Can help parallelize uploads
  • S3 Transfer Acceleration
    • Utilizes CloudFront edge network: instead of uploading directly to S3, the file is sent to an edge location, and then transferred to S3 across Amazon’s backbone network
    • Distinct url i.e. szbucket.s3-accelerate.amazonaws.com
    • Compatible with multi-part upload
  • S3 Byte-Range Fetches
    • Parallelize GETs by requesting specific byte ranges
    • Better resilience in case of failure
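Byte-range fetches work by splitting an object into HTTP Range headers that can be requested in parallel; a minimal sketch:

```python
def byte_ranges(size: int, chunk: int) -> list:
    """Split an object of `size` bytes into HTTP Range header values so
    the parts can be fetched in parallel (and retried independently,
    which gives the better failure resilience noted above)."""
    return [f"bytes={start}-{min(start + chunk, size) - 1}"
            for start in range(0, size, chunk)]

print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```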

S3 Performance

  • Dictated by the prefix.
  • The first byte out of S3 is within 100-200 milliseconds.
  • 3500 PUT/COPY/POST/DELETE and 5500 HEAD/GET requests per second can be achieved per prefix.

S3 - KMS Limitation

  • If using SSE-KMS, may be impacted by KMS limits
    • Uploading a file calls the GenerateDataKey KMS API
    • Downloading calls Decrypt KMS API
    • Count towards KMS quota per second (differs per region, cannot be changed)

S3 and Glacier Select

  • Allows you to use simple SQL expressions to return only the data you’re interested in instead of retrieving the entire object. This can greatly improve performance.
  • Get data by rows or columns
  • Amazon performs server-side filter on the request and only sends back the requested data
  • Less network transfer, less CPU cost client-side

S3 Event Notifications

  • A mechanism to send notification/trigger event when an event happens in S3
  • Object name filtering is possible (*.jpg)
  • Use Case
    • Generate thumbs of images uploaded to S3
  • Event Notification Targets
    • SNS
    • SQS
    • Lambda Function
  • No limit to number of S3 events
  • Notifications typically delivered in seconds
  • To ensure that an event notification is triggered for every successful write, versioning must be enabled on bucket
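The thumbnail use case above maps onto a notification configuration like the following (the Lambda ARN is a hypothetical placeholder):

```python
# Route every *.jpg upload event to a Lambda function.
notification_config = {
    "LambdaFunctionConfigurations": [{
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
        "Events": ["s3:ObjectCreated:*"],
        # Object name filtering: only keys ending in .jpg trigger the event
        "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}},
    }]
}
print(notification_config["LambdaFunctionConfigurations"][0]["Events"])
```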

Athena

  • Service to perform analytics directly against S3 objects
  • Uses SQL language
  • Has a JDBC/ODBC driver
  • Charged per query and amount of data scanned
  • Supports CSV, JSON, Parquet, etc...
  • Built on Presto

AWS Organizations

  • Allows multiple AWS accounts and centrally manage them
  • The main account is called the parent, other accounts are member accounts
  • Member accounts can only be part of one Organization
  • Enable MFA on root account
  • Use strong password
  • Paying account should be used for billing purposes only; do not deploy resources in it
  • Enable/disable services using Service Control Policies (SCP) on either OU or individual accounts
  • Organizational Units (OUs) can be used to organize accounts
    • Parent account -> Engineering OU -> Project1 -> Project2
  • Service Control Policies (SCP)
    • Allows you to whitelist or blacklist IAM actions
    • Applied at the OU or Account level
    • Does not apply to Parent/Master account
    • SCP is applied to all Users and Roles of the Account, including root
    • SCP does not impact service-linked roles
    • Accounts inherit OU level policies and cannot be circumvented by conflicting rules
    • Use cases
      • Deny access to specific services
      • Enforce audit compliance (ie PCI) by disabling services
  • Migrating accounts between Organizations
    • Remove member account from old organization
    • Send invite from new organization
    • New account accepts invitation
  • Migrating the Master/Parent account to a new Organization
    • Remove all member accounts
    • Delete the old organization
    • Repeat process to migrate a specific account
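The "deny access to specific services" use case above takes the shape of a standard IAM policy document. A minimal SCP sketch (the denied service here is a hypothetical example):

```python
# An SCP that blacklists all DynamoDB actions. Attached at the OU level,
# every account under that OU inherits the restriction, including root users.
deny_dynamodb_scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyDynamoDB",
        "Effect": "Deny",
        "Action": "dynamodb:*",
        "Resource": "*",
    }],
}
print(deny_dynamodb_scp["Statement"][0]["Effect"])  # Deny
```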

Consolidated Billing

  • Paying account is independent; cannot access resources in other accounts.
  • All linked accounts are independent
  • One bill per AWS account

Sharing S3 Buckets Across Accounts (3 methods)

  • Bucket Policies & IAM - applies across the entire bucket, programmatic access only
  • Using Bucket ACLs & IAM - applies to individual objects, programmatic access only
  • Cross-account IAM roles. Programmatic AND Console access.

S3 Replication

  • Cross-region replication (CRR) if the source/dst are in different regions
    • Use case: compliance, lower latency access, replication across accounts
  • Same Region Replication (SRR) if the src/dst are in the same region
    • Use case: log aggregation, live replication between prod and test accounts
  • Object permissions are not replicated, only the object is replicated
  • Buckets can be in different accounts
  • Copying happens async
  • Must give proper IAM permissions to S3
  • Versioning must be enabled on both src/dst for replication to work
  • Files in existing bucket are not replicated automatically, only subsequent files
  • Deleting without a version ID adds a delete marker, which is not replicated
  • Deleting with a version ID deletes in the source only; the delete is not replicated
  • Optionally, Delete markers created by S3 delete operations can be replicated.
  • Delete markers created by lifecycle rules are not replicated.
  • No deletes are replicated
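The notes above correspond to a replication-configuration shape like the following (the role and bucket ARNs are hypothetical placeholders; versioning must already be enabled on both buckets):

```python
# Replicate everything in the source bucket to a destination bucket.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # IAM permissions for S3
    "Rules": [{
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},  # empty filter = replicate all objects
        "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
        # Optional: delete markers from S3 delete operations can be replicated
        "DeleteMarkerReplication": {"Status": "Disabled"},
    }]
}
print(replication_config["Rules"][0]["Status"])  # Enabled
```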

AWS DataSync

  • A service for transferring large amounts of data typically from on-prem to AWS. An agent is deployed on your server and data is copied to AWS S3, EFS or FSx for Windows File Server
  • Used with SMB- and NFS- compatible file systems
  • Replication can be done hourly, daily, weekly
  • Can be used to replicate EFS to EFS

CloudFront

  • Used to deliver websites, including dynamic, static, streaming and interactive content from edge locations. Requests first go to edge location and if not available at edge, the edge will pick it up from the origin and serve it to the client.

  • Edge Location - location where content is cached. Separate from Region/AZ

  • Origin: Origin of files that CDN will distribute.

    • S3 bucket
      • For distributing files and caching them at the edge
      • Enhanced security with CF Origin Access Identity (OAI)
      • CF can be used as an ingress to upload files to S3
    • Custom Origin (HTTP)
      • ALB, EC2 Instance, S3 website, any HTTP backend you want
  • Distribution: Name given to CDN which consists of a collection of Edge Locations

    • Can restrict access to a distribution
    • Can create whitelist/blacklist per country using a 3rd party geo-ip db
    • Excellent for caching static content across many regions
  • Edge locations are READ/WRITE

  • Cached objects have a TTL

  • Clearing cached objects is possible, but will be charged (called invalidating cache)

  • Signed URLs/cookies are possible

  • Invalidate objects can be a troubleshooting technique when new content is not retrieved

  • Provides DDoS protection and integration with AWS Shield and AWS WAF

CloudFront Signed URls and Cookies

  • A signed URL is for individual files. 1 file = 1 URL
  • A signed cookie is for multiple files. 1 cookie = multiple files
  • Attach a policy with
    • URL Expiration
    • IP ranges permitted to access data
    • Trusted signers (which AWS accounts can create signed URLs)
  • Use Case
    • Distribute content securely from S3; only CF will be able to access the bucket

CF Signed URL vs S3 Pre-Signed URL

  • CF Signed URL
    • Allow access to a path, no matter the origin
    • Account wide key-pair, only root can manage it
    • Can filter by IP, path, date, expiration
    • Can leverage caching features
  • S3 Pre-Signed URL
    • Issue a request as person who pre-signed the URL
    • Uses the IAM key of the signing IAM principal
    • Limited lifetime

Global Accelerator

  • Leverage the AWS internal network to route to your application

  • 2 Anycast IPs are created for your application

  • The Anycast IP send traffic directly to Edge Locations

  • Edge locations send traffic to your application

  • Unicast IP vs Anycast IP

    • Unicast IP
      • One server holds one IP address
    • Anycast IP
      • All users hold the same IP address and the client is routed to the nearest one
  • Works with EIP, EC2, ALB, NLB, public or private

  • Consistent performance

    • Intelligent routing to lowest latency and fast regional failover
    • No issue with caching because IP doesn't change
  • GA performs health checks to keep your application global (failover in less than 1 minute for unhealthy endpoints)

  • Security

    • Only 2 external IPs to be whitelisted
    • DDoS protection due to AWS Shield

Creating a signed URL or signed cookie:

  • Attach a policy
  • Policy can include URL expiration and IP ranges
  • Trusted signers (which AWS accounts can create signed URLs)

How do signed URls work?

  • Users cannot access the origin directly - they must route through CloudFront/Edge. CF/Edge uses an OAI (Origin Access Identity) to access the origin
  • Client authenticates to the application. The application generates a signed URL or Cookie and returns it to the client. The client can now use that URL or Cookie to access CloudFront/Edge. CF/Edge retrieves data from the Origin (S3/EC2 etc.)

CF Signed URLs Features

  • Can have different origins, doesn’t need to be EC2
  • Key-pair is account wide and managed by root user
  • Can utilize caching features
  • Can filter by date, path, IP address, expiration etc.

S3 Pre-signed URL

  • Can generate pre-signed URLs using SDK or CLI
    • Downloads can use the CLI
    • Uploads only via the SDK
  • Valid for a default of 3600 seconds, can change timeout with --expires-in SECONDS
  • Issues a request as the IAM User who creates the presigned URL
  • Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET/PUT
  • Limited lifetime
  • Useful when users are accessing S3 directly. If origin is EC2, use CF signed urls.
  • Use Cases
    • Allow only logged-in users access to specific content
    • Generate URLs dynamically for new users
    • Temporarily allow a user to upload to a specific location in a bucket

Snowball

  • Snowballs can import and export from S3 only
  • Big tamper-resistant physical appliance used for physically transferring data to avoid high network cost.
  • Can be 1/5th cost of network transfer
  • 50TB to 80TB
  • Uses 256-bit encryption
  • Uses industry standard TPM to ensure full chain-of-custody
  • Tracking using SNS. E-ink shipping label

Snowball Edge

  • 100TB of data transfer with on-board storage and compute
    • Storage optimized - 24 vCPU OR,
    • Compute optimized - 52 vCPU & optional GPU
  • Supports custom AMIs
  • Can run Lambda on Snowball Edge
  • Supports local workloads in remote/offline locations
  • Can cluster together

Snowmobile

  • Exabyte-scale data transfer service used to move extremely large amounts of data to AWS
  • 100PB per Snowmobile
  • 45-foot ruggedized shipping container pulled by a semi-trailer truck

Storage Gateway

  • Service to connect on-prem software appliance with AWS storage infrastructure (EBS, S3 or Glacier)
  • Physical or VM
  • Once configured, can be managed by AWS Management Console
  • Three types of Storage Gateway:
    • File Gateway (Software appliance)
      • Supports VMware, Hyper-V and EC2 instances
      • NFS and SMB protocol
      • Supports S3 standard, IA, One Zone IA
      • Bucket access using IAM roles for each File Gateway
      • Most recently used data is cached in the file gateway
      • Because it's NFS or SMB, it can be mounted on many servers
    • File Gateway - Hardware Appliance
      • Does not require a virtualization layer like the File Gateway software appliance
      • Use case:
        • Helpful for daily NFS backups in small data centres
    • Volume Gateway
      • Block storage using iSCSI backed by S3
      • Backed by EBS snapshots, which can help restore volumes on-prem
      • Cached volumes: low latency access to most recent data
      • Stored volumes: entire dataset is on prem, scheduled backups to S3
    • Tape Gateway (VTL)
      • Backed by S3 and Glacier
      • Back up data using existing tape-based processes
      • Works with leading backup software vendors

File Gateway:

  • Customer App Server -NFS-> Storage Gateway -> Internet/Direct Connect/VPC -> Amazon S3/S3-IA/Glacier
  • Flat files, stored directly on S3

Volume Gateway:

  • Presents your application with disk volumes using iSCSI
  • Data written to these volumes can be async backed up to point-in-time snapshots and stored in Amazon EBS
  • Snapshots are incremental backups that capture only changed data. Snapshot storage is compressed.

Volume Gateway - Stored Volumes

  • Entire dataset is stored on site and is async backed up to S3
  • Async backed up to AWS to S3 in form of EBS snapshots

Volume Gateway - Cached Volumes

  • Entire dataset is stored on S3 and most frequent accessed is cached on site
  • Retains frequently accessed data in local gateway storage
  • Cached volumes minimize need for on-prem storage

Tape Gateway

  • Use existing tape infrastructure
  • Create virtual tapes backed by Amazon S3

Athena

  • Interactive query service for data located in S3 using standard SQL
  • Serverless, nothing to provision
  • Works directly with data stored in S3
  • Can be used to query log files stored in S3, ELB logs, access logs etc.
  • Analyse AWS cost and usage reports
  • Supports JSON, Apache Parquet and Apache ORC data formats

Macie

  • Security service which uses ML and NLP to discover, classify and protect sensitive data stored in S3
  • Uses AI to recognize S3 objects which contain sensitive data
  • Provides dashboards, reporting and alerts
  • Works directly with data stored in S3
  • Can also analyze CloudTrail logs

IAM

  • Users, Groups, Roles, and Policies
  • It is global; does not apply to regions
  • “Root account” is the account created at first setup. It has Admin access
  • New Users have NO permissions when first created

EC2

  • Root volumes can be encrypted. Third-party tools (BitLocker etc.) can also be used to encrypt the root volume
  • In order to enable encryption at rest using EC2 and Elastic Block Store, you must configure encryption when creating the EBS volume
  • Additional volumes can also be encrypted
  • Termination Protected is turned off by default
  • For an EBS-backed instance, the default action is for the root EBS volume to be deleted when the instance is terminated
  • Additional volumes will not be deleted by default
  • Instances are deployed in Availability Zones
  • The underlying hypervisor is Nitro. Prior to 2017, it was Xen
  • Standard Reserved Instances can’t be moved between regions
  • By default, EC2 instances come with a private IP for the internal AWS network and a dynamic public IP

AMI

  • Possible to share an AMI with another AWS account
  • Sharing an AMI does not affect the ownership of the AMI
  • If you copy an AMI that has been shared with your account, you are owner of the target AMI in your account
  • To copy an AMI that was shared with you from another account, the owner of the source AMI must grant read permissions for the storage that backs the AMI (for an EBS-backed AMI, the associated EBS snapshot; for an instance store-backed AMI, the associated S3 bucket)
  • Can't copy an encrypted AMI that was shared with you from another account.
  • If the underlying snapshot and encryption key were shared, you can copy the snapshot while re-encrypting it with your own key. The copied snapshot is owned by you and can be registered as a new AMI
  • Can't copy an AMI with an associated billingProduct code that was shared with you from another account. This includes Windows AMIs and AMIs from the Marketplace. To copy a shared AMI with a billingProduct code, launch an EC2 instance using the shared AMI and then create an AMI from the instance
  • Sharing
    • AMIs created in a region cannot be shared outside that region
    • Click Modify Image Permissions. Add account number to grant access
    • Must tick "create volume" to give permission to allow "Copy AMI" from the Console

Security Groups

  • All ingress traffic is blocked by default
  • All outbound traffic is allowed by default
  • Changes to security groups take effect immediately
  • Security Groups are stateful
  • Network-Access Control Lists are stateless
  • Can’t block individual IP addresses or individual ports using SGs, use NACLs instead
  • Many EC2s to one SG, many SGs to many EC2s
  • Can specify allow rules but not deny rules
  • Locked down to a region/VPC combination
  • SGs can reference other SGs

EBS 101

  • Elastic Block Storage provides persistent block storage
  • Automatically replicated within same AZ
  • General Purpose (SSD), Provisioned IOPS (SSD), Throughput Optimised HDD, Cold HDD, Magnetic
  • EBS volumes are always in the same AZ as the EC2 instance
  • HDD based volumes will always be less expensive than SSD types

EBS Volumes and Snapshots

  • You should stop instance before taking EBS root volume snapshot
  • Can create AMIs from Snapshots
  • Can change EBS volume size on the fly, including storage type
  • Can move an EC2 volume from one AZ to another by snapshotting the volume, creating an AMI from that snapshot, and using it to launch an EC2 in the new AZ
  • Same can be done for moving EC2 volumes to another region
  • Snapshots are incrementally backed up to S3
  • Non-root Volumes can be detached without stopping the instance
  • Snapshots of the root device of an EBS volume cannot be deleted if they are used by a registered AMI. The AMI must be removed first.

AMI Types

  • Instance Store volumes are sometimes called Ephemeral storage
  • Instance store volumes cannot be stopped. If the underlying host fails, you will lose your data
  • EBS backed instances can be stopped; you will not lose data on this instance if it’s stopped
  • You CAN reboot both, no data loss
  • By default, both root volumes will be deleted on termination, however with EBS volumes, you can tell AWS to keep the root device volume

ENI vs ENA vs EFA

  • ENI - Elastic Network Interface -
    • Logical component in a VPC that represents a virtual network card
    • A primary private IPv4 address from the IPv4 range of your VPC
    • One or more secondary private IPv4 addresses
    • One Elastic IP (IPv4) per private IPv4 address
    • One public IPv4 address
    • One or more IPv6 addresses
    • One or more SGs
    • Bound to a specific AZ
    • Can be moved from one EC2 instance to another instance
    • A MAC address, src/dst check flag, a description
    • Used for multihoming systems, perhaps for a logging or management network
  • EN - Enhanced Networking - Uses single root IO virtualization (SR-IOV) to provide high-performance network capabilities on supported instance types
    • Provides higher bandwidth, higher packets per second (PPS), and consistently lower inter-instance latencies
    • No additional charge for EN.
    • Depending on instance type, EN can be enabled using:
      • Elastic Network Adapter (ENA) of up to 100Gbps
      • Intel 82599 Virtual Functions (VF), supporting up to 10Gbps. Typically for older instances
  • Elastic Fabric Adapter - A NIC to accelerate HPC and ML apps
    • Provides lower and more consistent latency and higher throughput than the TCP transport
    • EFA can use OS-bypass, enabling HPC and ML apps to bypass the OS kernel to communicate directly with the EFA device.
    • Only supported on Linux
  • Elastic IP (EIP)
    • If you need a fixed & public IP for an EC2 instance, you need an EIP
    • Will remain yours as long as you don't delete it
    • EIPs can only be attached to one instance at a time
    • Default max of 5 EIPs per account (request an increase from AWS)

Encrypted Root Device Volumes & Snapshots

  • Snapshots of encrypted volumes are encrypted automatically
  • Volumes restored from encrypted snapshots are encrypted automatically
  • You can share snapshots, but only if they are unencrypted

EC2 Hibernate

  • When you tell an EC2 instance to hibernate, the OS performs hibernation (suspend-to-disk). Hibernation saves the contents of the instance memory (RAM) to the EBS root volume. The instance's EBS root volume and any attached EBS data volumes are persisted
  • Root volumes must be encrypted for hibernation to work
  • Instance RAM must be less than 150 GB
  • Supported on Windows, Amazon Linux 2 AMI and Ubuntu
  • Can’t be hibernated for more than 60 days
  • Starting a hibernated instance:
    • EBS root volume is restored to its previous state
    • RAM contents are reloaded
    • Processes are resumed
    • Previously attached data volumes are reattached and retain their instance ID

Scalability and HA

  • Vertical
    • Scale an instance up by increasing its size (e.g., t2.micro -> t2.large)
    • Common for non-distributed systems such as RDS or ElastiCache
  • Horizontal
    • Increase the number of instances / systems your application can use
    • Useful for distributed systems
    • Auto Scaling Groups and Load Balancers can facilitate horizontal scaling
  • High Availability
    • Usually goes together with horizontal scaling
    • Means running app / system in at least 2 AZs
    • Goal is to survive an AZ loss
    • HA can be passive - for example, with RDS Multi AZ
    • Auto Scaling Group with Multi AZ
    • Load Balancer with Multi AZ

Load Balancer

  • Spread load across multiple downstream instances

  • Expose a single point of access (DNS) to the app

  • Seamlessly fail between instances

  • Performs regular health checks

  • Provides SSL termination

  • Enforce stickiness with cookies

  • HA across zones

  • Separate public from private traffic

  • ELB - Managed Load Balancer

    • AWS takes care of maintenance, upgrades, HA
    • AWS provides relatively limited configuration options
    • Integrated with many AWS services
  • Can roll your own LB, cheaper but much more effort

  • Health Checks

    • Enable LB to know if instances are available and reply to requests
    • Health check is done on a port and route (i.e. /health)
    • If the response is not 200 (OK), the instance is marked unhealthy and the LB stops sending it traffic
  • Types of AWS Load Balancers

    • Classic Load Balancer v1 (2009)
      • Supports HTTP, HTTPS (Layer 7) and TCP (Layer 4)
      • Health Checks are TCP or HTTP based
      • Fixed hostname xxx.region.elb.amazonaws.com
      • Supports only one SSL certificate
      • Must use multiple CLBs to support multiple hostnames with multiple certificates
    • Application Load Balancer v2 (2016)
      • HTTP, HTTPS, WebSocket
      • LB to multiple HTTP apps across machines (Target Groups)
      • LB to multiple apps on the same machine (i.e. containers)
      • Support for HTTP/2 and WebSocket
      • Supports redirects, i.e. HTTP to HTTPS
      • Supports multiple listeners with multiple SSL certificates using SNI
      • Routing tables
        • Routing based on path in URL i.e. /users and /posts to different TGs
        • Routing based on hostname in URL one.ex.com and two.ex.com to different TGs
        • Routing based on Query String, Headers ex.com/users?id=123 to specific TG
      • Supports port mapping to redirect to a dynamic port in ECS
      • Would require multiple Classic LBs, one per app
      • Target Groups
        • EC2 instances (can be managed by Auto Scaling Groups) - HTTP
        • ECS Tasks - HTTP
        • Lambda functions - HTTP request is translated into a JSON event
        • Private IP addresses
      • ALBs can route to multiple target groups
      • Health checks are performed on the TG
      • ALBs get a fixed hostname, like classic
      • App servers don't see the IP of the client
        • IP of client is inserted into the header X-Forwarded-For
        • Port is in X-Forwarded-Port and protocol X-Forwarded-Proto
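
How an app behind an ALB might recover client connection details from the X-Forwarded-* headers (header values are illustrative):

```python
def client_info(headers):
    # X-Forwarded-For can hold a proxy chain; the original client is first.
    xff = headers.get("X-Forwarded-For", "")
    return {
        "ip": xff.split(",")[0].strip() if xff else None,
        "port": headers.get("X-Forwarded-Port"),
        "proto": headers.get("X-Forwarded-Proto"),
    }

print(client_info({
    "X-Forwarded-For": "203.0.113.7, 10.0.1.20",
    "X-Forwarded-Port": "443",
    "X-Forwarded-Proto": "https",
}))
```
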
    • Network Load Balancer v2 (2017)
      • Layer 4
      • TCP, TLS, and UDP
      • Handles millions of requests per second
      • Low latency ~100ms vs 400ms for ALB
      • One static IP per AZ, and supports assigning Elastic IP
      • Supports multiple listeners with multiple SSL certificates using SNI
      • Security Groups
        • Not assigned to NLB
        • Traffic originates from client
        • Must allow traffic from public to EC2 instance
  • Stickiness

    • Same client is always redirected to the same instance behind the LB
    • Works with Classic Load Balancers and ALBs
    • A cookie is used for stickiness and has an expiration controlled by you
    • Use case: make sure user doesn't lose their session data
    • Enabling stickiness can bring imbalance to load across backend EC2 instances
    • Configured in the Target Group
  • Cross-Zone Load Balancing

    • Each load balancer instance distributes evenly across all registered instances in all AZs
    • Without cross-zone load balancing, the LB distributes requests evenly within its AZ only
    • Classic Load Balancer
      • Disabled by default in Classic Load Balancer
      • No charges for inter AZ data if enabled
    • Application Load Balancer
      • Always on - can't be disabled
      • No charges for inter AZ data
    • Network Load Balancer
      • Disabled by default
      • Charged for inter AZ data if enabled
  • SSL/TLS Certificates

    • Allows traffic between clients and LB to be encrypted in transit
    • Public certificates are issued by Certificate Authority
    • Traffic from Users to LB is encrypted, from LB to backend is unencrypted
    • Can manage certificates in AWS Certificate Manager
    • HTTPS Listener
      • Specify a default certificate
      • Can add optional list of certs to support multiple domains
      • Clients can use SNI (Server Name Indication) to specify the hostname they reach
      • Ability to specify a security policy to support older versions of SSL/TLS (legacy clients)
    • Server Name Indication (SNI)
      • Solves the problem of loading multiple certificates on to one web server
      • Requires the client to indicate the hostname of the target server in the initial SSL handshake
      • Only works with ALB and NLB and CloudFront, does not work with CLB
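
Conceptually, SNI lets the load balancer pick the right certificate per hostname; a sketch with hypothetical cert names:

```python
certs = {
    "one.ex.com": "cert-one",
    "two.ex.com": "cert-two",
}
DEFAULT_CERT = "cert-default"

def select_cert(sni_hostname):
    # Fall back to the listener's default certificate when nothing matches.
    return certs.get(sni_hostname, DEFAULT_CERT)

print(select_cert("two.ex.com"))    # cert-two
print(select_cert("other.ex.com"))  # cert-default
```
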
  • Internal and external ELBs can be set up

  • LBs can scale out but not instantaneously - can contact AWS for a "warm-up"

  • Troubleshooting

    • 4xx errors are client induced errors
    • 5xx errors are application induced errors
    • LB Error 503 means at capacity or no registered target
    • If LB can't connect to app, check security groups
  • Monitoring

    • ELB access logs will log all access requests so you can debug per request
    • CloudWatch Metrics will give aggregated statistics (i.e. connection count)

CloudWatch

  • Provides metrics for every AWS service
  • Metrics are a variable to monitor (i.e. CPUUtilization, NetworkIn)
  • Metrics belong to namespaces (grouped)
  • Dimension is an attribute of a metric (which instance id sent CPUUtil?, environment and so on)
  • Max 10 dimensions per metric
  • EC2 Host Level Metrics consist of: CPU, Network, Disk, Status Check. Runs every 5 minutes by default.
  • 1 minute intervals by turning on detailed monitoring
  • CW Dashboards visualize AWS activity
    • Dashboards are global; can be seen in any region
    • Can include graphs from different regions
  • CW Alarms for reporting problems in AWS
    • Trigger notifications for any metric
    • Can be sent to ASG, EC2 Actions and SNS Notifications
  • CW Events allow you to respond to state changes
    • Schedule events (cron jobs) to run a job
    • Or Event Pattern: Event rules to react to a service doing something
    • CW Events creates a small JSON doc with info about the event change
  • CW EC2 Detailed Monitoring
    • EC2 metrics default to every 5 mins; for a cost they can run every 1 min
    • Detailed monitoring allows for faster scaling of ASGs
    • EC2 Memory usage is not a default metric, must be pushed from inside instance as custom metric
  • CW Custom Metrics
    • Define and send custom metrics to CW
    • Metric resolution defaults to 1 minute
    • High resolution is up to 1 second (StorageResolution API parameter); can be triggered every 10s
    • Custom Metrics are sent using the PutMetricData API
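
The resolution idea can be sketched as a rollup of 1-second datapoints into 10-second buckets (datapoints are illustrative, not a real CW API call):

```python
def rollup(datapoints, period=10):
    """datapoints: list of (timestamp_seconds, value); returns per-period averages."""
    buckets = {}
    for ts, value in datapoints:
        buckets.setdefault(ts - ts % period, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

points = [(0, 1.0), (3, 3.0), (9, 5.0), (12, 7.0)]
print(rollup(points))  # {0: 3.0, 10: 7.0}
```
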
  • CW Instance Recovery
    • CW Alarm Status Check
      • Instance status - check the EC2 instance
      • System Status - check underlying hardware
      • If the alarm fires, Recovery is triggered and the following are preserved:
        • Private IP
        • Public IP
        • Elastic IP
        • Metadata
        • Placement Group
        • NOT preserved: Data on the instance (root) volume
  • Logs
    • Apps can send logs to CW using SDK
    • CW can collect logs from
      • Beanstalk: collect from apps
      • ECS: collects logs from containers
      • Lambda: collects function logs
      • VPC Flow Logs
      • API Gateway
      • CloudTrail based on filters
      • CW log agents (i.e. on EC2 instances)
      • Route53 DNS queries
    • Logs can be shipped to
      • Batch export to S3 for archival
      • Stream to ElasticSearch for analytics
    • Architecture
      • Log groups: arbitrary name, usually representing an app
      • Log stream: Instances within an app / log file / container
    • We can define custom expiration dates, including never expire
    • Possible to use the CLI to tail CW logs
    • IAM permissions must be set correctly so it can receive logs
    • Supports filter expressions to help drill down on search queries
    • CW Logs Insights is a tool for querying logs and adding queries to Dashboards
    • CW Unified Agent provides much more detailed logging (netstat, processes, CPU metrics etc.)

CloudTrail

  • Enabled by default
  • Provides compliance and audit for AWS accounts
  • Captures a history of events and API calls by the Console, SDK, CLI, AWS Services
  • CloudTrail logs can be shipped to CW Logs

AWS Config

  • Configuration management service that allows you to declare the desired state of a service
  • Config will detect changes and send an alarm
  • Per-region service
  • AWS provides many managed config rules
  • Custom rules can be created, must be defined using AWS Lambda
  • Rules are evaluated/triggered
    • For every config change, and/or
    • At regular intervals
    • Can trigger CW Events if rule is non-compliant
  • Rules can have auto remediation; restore config to desired state if a drift is detected
  • AWS Config Rules have no deny/prevent action

IAM Roles

  • Roles are much more secure than storing keys on individual instances
  • Roles are easier to manage
  • Roles can be assigned to an instance via the console and command line
  • Roles are universal
  • If an EC2 has a Role attached and an associated Policy is updated, the update will take effect immediately

Instance Metadata

EFS

  • Elastic File System
  • Storage capacity is elastic, growing and shrinking automatically as required
  • When you need resilient storage for Linux instances
  • Can be shared between EC2 instances

Spot Instances

  • Instances available at a lower rate based on AWS overall capacity
  • Can be triggered to launch based on user-defined price points
  • Can save up to 90% on the cost of an On-Demand instance
  • Useful for any type of computing when you don’t need persistent storage
  • Can block instances from terminating using Spot Block

Spot Fleets

  • A collection of Spot Instances and optional On-Demand Instances
  • Attempts to launch the number of Spot Instances and On-Demand Instances to meet the target capacity specified
  • Spot Fleet attempts to maintain its target capacity; if an instance is interrupted, the Spot Fleet will launch a replacement
  • Strategies:
    • capacityOptimized
      • Come from pool with optimal capacity
    • diversified
      • Distributed across all pools
    • lowestPrice
      • Instances come from pool w/lowest price
    • InstancePoolsToUseCount
      • Distributed across the number of pools you specify; only valid with the lowestPrice strategy
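
A sketch of how the strategies choose pools (pools and prices are made up):

```python
pools = [
    {"pool": "us-east-1a/m5.large", "price": 0.035},
    {"pool": "us-east-1b/m5.large", "price": 0.031},
    {"pool": "us-east-1c/m5.large", "price": 0.040},
]

def lowest_price(pools, n_pools=1):
    """lowestPrice: fill from the cheapest pools; a pools-to-use count widens the set."""
    return sorted(pools, key=lambda p: p["price"])[:n_pools]

def diversified(pools):
    """diversified: spread capacity across all pools."""
    return pools

print([p["pool"] for p in lowest_price(pools)])            # cheapest pool only
print([p["pool"] for p in lowest_price(pools, n_pools=2)]) # two cheapest pools
```
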

Amazon FSx for Windows File Server

  • A managed Windows Server that supports SMB and NTFS
  • Built on SSD
  • Can be accessed from on-prem
  • Can be configured for Multi-AZ (HA)
  • Data is backed up daily to S3
  • Designed for Windows and Windows apps
  • Supports AD users, access control lists, groups, security policies, DFS namespaces and replication
  • Similar to EFS, except EFS does not support Windows instances
  • Not elastic, must set storage size and throughput during creation

Amazon FSx for Lustre

  • Lustre (Linux/Cluster) is a parallel distributed file system for large-scale computing
  • Fully managed file system that is designed specifically for compute-intensive workloads, HPC, ML, media processing etc.
  • Sub-millisecond access to data and allows read/write at speeds of up to hundreds of gigabytes per second and millions of IOPS
  • Seamless integration with S3
    • Can read S3 as a file system (through FSx)
    • Can write output of computations back to S3 (through FSx)
  • Can be used from on-prem servers

EC2 Placement Groups

  • Can’t merge placement groups
  • Only certain instance types can be in a placement group (Compute Optimized, GPU, Memory Optimized, Storage Optimized)
  • Existing instances can be moved into a placement group. The instance must be stopped.
  • Instances can be moved or removed from a placement group via the CLI or SDK, but not yet via the console
  • Cluster Placement Group
    • Grouping of instances within a single AZ
    • Same rack - great for network connectivity; however a failure can take out all instances
    • Good for apps that require low network latency and/or high throughput
    • Only certain instances can be launched in a Clustered Placement Group
    • AWS recommends homogeneous instances within clustered placement groups
  • Spread Placement Group
    • Minimize failure risk by spreading instances across AZs
    • Limited to 7 instances per AZ
    • Instances that are placed on distinct underlying hardware (separate racks, power and network)
    • Recommended for apps that have a small number of critical instances that should be kept separate
    • Can be deployed across multiple AZs
  • Partitioned Placement Group
    • EC2 instances are divided into logical segments called partitions (physical rack).
    • Partition data is available in EC2 user-data
    • AWS ensures each partition within a placement group has its own set of racks, which has its own power source and network source
    • No two partitions share the same underlying hardware, allowing you to isolate the impact of hardware failure

HPC

  • Services to facilitate HPC
    • GPU or CPU optimized instances
    • EC2 Fleets (Spot Instances or Spot Fleets)
    • Placement Groups
    • Enhanced Networking
    • Elastic Network Adapters
    • Elastic Fabric Adapters
  • Storage Services
    • EBS: Scale up to 64K IOPS with Provisioned IOPS
    • Instance Store
  • Network Storage
    • S3: Distributed Object-based storage
    • EFS: Scale IOPS based on size, or use Provisioned IOPS
    • FSx for Lustre: HPC-optimized DFS
  • Orchestration and Automation
    • AWS Batch allows execution of hundreds of thousands of batch computing jobs
    • Batch supports multi-node parallel jobs that span multiple EC2 Instances
    • Batch supports scheduling jobs/launching instances based on requirements
  • AWS ParallelCluster
    • Open source cluster management tool based on CfnCluster
    • Automates creation of the VPC, subnets, cluster type and instance types

AWS WAF

  • Monitor HTTP and HTTPS requests that are forwarded to CloudFront, ALB or API Gateway
  • L7-aware firewall
  • Conditions can be configured such as
    • IP address allowlist
    • Query string parameters are accepted
    • Country that request originates from
    • Strings that appear in request (exact match or regex)
    • Length of request
    • Presence of SQL
    • Presence of a script (cross-site scripting)

Databases

  • RDS

    • SQL Server, MySQL, PostgreSQL, Oracle, Aurora, MariaDB

    • RDS Reserved Instances are available for Multi-AZ deployments

    • Backup window changes are implemented during the next scheduled maintenance window or immediately

    • Two key features:

      • Multi-AZ for DR
      • One DNS name that points to more than one backend DB. In an AZ failure, the DNS will automatically update to point to the secondary DB
    • Read-Replicas for performance

    • For every write, the write is replicated to the secondary replica. In the event of a failure, a manual update of the DNS name is required

    • Primary use case is to have a replica for read-intensive applications

    • AWS manages the OS, customer responsible for KMS, security groups, IAM policies, TLS/SSL

    • Automated Backups

      • Enabled by default
      • Backup data is stored in S3, with free S3 storage equal to the size of your DB
      • Backups are taken within a defined window, storage I/O may be suspended during backup
      • Allows you to recover your db to any point in time within a retention period
      • Retention period can be between 1 and 35 days
      • Full daily snapshots
      • Stores transaction logs through the day
      • During a recovery, AWS will first restore the most recent daily backup, and then apply transaction logs relevant to that day
      • This allows for point in time recovery down to a second
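
The recovery steps above, as a sketch (timestamps are illustrative epoch seconds, not a real RDS API):

```python
def restore_point(snapshots, tx_logs, target):
    """Pick the newest snapshot at or before target, then replay logs up to target."""
    base = max(s for s in snapshots if s <= target)
    replay = [t for t in tx_logs if base < t <= target]
    return base, replay

snaps = [100, 200, 300]       # daily snapshots
logs = [210, 220, 250, 310]   # transaction log entries
print(restore_point(snaps, logs, target=251))  # (200, [210, 220, 250])
```
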
    • DB Snapshots

      • Manually initiated by the user
      • Retained even after deleting the original RDS instance unlike automated backups
    • Restoring Backups

      • Restoring either automated or manual backups go to a new RDS instance with a new DNS endpoint
    • Encryption at Rest

      • Supported with MySQL, Oracle, SQL Server, PostgreSQL, MariaDB and Aurora
      • Encryption is done using AWS Key Management Service (KMS)
      • Once enabled, data stored at rest is encrypted, as are automated backups, read replicas and snapshots
    • Multi-AZ

      • Exact copy of production db in another AZ
      • AWS handles replication so when there is a write to your production db, it is automatically synchronized to the standby db
      • All writes to primary DB are sync'd with a secondary DB
      • In the event of planned db maintenance, DB instance failure or AZ failure, Amazon RDS will automatically failover to the standby db so that operations can resume without administrative intervention
      • Used for DR recovery. Not intended to improve performance.
    • Read Replicas

      • MySQL, PostgreSQL, MariaDB, Oracle, Aurora
      • Read-only copy of production db
      • Achieved using asynchronous replication from primary DB to the read replica
      • Useful for databases with read-intensive workloads
      • Effective way to improve performance
      • Used for scaling, not DR
      • Must have automatic backups turned on in order to deploy a read replica
      • Max of 5 read replica copies of any db
      • Can have read replicas of read replicas, but be aware of potential latency
      • Each replica will have its own DNS endpoint
      • Read replicas can have Multi-AZ
      • Can create replicas of Multi-AZ source databases
      • Read replicas can be promoted to their own db. Breaks the replication
      • Read replicas can be in a second region
    • RDS Troubleshooting

      • If you want your application to check RDS for an error, have it look for an Error node in the response from the Amazon RDS API.
  • DynamoDB Non-Relational Database (NoSQL)

    • Collection = Table
    • Document = Row
    • Key/value pairs = columns
  • Data Warehousing (OLAP)

    • Used for large and complex datasets
    • Amazon's solution is RedShift
  • Elasticache

    • A managed in-memory cache on AWS
    • Improves performance by allowing retrieval of information from fast, in-memory caches instead of relying on disk-based access
    • Supports two open-source caching engines: Memcached and Redis

DynamoDB (No SQL)

  • Fully managed DB that supports both document and key-value data models
  • Stored on SSD storage
  • Spread across 3 geographically distinct data centres
  • Eventual Consistent Reads (Default)
    • Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. If the app can tolerate reads that trail writes by more than a second, use eventually consistent reads
  • Strongly Consistent Reads
    • Returns a result that reflects all writes that received a successful response prior to the read
  • Fully managed, HA with replication across 3 AZ
  • NoSQL db, not a relational db
  • Millions of rps, trillions of rows, 100s of TB of storage
  • Integrated with IAM for security, authorization and authentication
  • Enables event driven programming with DynamoDB Streams
  • Basics
    • Made of tables
    • Each table has a primary key (set up at creation time)
    • Tables can have infinite number of items (rows)
    • Each item has attributes
    • Maximum size of an item is 400Kb
    • Data Types supported
      • Scalar Types: Strings, Numbers, Binary, Boolean, Null
      • Document Types: List, Map
      • Set Types: String Set, Number Set, Binary Set
    • Provisioned Throughput
      • Table must have provisioned read/write capacity units
      • Read Capacity Units (RCU) is throughput for reads
        • 1 RCU: 1 strongly consistent read of 4Kb per second
        • 1 RCU: 2 eventually consistent reads of 4Kb per second
      • Write Capacity Units (WCU) is throughput for writes
        • 1 WCU: 1 write of 1Kb per second
      • Optionally set up autoscaling of throughput on demand
      • Throughput can be exceeded temporarily using a "burst credit"; if burst credit is empty, a ProvisionedThroughputException is thrown
    • On Demand Throughput
      • Scales automatically; no need to set read/write capacity units
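
The RCU/WCU arithmetic, assuming the 4Kb read and 1Kb write unit sizes above:

```python
import math

def rcu_needed(item_kb, reads_per_sec, strongly_consistent=True):
    # 1 RCU = 1 strongly consistent read/s of up to 4Kb;
    # eventually consistent reads get 2 per RCU.
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

def wcu_needed(item_kb, writes_per_sec):
    # 1 WCU = 1 write/s of up to 1Kb
    return math.ceil(item_kb) * writes_per_sec

print(rcu_needed(6, 10))         # 6Kb rounds to 2 x 4Kb units -> 20 RCU
print(rcu_needed(6, 10, False))  # eventual consistency halves it -> 10 RCU
print(wcu_needed(2.5, 10))       # 2.5Kb rounds to 3 x 1Kb units -> 30 WCU
```
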
  • Security
    • VPC endpoints allow access without traversing the internet
    • Access controlled completely by IAM
    • Encryption at rest using KMS
    • Encryption in transit using TLS
  • Backup and Recovery
    • Point in time restore just like RDS
    • Backup/restore does not impact performance of live db

Advanced DynamoDB

  • DynamoDB Accelerator (DAX)
    • Seamless cache for DynamoDB, no application re-write required
    • Fully managed, highly available, in-memory cache
    • 10x performance improvement over DynamoDB Standalone
    • Reduces request time from milliseconds to microseconds -- even under load
    • No need for devs to manage caching logic
    • Compatible with DynamoDB API
    • Writes go through DAX
    • Solves hot key problem (too many reads)
    • Up to 10 nodes per cluster
    • MultiAZ, 3 nodes minimum for prod
    • Security: Supports encryption at rest with KMS and integration w/ VPC, IAM, CloudTrail.
    • DAX sits between app and db, no app integration required
  • Transactions
    • Multiple, "all-or-nothing" operations
    • Financial transactions
    • Fulfilling orders
    • Performs two underlying reads or writes -- prepare the transaction, commit the transaction
    • Transactions can operate on up to 25 items or 4MB of data
  • On-Demand Capacity
    • Pay-per-request pricing
    • Pay more per request than with provisioned capacity
    • DynamoDB scales up/down as traffic dictates. No minimum capacity
    • Charged per read/write request, plus storage and backups
  • On-Demand Backup and Restore
    • Full backups at any time
    • Zero impact on table performance or availability
    • Consistent within seconds and retained until deleted.
    • Operates within the same region as the source table; can't do backups/restores across regions
  • Point-in-Time Recovery
    • Protects against accidental writes or deletes
    • Restore to any point in the last 35 days
    • Incremental backups
    • Not enabled by default
    • Latest restorable timestamp: 5 mins in the past
  • Streams
    • Changes in DynamoDB (create/update/delete) can end up in a Stream
    • Streams can be read by Lambda
      • React to changes in real time
      • Analytics
      • Insert into ES or other service, etc...
    • Time-ordered sequence of item-level changes in a table
    • Data retention of 24 hours
    • Streams consist of stream records which represent a single data modification in a table
    • Each stream record has a record number
    • Could implement cross-region replication using Streams
  • Global Tables
    • Managed Multi-Master, Multi-Region Replication (Active/Active replication)
    • Based on DynamoDB Streams
    • Multi-region redundancy for DR or HA
    • No application changes required
    • Replication latency under one second
  • Database Migration Service (DMS)
    • Source DB can be on-prem, EC2 or RDS. Aurora, DB2, SQL, Oracle...
    • Target DB can be on-prem, EC2 or RDS. Aurora, Kinesis, Oracle, DynamoDB, DocumentDB....
    • Source DB remains operational

Redshift

  • Fully managed data warehouse (OLAP)
  • Config options:
    • Single Node (160Gb)
    • Multi-Node
      • Leader Node (manages client connections and receives queries)
      • Compute Node (store data and perform queries and computations). Up to 128 Compute Nodes
  • Redshift uses several compression techniques
  • Redshift samples your data and determines best compression scheme
  • Redshift automatically distributes data and query load across all nodes
  • Backups
    • Enabled by default with 1 day retention period
    • Max retention is 35 days
    • Always attempts to maintain at least 3 copies of data (original and replica on compute nodes, and backup in S3)
    • Can also async replica snapshots to S3 in another region for DR
  • Pricing
    • Compute Node Hours -- total number of hours run across all compute nodes for billing period
    • 3-node data warehouse running persistently for 1 month would incur 2160 instance hours
    • Leader nodes are not charged; only compute nodes
    • Backups
    • Data transfer - only within VPC, not outside it
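
The node-hour arithmetic behind the 2160 figure above:

```python
def compute_node_hours(nodes, hours):
    # Leader nodes are not charged; only compute nodes accrue node-hours.
    return nodes * hours

print(compute_node_hours(nodes=3, hours=24 * 30))  # 3 nodes x 720h = 2160
```
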
  • Security
    • Encrypted in transit using SSL
    • Encrypted at rest using AES-256 using KMS
    • Can manage own keys through HSM
  • Availability
    • Only available in 1 AZ
    • Can restore snapshots to new AZs

Aurora

  • MySQL- and PostgreSQL-compatible database
  • Start with 10Gb, scales in 10Gb increments up to 64Tb (storage autoscaling)
  • Compute resources can scale up to 32vCPUs and 244Gb Memory
  • 2 copies of data is contained in each AZ, minimum 3 AZ. 6 total copies of data.
  • Designed to transparently handle loss of up to 2 copies of data w/o affecting db write availability and up to 3 copies w/o affecting read availability
  • Storage is also self-healing
  • Replicas
    • Aurora Replicas (currently 15)
    • MySQL Read Replicas (currently 5)
    • PostgreSQL (currently 1)
    • Aurora and MySQL support cross-region replication
  • Backups
    • Automated backups are always enabled on Aurora DB Instances. No performance hit.
    • Snapshots are also available. No performance hit.
    • Can share Aurora snapshots with other AWS accounts
  • Aurora Serverless
    • On-demand, autoscaling configuration for MySQL and PostgreSQL compatible editions of Aurora
    • Serverless DB starts up, shuts down, and scales capacity up/down based on app needs
    • Useful for infrequent, intermittent or unpredictable workloads
    • Pay per-invocation, not per-hour

Elasticache

  • Fully managed in-memory cache
  • Increase db and web app performance
  • Supports memcached and redis; redis for more complicated configurations
  • Redis is Multi-AZ
  • Redis backup/restore is supported

Database Migration Service (DMS)

  • Supports migration between source/target on-prem and in-cloud databases
  • Essentially extracts data from source and loads into target on a schedule.
  • Can pre-create tables manually or use AWS Schema Conversion Tool (SCT) to create some or all target tables, indexes, views, etc.
  • Supports homogenous and heterogenous migrations (oracle->oracle or oracle->mysql for example)
  • Heterogeneous migration requires using AWS Schema Conversion Tool (SCT)

Caching Strategies

  • CloudFront
  • API Gateway
  • Elasticache
  • DynamoDB Accelerator (DAX)

EMR Overview

  • EMR is a cluster of EC2 instances.
  • Each instance is called a node; each node has a role, referred to as a node type
  • EMR installs different software components on each node type
  • Node Types
    • Master Node
      • Manages the cluster
      • Tracks status of tasks and monitors health of cluster
    • Core Node
      • Has a software component that runs tasks and stores data in Hadoop Distributed File System (HDFS)
    • Task Node
      • Runs software that does not store data in HDFS
      • Tasks nodes are optional
  • All nodes talk to each other
  • Best practice to configure cluster to ship logs to S3 (5 min interval). This setting is only available at cluster setup.

AWS Directory Service

  • Managed Active Directory
    • Standalone managed directory in the cloud running on EC2 servers
    • Connects AWS resources with on-prem AD via trust
    • SSO to any domain-joined EC2 instance
    • Reachable by applications in VPC
    • Adds DCs for HA and performance
    • Exclusive access to DCs (not shared)
    • Extend to existing AD using a trust
    • AWS Responsibility
      • Multi-AZ Deployment
      • Patch, monitor, recover
      • Instance rotation
      • Snapshot and restore
    • Customer
      • Users, groups, GPOS
      • Scale out DCs
      • Trusts
      • Certificate authorities
      • Federation
  • Simple AD
    • Standalone managed directory running a Linux/Samba AD compatible server
    • Two sizes. Small < 500 users and Large < 5000 users
    • Easier to manage EC2
    • Good for Linux workloads that need LDAP
    • Does not support trusts; no on-prem integration
  • AD Connector
    • Directory gateway/proxy for on-prem AD
    • No users stored anywhere except on-prem
    • Avoid caching information in the cloud
    • Allows on-prem to log into AWS AD
    • Join EC2 instances to your existing AD
    • Scale out across multiple domain controllers
  • Cloud Directory
    • Directory-based store for developers
    • Multiple hierarchies with hundreds of millions of objects
    • Use case: org charts, course catalogs, device registries
    • Fully managed service
  • Amazon Cognito User Pools
    • Managed user directory for SaaS applications
    • Sign-up and sign-in for web or mobile
    • Works with social media identities
  • AD Compatible
    • Managed AD (aka Directory Service for AD)
    • AD Connector
    • Simple AD
  • Not AD Compatible
    • Cloud Directory
    • Cognito user pools

IAM Policies

  • Amazon Resource Names (ARNs) uniquely identify all AWS resources
    • Format: arn:partition:service:region:account_id:resource
    • partition = "aws" and "aws-cn" for China
    • service = s3|ec2|rds ...
    • region = us-east-1, ca-central-1
    • account_id = AWS account id
  • ARNs end with:
    • resource
    • resource_type:
    • resource_type/resource
    • resource_type/resource/qualifier
    • resource_type/resource:qualifier
    • resource_type:resource
    • resource_type:resource:qualifier
  • Examples
    • arn:aws:iam::1234636:user/steve <- resource is user. region is empty (::) because IAM is global
    • arn:aws:s3:::my_bucket/image.jpg <-- no specific region or account id needed for S3
    • arn:aws:dynamodb:us-east-1:12235325:table/orders <-- resource type is table
    • arn:aws:ec2:us-east-1:235235235:instances/* <-- refer to all instances in a specific region
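
A sketch of splitting an ARN into the components above (maxsplit keeps any ':' inside the resource part intact):

```python
def parse_arn(arn):
    partition, service, region, account, resource = arn.split(":", 5)[1:]
    return {"partition": partition, "service": service,
            "region": region, "account": account, "resource": resource}

print(parse_arn("arn:aws:dynamodb:us-east-1:12235325:table/orders"))
print(parse_arn("arn:aws:s3:::my_bucket/image.jpg"))  # region/account are empty for S3
```
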
  • IAM Policies
    • JSON document that defines permissions
    • Identity Policy: attached to IAM user, group or role
    • Resource Policy: attached to resources i.e. S3 buckets. Specify who has access and what access they have
    • Policy must be attached to an identity or resource

Advanced IAM

  • AWS STS - Security Token Service
    • Allows you to grant temporary and limited access to AWS resources
    • The token is valid for up to 1 hour
    • AssumeRole API
      • AssumeRole can be used within own account
      • Cross account access can be achieved by having AssumeRole in target account
      • How to Assume a Role
        • Define an IAM role within your account or cross account (target)
        • Define which principals can access the role
        • STS to retrieve credentials and impersonate the IAM Role (AssumeRole)
        • Credentials are valid from 15 mins up to 1 hour
    • AssumeRoleWithSAML API
      • Returns credentials from users logged in with SAML
    • AssumeRoleWithWebIdentity
      • Return credentials from an IdP (Facebook, Google, OIDC etc.)
      • AWS recommends against this strategy, instead use Cognito
    • GetSessionToken
      • Required for any user with MFA enabled
  • Identity Federation in AWS
    • Lets users outside of AWS assume a temp role for access to AWS resources
    • Users assume identity provided access role
    • Federation options include SAML 2.0, SSO, Custom Identity Broker, Cognito...
    • Federation does NOT require an IAM user, user management is outside AWS
    • SAML 2.0
      • SAML 2.0 is the "old way"; SSO Federation is a managed service and simpler
      • Integrates with AD and ADFS, or any SAML 2.0 service
      • Provides temp creds to Console or CLI
      • No need for IAM user
      • Requires a trust between IAM and SAML
    • Custom Identity Broker
      • Required only if IdP provider is not compatible with SAML 2.0
      • IdP broker determines the appropriate IAM policy
      • Uses AssumeRole or GetFederationToken
    • Web Identity Federation
      • Not recommended by AWS; use Cognito instead
      • Use Facebook, Amazon, Google etc as identity source
    • Cognito
      • Cognito validates token with SAML IdP
      • Once validated, Cognito will get temp credentials from STS and send them back to client
      • Client can now access AWS resource via temporary credentials from STS

AWS Resource Access Manager (RAM)

  • Resourcing sharing between accounts
  • Not all resources can be shared
  • Example: Launch EC2 instances in a shared subnet between account
  • Account that is sharing resource will send a request to the second AWS account, must be accepted in RAM console
  • Policies associated with the second account will apply to the shared resource

AWS SSO

  • Centrally manage accounts and applications
  • Use existing corporate identities to sign in to AWS and business applications (e.g. GSuite, Dropbox)
  • SSO can use SAML to integrate with on-prem AD using an AD trust
  • All sign-on activities are logged in CloudTrail

DNS 101

  • Convert human friendly domain names to IP addresses
  • Two forms, ipv4 and ipv6
  • IPv4 is a 32 bit field, over 4 billion different addresses
  • IPv6 is 128 bits
  • Top-level domain: .com, .gov, etc
  • Top-level domains are controlled by IANA
  • Domain registrars include Amazon, GoDaddy etc
  • Start of Authority Record (SOA)
    • Name of the server that supplied data for the zone
    • Administrator of the server
    • Current version of data file
    • Default number of seconds for TTL on resource records
  • Name Server (NS) records
    • Used by top level domain servers to find nameserver record for individual domains
    • NS records provide SOA
    • A record maps IP to name
    • TTL is the length of time a DNS record is cached on resolving servers and local PCs. The lower the TTL, the faster DNS changes propagate
    • CNAME is used to resolve one domain name to another
    • ALIAS records are used to map AWS resource record sets to your hosted zone ELB, CF, etc
    • ALIAS records can point at the top node of a DNS namespace (the zone apex)

Route53 Routing Policies

  • Simple Routing
    • One record with multiple IP addresses
    • AWS returns them in random order
  • Weighted Routing
    • Allows you to split traffic based on different weights
    • One IP per record set. Each record set has a defined weight.
    • Example: Send 10% of traffic to us-east-1 (Record Set 1) and 90% to ca-central-1 (RS2)
    • Can assign Health Checks per record set
  • Latency-based routing
    • Route traffic based on lowest network latency for your end user
    • Create latency resource record set for EC2/ELB resource in each region
    • Route53 receives a query and selects the latency resource record set for the region that gives the user the lowest latency
  • Failover routing
    • Creates an active/passive configuration
    • When selecting Routing Policy: Failover, the "Failover Record Type" options are Primary and Secondary
    • Monitors primary site using a Health Check. If failure is detected, Route53 will direct traffic to secondary site
  • Geolocation Routing
    • Route53 will direct traffic based on the geographic location of end user
    • Locations can be set on continents and on countries
    • Ex: All queries from London go to EC2 instances in london region; London users receive EU-based information
  • Geoproximity Routing (Traffic Flow Only)
    • Route53 will direct traffic based on location of users AND the location of AWS resources
    • Optionally route more or less traffic to a given resource, known as a bias
    • A bias expands or shrinks the size of the geographic region from which traffic is routed
    • Requires use of Route53 Traffic Flow
      • Create Traffic Policy
        • Select DNS Type (i.e. A Record)
        • Select "Connect To", and choose a routing policy from above
        • From here, can input lat/lon coordinates, endpoint location, bias, health checks
        • Repeat for Region 2
        • Not tested on SAA exam
  • Multivalue Answer Routing
    • Exactly the same as Simple Routing, but allows you to associate Health Checks per Record Set
  • General Exam Tips
    • ELBs do not have pre-defined IPv4 addresses; resolve using DNS name
    • Understand ALIAS vs CNAME
    • Default limit of 50 domain names; can host more by contacting AWS support
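The Weighted Routing example above (10% to us-east-1, 90% to ca-central-1) can be sketched with boto3's `change_resource_record_sets`; the hosted zone ID, record name, and IPs below are hypothetical:

```python
def weighted_record(name, ip, identifier, weight, ttl=60):
    """One weighted A record set; Route53 sends traffic to each set in
    proportion to weight / (sum of weights for this name)."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": identifier,  # distinguishes sets sharing a name
            "Weight": weight,
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        },
    }

# 10% of traffic to us-east-1, 90% to ca-central-1 (IPs hypothetical)
changes = [
    weighted_record("app.example.com", "203.0.113.10", "us-east-1", 10),
    weighted_record("app.example.com", "203.0.113.20", "ca-central-1", 90),
]

def apply_changes(zone_id, changes):
    """Push the record sets; needs credentials and a real hosted zone."""
    import boto3
    r53 = boto3.client("route53")
    return r53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={"Changes": changes},
    )
```

Health checks could be attached per record set by adding a `HealthCheckId` key to each `ResourceRecordSet`.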

Solutions Architecture Discussion

  • 3-Tier architecture

    • ELB sticky sessions
    • Web client for storing cookies and making our app stateless
    • ElastiCache
      • For storing sessions
      • For caching data from RDS
      • Multi AZ
    • RDS
      • For storing user data
      • Read replicas for scaling reads
      • Multi AZ for DR
  • Tips

    • EFS is a network file system and useful for sharing file systems across many instances
      • Won't help with making the app stateless
    • Storing shared data in EBS volumes - EBS volumes are for a specific AZ and can only be attached to one EC2 instance at a time
  • Initiating Apps Quickly

    • EC2 Instances
      • Golden AMI: pre-configured AMI with app and OS dependencies
      • Bootstrap: User Data scripts for dynamic configuration
      • Hybrid: Golden AMI + User Data
    • RDS Databases
      • Restore from snapshot: db will have schemas and data ready
    • EBS Volumes
      • Restore from snapshot: disk will be properly formatted and have data

Beanstalk Overview

  • Developer centric view of deploying apps on AWS
  • EC2 instance configuration and the OS are handled by Elastic Beanstalk
  • Deployment strategy is configurable but is handled by Elastic Beanstalk
  • Just the app code is the responsibility of the developer
  • Three architecture models
    • Single Instance deployment: good for development
    • High Availability: LB + ASG: for production or pre-prod
    • Custom
  • Elastic Beanstalk has three components:
    • Application
    • App Version: each deployment gets assigned a version
    • Environment name (prod, dev, etc)
  • Application versions can be promoted through environments
  • Rollback is possible to previous versions
  • Full control over app lifecycle

AWS EC2 Instance Metadata

  • Can retrieve IAM Role Name but NOT retrieve the IAM Policy
  • http://169.254.169.254/latest/ <- latest is the API version
  • Returns dynamic, meta-data and user-data
    • meta-data returns information about the instance i.e. ami, instance-id, instance-type, profile, ipv4...
  • When an IAM role is attached to an EC2 instance, the short-lived credentials are available in metadata/iam/security-credentials/YOUR_ROLE_NAME
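A minimal sketch of querying the metadata endpoint from Python; the role name in the path is a placeholder, and the fetch itself only works from inside an EC2 instance (the address is link-local):

```python
from urllib.request import urlopen

# Link-local metadata endpoint; "latest" is the API version
BASE = "http://169.254.169.254/latest/"

def metadata_url(path):
    """Build a meta-data URL, e.g. for instance-id or role credentials."""
    return BASE + "meta-data/" + path

def fetch(path, timeout=2):
    """Fetch a metadata value; only reachable from inside EC2."""
    with urlopen(metadata_url(path), timeout=timeout) as resp:
        return resp.read().decode()

# Temporary credentials for an attached role live under this path
# (YOUR_ROLE_NAME is a placeholder):
creds_path = "iam/security-credentials/YOUR_ROLE_NAME"
print(metadata_url(creds_path))
```

On an instance, `fetch("instance-id")` or `fetch(creds_path)` would return the value directly; the SDKs do the same lookup automatically when an instance profile is attached.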

AWS SDK Overview

  • Official SDKs include Java, .NET, PHP, Python (boto3 / botocore), etc...
  • The CLI is a wrapper around boto3
  • Recommendation is to use the default credential provider chain
    • AWS credentials at ~/.aws/credentials
    • Instance Profile Credentials using IAM Roles
    • Environment Variables (AWS_ACCESS_KEY etc...)
  • Exponential Backoff
    • Any API call that fails because of too many calls needs to be retried with Exponential Backoff
    • Applies to rate limited API
    • Retry mechanism included in SDK API calls
    • Each subsequent call waits twice as long as the previous call
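A sketch of the retry pattern described above: each attempt waits roughly twice as long as the previous one, with a little jitter. The `api_call` argument stands in for any rate-limited AWS API call (the SDKs build this in, but the mechanism is simple to show):

```python
import random
import time

def call_with_backoff(api_call, max_retries=5, base_delay=0.5):
    """Retry a throttled call; each retry waits about twice as long as
    the previous one, plus jitter to avoid synchronized retries."""
    for attempt in range(max_retries):
        try:
            return api_call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

def backoff_delays(max_retries=5, base_delay=0.5):
    """Deterministic part of the schedule, doubling each time."""
    return [base_delay * (2 ** i) for i in range(max_retries - 1)]

print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0]
```

In practice you would catch only the throttling exception (e.g. a "too many requests" error) rather than bare `Exception`.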

Messaging Overview

  • Two patterns of application communication
    • Synchronous communications (app to app)
      • Buying Service <-> Shipping Service
      • Can be problematic if there are sudden spikes in traffic
    • Asynchronous / Event Based (app to queue to app)
      • Buying Service -> Queue/Middleware -> Shipping Service
      • Decoupling the app allows you to scale the queue/middleware, i.e:
        • SQS: queue model
        • SNS: pub/sub model
        • Kinesis: real-time streaming model
        • These services can scale independently of the application

SQS Overview

  • Fully managed queue service
  • The producer generates a message and delivers it to the queue
  • The consumer polls the queue and processes the messages
  • The queue service is there to decouple your services
  • Attributes
    • Unlimited throughput, unlimited number of messages in queue
    • Default retention of messages: 4 days, max 14
    • Low latency <10ms on publish and receive
    • Limit of 256KB per message
  • Can configure DelaySeconds parameter to postpone delivery of new messages to a queue
  • Can have duplicate messages (at least once delivery)
  • Can have out-of-order messages (best effort ordering)
  • Producing Messages
    • Produced to SQS using SDK (SendMessage API)
    • The message is persisted in SQS until consumer deletes it
    • Example: send an order to be processed
      • Order id, Customer id, etc...
  • Consuming Messages
    • Consumers are custom apps that can run on EC2, Lambda, etc
    • Poll SQS for messages, receive up to 10 at a time
    • Process the message - i.e. deliver to RDS, etc.
    • Delete message using DeleteMessage API
    • Can do parallel processing with multiple consumers
  • SQS Message Visibility Timeout
    • Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, SQS sets a visibility timeout: a period of time during which SQS prevents other consumers from receiving the message.
    • Once a message is polled by a Consumer, it becomes invisible to other Consumers
    • Default message visibility timeout is 30s; meaning message must be processed within 30s
    • If Consumer needs more than timeout, Consumer can call ChangeMessageVisibility API
  • Dead Letter Queues
    • If a Consumer fails to process a message within the Visibility Timeout, the message goes back to the queue
    • Can set a threshold (MaximumReceives) of how many times a message can go back to the queue
    • If MaximumReceives threshold is hit, message goes to dead letter queue (DLQ)
    • Process DLQ messages before they expire
      • Good to set retention period on DLQ
  • Delay Queues
    • Delay a message up to 15 mins, so Consumers can't see it
    • Default is 0 seconds, but can set a new default at queue level
    • Can override the default on send by using DelaySeconds parameter
  • FIFO Queues
    • First message in, first message out
    • Consumer receives messages in order
    • Limited to 300 msgs/s without batching, 3,000 msgs/s with
    • Exactly-once send capability
    • Supports de-duplication of messages
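The produce/consume cycle described above can be sketched with boto3; the queue URL and order fields are hypothetical, and the live calls require AWS credentials:

```python
import json

def make_order_message(order_id, customer_id):
    """Message body for the 'send an order to be processed' example."""
    return json.dumps({"order_id": order_id, "customer_id": customer_id})

def send_order(sqs, queue_url, body):
    """Producer side: SendMessage API. `sqs` is a boto3 SQS client."""
    sqs.send_message(QueueUrl=queue_url, MessageBody=body)

def process_queue(sqs, queue_url):
    """Consumer side: poll, process, then delete. If the delete does not
    happen within the visibility timeout, the message reappears."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # up to 10 messages per poll
        WaitTimeSeconds=20,      # long polling
    )
    for msg in resp.get("Messages", []):
        order = json.loads(msg["Body"])
        # ... process the order (e.g. write it to RDS) ...
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=msg["ReceiptHandle"],
        )

body = make_order_message("o-123", "c-456")
```

A consumer that needs more processing time would call `change_message_visibility` on the same receipt handle before the timeout expires.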

SNS

  • Pub/Sub managed service
  • Event producer sends message to one SNS topic
  • Event receivers/subscribers can subscribe to the SNS topics
  • Subscribers can be
    • SQS, HTTP, HTTPS, Lambda, Email, Email (JSON), SMS messages, Mobile Notifications
  • Publish (SDK)
    • Create a topic
    • Create a subscription
    • Publish to topic
  • Direct Publish (mobile apps SDK)
    • Create platform application
    • Create a platform endpoint
    • Publish to platform endpoint
    • Supports Google GCM, Apple APNS, Amazon ADM, ...
  • Security
    • HTTPS API by default
    • At-rest encryption using KMS keys
    • Client-side encryption
    • Access Controls: IAM Policies to control access to SNS API
    • SNS Access Policies (similar to S3 bucket policies)
      • Useful for cross-account access to SNS topic
      • Useful for allowing other services such as S3 to write to an SNS topic

SNS + SQS Fan Out
  • Publisher pushes message once to SNS, each SQS queue subscriber will receive the message
  • Service that pushes message to SNS has no idea about SQS, fully decoupled
  • SQS allows for data persistence, delayed process and retries of work
  • SQS Queue access policy must allow SNS to write
  • SNS cannot send messages to SQS FIFO Queues (AWS limitation)
  • Example use case
    • An S3 Event Type can only have one S3 event rule; it cannot send to multiple email addresses or SQS Queues
    • That rule could instead publish to SNS
    • SNS Subscribers (SQS Queues) will receive and process the message
    • Many applications can read message from SQS Queues
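As noted above, the fan-out pattern requires each SQS queue's access policy to allow SNS to write to it. A sketch of the policy and the subscribe/publish calls (all ARNs are hypothetical):

```python
import json

def sns_to_sqs_policy(queue_arn, topic_arn):
    """SQS access policy allowing one SNS topic to deliver messages -
    without this, SNS deliveries to the queue are silently rejected."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict to this topic so arbitrary topics cannot write
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }

def fan_out(topic_arn, queue_arn):
    """Subscribe a queue to a topic; one publish then reaches every
    subscribed queue. Live calls need AWS credentials."""
    import boto3
    sns = boto3.client("sns")
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
    sns.publish(TopicArn=topic_arn,
                Message=json.dumps({"event": "order_created"}))

policy = sns_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:orders",       # hypothetical
    "arn:aws:sns:us-east-1:123456789012:order-events", # hypothetical
)
```

The policy would be attached to the queue via `sqs.set_queue_attributes` with `Policy=json.dumps(policy)`.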

Kinesis Overview

  • Managed alternative to Apache Kafka
  • Great for processing real time data such as app logs, systems metrics, IoT events, clickstreams...
  • Data is automatically replicated to 3 AZs
  • Kinesis Streams
    • Low latency stream ingest at scale
    • Divided into ordered Shards/Partitions
      • Producers send to Shards, Consumers pick up from Shards
    • Data retention is 1 day by default, can go to 7 days
    • Since data is not removed as soon as it is consumed (like SQS), data reprocessing is possible
    • Multiple apps can consume the same stream
    • Data added to Kinesis is immutable, it cannot be deleted, it expires after given time
    • Customer manages scaling via shard splitting/merging
    • Shards
      • One stream is made of many different shards
      • 1 MB/s or 1000 messages/s at write PER SHARD
      • 2MB/s read PER SHARD
      • Billing is per shard provisioned; can have as many as you want
      • Batch processing or per message processing is allowed
      • Number of shards can change over time (reshard/merge)
      • Records are ordered per shard
    • API Put Records
      • PutRecord API: the partition key gets hashed to determine the shard
      • Same key goes to the same partition, which helps with ordering for a specific key
      • Messages always get a sequence number which is always increasing
      • Choose a partition key that is highly distributed, otherwise data will go to one shard aka "hot partition"
        • i.e. user_id if many users. Since we have many users and thus many PUTs, this data will spread across shards
        • country_id is BAD if 90% of your users are in one country. This will result in most data going to a specific shard
      • Batching with PutRecords to reduce costs/increase throughput
    • API Exceptions
      • ProvisionedThroughputExceededException
        • This occurs when data sent exceeds the MB/s or TPS threshold for any shard
        • Troubleshooting tip: Make sure you don't have a hot shard
      • Solutions
        • Retries with backoff
        • Increase shards (reshard/scaling)
        • Ensure you're using a good (distributed) partition key
    • API Consumers
      • Can use CLI, SDK or producer libraries from various frameworks
      • Kinesis Client Library (KCL) (various langs)
        • KCL uses DynamoDB to checkpoint offsets
        • KCL uses DynamoDB to track other workers and share the work amongst shards
    • Security
      • Access can be controlled via IAM
      • Uses HTTPS endpoints
      • Encrypt at rest using KMS; client-side encryption also possible
      • VPC endpoints are available for accessing Kinesis from within a VPC
  • Kinesis Analytics
    • Consume data from Streams and Firehose using SQL
    • Auto Scaling
    • Perform real-time analytics on streams using SQL
  • Kinesis Firehose
    • Fully managed service
    • Does data transformations with Lambda
    • Automated scaling
    • No data storage
    • Load streams into S3, Redshift, ElasticSearch and Splunk
    • Consume data from Kinesis Producer Library (KPL), Kinesis Agent, Kinesis Data Streams and CloudWatch Logs & Events
    • Near real time
      • 60 second latency minimum for non full batches
      • Min of 32MB of data at a time
    • Supports many data formats, conversion, transformation and compression
    • Pay for data that goes through Firehose
  • Data goes to Kinesis Streams, Kinesis Analytics can then be used to analyze the data, Firehose can take the data and ship it off to Redshift, S3 Buckets etc.
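The partition-key behavior above can be illustrated in Python: a hash of the key selects the shard, so a well-distributed key like user_id spreads load across shards while keeping per-key ordering. The modulo hash below only illustrates the idea (Kinesis actually maps an MD5 hash onto each shard's 128-bit hash key range), and the stream name is hypothetical:

```python
import hashlib
import json

def shard_for(partition_key, num_shards):
    """Illustrative shard selection: hash the partition key, then map
    it onto one of num_shards shards."""
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return digest % num_shards

def put_click(kinesis, stream, user_id, payload):
    """Send one record; user_id as partition key means many users ->
    records spread across shards, while one user's records stay ordered
    on one shard. `kinesis` is a boto3 client; needs credentials and a
    real stream."""
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps(payload).encode(),
        PartitionKey=user_id,
    )

# A well-distributed key spreads records over the shards:
shards = {shard_for(f"user-{i}", 4) for i in range(100)}
print(shards)

# A "hot partition" key (e.g. country_id with 90% of users in one
# country) would send most records to a single shard instead.
```

Batching the same records through `put_records` (plural) reduces per-request cost and raises throughput, as the notes mention.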

SQS vs SNS vs Kinesis

  • SQS
    • Consumers pull data
    • Data is deleted after being consumed
    • Can have as many consumers as required
    • No need to configure throughput
    • No order guarantee, except with FIFO queues
    • Individual message delay capability
  • SNS
    • Push data to many subscribers
    • Up to 10mil subscribers
    • Data is not persisted, lost if not delivered
    • Pub/Sub model
    • Up to 100k topics
    • No need to configure throughput
    • Integrates with SQS for fan-out arch. pattern
  • Kinesis
    • Consumers pull data
    • As many consumers as required, one consumer per shard
    • Possible to replay/reprocess data since it is not automatically purged
    • Meant for real-time analytics and ETL
    • Ordering at the shard level
    • Data expires after X days
    • Must provision throughput

Amazon MQ

  • Managed Apache ActiveMQ messaging service
  • Good for migrating on-prem ActiveMQ systems to AWS
  • Endpoints: AMQP, STOMP, OpenWire, MQTT, WSS
  • Doesn't scale as much as SNS/SQS
  • Runs on a dedicated machine
  • Supports HA through Active/Standby with each in different AZ
  • MQ has both queue feature (SQS) and topic features (SNS)

Lambda

  • Virtual functions
  • Limited by time, short executions only
  • Run on demand and scaling is automated
  • Pricing
    • Pay per request and compute time
    • Pay per calls
      • First 1 million requests are free; $0.20 per million requests thereafter
    • Pay per duration
      • 400k GB-seconds of compute time per month is free
      • Equals about 400,000 seconds if the function has 1GB RAM
      • Or 3.2 million seconds if the function has 128MB RAM
      • After that, $1.00 per 60,000 GB-seconds
  • Integrated with many AWS services
  • Can monitor with CloudWatch
  • Functions are tied to custom Roles and Policies
  • Limits to Know - per region
    • Execution
      • Memory allocation: 128MB - 3,008MB (64MB increments)
      • Maximum execution time is 900 seconds, 15 mins
      • Disk capacity in the "function container" is 512MB
      • Concurrent executions is 1000, can be increased
    • Deployment
      • Function deployment size (a compressed zip) is 50MB
      • Size of uncompressed deployment is 250MB
      • Can use /tmp directory to load other files at startup
      • Size of env variables is 4KB

Lambda@Edge
  • Integrate Lambda with CloudFront
  • Used for deploying a Lambda alongside a CloudFront CDN
    • Use Lambda to change CF requests and responses
      • (User) Viewer Request -> CF -> Origin
      • Origin Request from CF to Origin
      • Origin Response From Origin to CF
      • Viewer Response from CF to Viewer
    • Use Cases
      • Security/Privacy, analyzing requests and responses
      • Bot mitigation at the edge
      • A/B Testing
      • User authentication and authorization

API Gateway
  • Proxies requests from client to backend service
  • Supports WebSocket Protocol
  • Can support multiple environments, dev, prod etc...
  • Can manage authentication and authorization
  • Can create API keys as a means of throttling connections
  • Caches API responses
  • Integrations
    • Lambda - easy way to expose REST API backed by Lambda
    • HTTP - expose an HTTP service and manage its connections with rate throttling
    • Expose ANY AWS Service with an API endpoint
  • Endpoint Types
    • Edge-Optimized
      • Requests are routed through CloudFront Edge Locations
      • API Gateway still only lives in one region
    • Regional
      • No automatic CloudFront edge locations; users are all in the same region
      • Could manually combine with CloudFront for more control
    • Private
      • API Gateway can only be accessed within your VPC using a VPC interface endpoint (ENI)
      • Supports using Resource Policies to restrict access
  • Security
    • IAM Permissions
      • Useful for providing User/Role access already in your AWS account
      • Handles authentication and authorization
      • Create an IAM Policy authorization and attach to User/Role
        • API Gateway verifies IAM permissions passed by calling application
      • Utilizes "Sig v4" capability where IAM credentials (token) are in headers
      • Flow
        • REST API calls API gateway.
        • API Gateway checks IAM Policy
        • If IAM policy is permissive, API gateway sends query through to backend service
    • Lambda Authorizer
      • Supports authentication and authorization via IAM Policies
      • Uses Lambda to validate the token in header
      • Caching is possible so request doesn't need to go to Lambda each time
      • Helps when using OAuth/SAML/3rd-party tokens
      • Lambda must return a permissive IAM policy for user/role
      • Flow
        • REST API request w/token to API Gateway
        • API Gateway sends request to Lambda Authorizer
        • Lambda Authorizer validates the token and returns an IAM policy
        • If permissive, request is sent to backend
    • Cognito User Pools
      • A fully managed user lifecycle
      • It can be integrated with API Gateway for authentication
      • Must manage your own user pool (can be federated by many different services)
      • API Gateway verifies identity automatically with AWS Cognito
      • No custom implementation is required
      • Cognito only supports authentication, not authorization
      • Flow
        • Client calls Cognito User Pool to authenticate
        • Client retrieves token and passes it to API Gateway
        • API Gateway evaluates token with Cognito User Pool
        • If everything is good, request is sent to backend

Cognito

  • Provides an identity so users can interact with an application
    • Cognito User Pools
      • Sign in functionality for app users
      • Integrates with API Gateway
      • Simple login: username or email and password combination
      • Possible to verify emails/phone numbers and add MFA
    • Cognito Identity Pools (Federated Identity)
      • Provides an AWS credential to users so that they interact directly with AWS services
      • It integrates with User Pools as an identity provider
      • When goal is to provide direct access to AWS Resources from client side
        • Log into federated identity provider, or remain anonymous (option)
        • Get temporary AWS credentials back from Federated Identity Pool
        • Credentials come w/pre-defined IAM policy stating their permissions
        • Example: provide temporary access to write to an S3 bucket from a Facebook login
        • Flow
          • App authenticates to Identity Provider (CUP, Facebook, SAML...)
          • A token is returned
          • The token is sent to the Federated Identity Pool to verify token with Identity Provider
          • Once validated, Federated Identity talks to STS to get temp credentials
          • Temp credentials are then sent back to the client
          • Client can now access AWS resource (i.e. PUT to S3)
    • Cognito Sync
      • Used to sync data from device to Cognito
      • Stores preferences, configuration and state of the app
      • Cross device sync i.e. from iOS to Android
      • Requires a Federated Identity Pool in Cognito (not a User Pool)
      • To be replaced by AppSync

AWS Serverless Application Model (SAM)

  • Framework for developing and deploying serverless applications
  • All configuration is done in YAML
    • Lambda Functions, DynamoDB tables, API Gateways, CUP...
  • SAM can help run Lambda, API Gateway and DynamoDB locally
  • SAM can use CodeDeploy to deploy Lambda functions

IoT Core

  • AWS serverless service for ingesting data from many IoT devices