AWS re:Invent 2014 session notes

APP306: Using AWS CloudFormation for Deployment and Management at Scale

Video

  • Give developers access to the services they build.
  • Have developers "take the phone" (support their services)
    • what tools do they need to do that?
  • Version the CF templates with the code that uses them
  • Split CF stacks based on stateful and stateless resources
  • Eliminate duplicated template code/resources
  • Can use tools like Troposphere (Python) to take it to a higher level of abstraction
  • They build 2 AMIs: 1 with the OS+RPMs+configuration and 1 with OS+RPMs (no config)
  • Allows them to re-bake the latter image for a different environment without changing the application bits
  • They update the image ID specified in their auto-scaling groups to deploy a new build
  • ASG update policy dictates the resulting behavior
  • They use different AWS accounts to isolate resource limitations (i.e. dev allocations can't impact production).
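
The image-ID deployment flow in the notes above can be sketched as a template generated from Python. This is only an illustration, not the speakers' actual template: the logical names (`LaunchConfig`, `AppGroup`), the instance type, the group sizes, and the pause time are all assumptions. The template takes the AMI ID as a parameter, and the `AutoScalingRollingUpdate` update policy controls how instances are replaced when that parameter changes.

```python
import json


def build_template(min_in_service=1, batch_size=1):
    """Return a CloudFormation template (as a dict) for a rolling AMI update.

    All logical names and sizes here are hypothetical placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Changing this parameter on a stack update deploys a new build.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": "m3.medium",  # assumed instance type
                },
            },
            "AppGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                # The update policy dictates how instances are replaced.
                "UpdatePolicy": {
                    "AutoScalingRollingUpdate": {
                        "MinInstancesInService": min_in_service,
                        "MaxBatchSize": batch_size,
                        "PauseTime": "PT5M",
                    }
                },
                "Properties": {
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                    "MinSize": "2",
                    "MaxSize": "4",
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Generating the template from code, rather than hand-editing JSON, is the same idea Troposphere takes further with typed Python classes.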

PFC305: Embracing Failure: Fault-Injection and Service Reliability

Video

  • Ensure timeouts are in sync across the system
  • Need to know the sequence of calls (Dapper/Salp)
  • They use Zuul for traffic shaping and routing
  • Need to consider the combinatorial complexity of the RPCs as well as availability. If you want 4 nines of availability from a service, each service it depends on needs better than 4 nines, because the combined availability is the product of the individual availabilities (S1 x S2 x S3)
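
The availability arithmetic above can be checked with a few lines of Python. This sketch assumes the dependencies fail independently and that the service needs all of them for every request; the function names are made up for illustration.

```python
from math import prod


def serial_availability(availabilities):
    """Combined availability when every dependency must be up (S1 x S2 x S3)."""
    return prod(availabilities)


def required_per_dependency(target, count):
    """Availability each of `count` equal dependencies needs to hit `target`."""
    return target ** (1.0 / count)


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    combined = serial_availability([0.9999, 0.9999, 0.9999])
    print(f"three 4-nines dependencies combined: {combined:.6f}")
    print(f"needed per dependency for 4 nines: {required_per_dependency(0.9999, 3):.6f}")
    print(f"downtime at 4 nines: {downtime_minutes_per_year(0.9999):.2f} min/year")
```

Three dependencies at exactly four nines combine to roughly 99.97%, which is why each one has to do better than the target.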

DEV302: Tips, Tricks, and Best Practices for the AWS SDK for Java

Video

  • Can have multiple sets of credentials in ~/.aws/credentials:

    [default]
    ...
    [production]
    ...
    
  • Use ProfileCredentialsProvider and specify the profile name (e.g. 'production') when calling the constructor

  • BP: in production, use the InstanceProfileCredentialsProvider and use an IAM role assigned to the EC2 instance

  • Can use a credentials provider chain; if you specify the right chain you won't need to change code in production (e.g. ProfileCredentialsProvider first, and then InstanceProfileCredentialsProvider second)

  • Can enable client-side metrics, which appear to be reported to CloudWatch

  • Look for AwsSdkMetrics class methods or enable it via JMX or a system property

  • Take a look at the new "resource objects" support that's currently in developer preview on GitHub. It will help reduce the amount of boilerplate code.
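
The profile file and the provider-chain idea above can be illustrated without the SDK. The following Python sketch is not the AWS SDK for Java API: the function names, the sample key values, and the stubbed instance-role provider are all hypothetical. It only shows the pattern of trying each source in order so the same code works on a laptop and on an EC2 instance.

```python
import configparser

# Illustrative file contents; the key values are placeholders, not real keys.
SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLE_DEFAULT_KEY
aws_secret_access_key = EXAMPLE_DEFAULT_SECRET

[production]
aws_access_key_id = EXAMPLE_PROD_KEY
aws_secret_access_key = EXAMPLE_PROD_SECRET
"""


def profile_provider(file_text, profile):
    """Return a provider that reads one named profile from the file text."""
    def provide():
        parser = configparser.ConfigParser()
        parser.read_string(file_text)
        if not parser.has_section(profile):
            return None  # profile missing: let the next provider try
        section = parser[profile]
        return (section["aws_access_key_id"], section["aws_secret_access_key"])
    return provide


def instance_role_provider(fetch):
    """Return a provider backed by `fetch`, a stand-in for the instance role lookup."""
    def provide():
        return fetch()
    return provide


def first_available(providers):
    """Walk the chain in order and return the first credentials found."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    raise LookupError("no credentials available from any provider")


if __name__ == "__main__":
    chain = [
        profile_provider(SAMPLE_CREDENTIALS, "production"),
        instance_role_provider(lambda: None),  # stub: not running on EC2
    ]
    print(first_available(chain)[0])
```

On a developer machine the profile provider answers first; on an instance with no credentials file the chain falls through to the role-based provider, so no code change is needed between environments.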

SDD419: Amazon EC2 Networking Deep Dive and Best Practices

Video

  • Placement group: lower round-trip time (RTT)
  • Enhanced networking (SR-IOV): c4, c3, r3, i2 types support it
  • Use i2 types for MongoDB
  • Specify the 'cluster' strategy for a placement group; it will place instances physically close to one another
  • Only certain instance types are allowed in placement groups (basically same set as enhanced networking)
  • Placement groups are local to an availability zone
  • BP: only add instances to a placement group when it's initially created, and add all members at one time. Attempts to add instances long after the PG was created will fail quite often (as physical space around the existing members may be limited/non-existent).
    • PGs not suitable for horizontally scalable tiers because of this
  • BP: homogeneous instance types
  • To check if SR-IOV is enabled run ethtool and check the driver type
    • vif: no, ixgbevf: yes
  • Can use ec2-describe-instance-attribute with sriovNetSupport as the attribute to see if EC2 thinks it's supported (reference).
  • You can modify the attribute once if you manually add SR-IOV support to a supported instance type
  • Cannot go back once you convert/enable an image!
  • Always do the instance half (i.e. driver install/setup) first, lest you lose network access to the guest after enabling
  • If you register a custom AMI with SR-IOV support all instances created from it will automatically have it enabled
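
The driver check described above can be scripted. This Python sketch parses the text that `ethtool -i <interface>` prints and reports whether the `ixgbevf` driver is in use. The sample outputs are illustrative rather than captured from a real instance, and the helper names are made up for this example.

```python
def parse_ethtool_info(output):
    """Turn 'key: value' lines from `ethtool -i` into a dict."""
    info = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def sriov_driver_in_use(output):
    """True when the interface is bound to the ixgbevf (SR-IOV) driver."""
    return parse_ethtool_info(output).get("driver") == "ixgbevf"


# Illustrative outputs only; real version strings will differ.
ENHANCED_SAMPLE = "driver: ixgbevf\nversion: 2.14.2\nbus-info: 0000:00:03.0\n"
STANDARD_SAMPLE = "driver: vif\nversion:\nbus-info: vif-0\n"

if __name__ == "__main__":
    print(sriov_driver_in_use(ENHANCED_SAMPLE))
    print(sriov_driver_in_use(STANDARD_SAMPLE))
```

On an instance, the input would come from running `ethtool -i eth0` (the interface name is an assumption) and passing its standard output to `sriov_driver_in_use`.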

PFC304: Effective Interprocess Communications in the Cloud: The Pros and Cons of Microservices Architectures

Video | Slides

  • The tipping point: organizational growth (multiple teams) + diverse functionality + bottleneck in monolithic stack
  • Need structure when adopting microservices, lest chaos ensue
  • Polyglot is okay, but ensure there are standards for how things work/are operated

SPOT302: Under the Covers of AWS: Core Distributed Systems Primitives That Power Our Platform

Video

  • S3 uses gossip to do discovery
  • Gossip protocols are not consistent: members will each have a different view based on the gossip they've heard
  • An alternative is to use a metadata store/consensus (e.g. ZooKeeper)
  • BP: build an API for the metadata store so that the internal structure can evolve w/o breaking clients
  • Failure detection: there's no way to determine if someone is dead or just silent
  • You can detect liveness
  • Don't let components go silent; have them report heartbeats at a minimum
  • Use leases instead of locks
  • They implemented a workflow model that is very much like Quartz: stateless actions that query the metadata store
    • Idempotent actions and workflows are the key
  • Favorite interview questions:
    • How to achieve consensus
    • Biggest challenge of distributed systems: partial failures
  • Three types of distributed systems: those with SPoFs, those with Paxos at the bottom, and broken distributed systems
  • First attempted Paxos as a library, limited adoption
  • Then implemented Paxos as a service, which boosted adoption
  • 3rd attempt: Paxos as a primitive: transaction journal (since folks want order + consistency typically)

SEC306: Turn on CloudTrail: Log API Activity in Your AWS Account

Video

  • Can write logs for multiple accounts or regions to a single S3 bucket
  • CloudTrail Processing Library in Java does all the CT work for you; you just write the business logic to react to events
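
To make the "business logic" part concrete, here is a small Python sketch of reacting to events in a CloudTrail log document. It does not use the CloudTrail Processing Library (which is Java); it only assumes the log file is JSON with a top-level `Records` array. The sample record is illustrative and trimmed to a few fields, and the helper names are made up. Log files delivered to S3 are gzip-compressed, so real input would need decompressing first.

```python
import json

# Illustrative, trimmed log document; real records carry many more fields.
SAMPLE_LOG = json.dumps({
    "Records": [
        {
            "eventTime": "2014-11-14T01:02:03Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "TerminateInstances",
            "awsRegion": "us-east-1",
        },
        {
            "eventTime": "2014-11-14T01:05:00Z",
            "eventSource": "s3.amazonaws.com",
            "eventName": "CreateBucket",
            "awsRegion": "us-west-2",
        },
    ]
})


def iter_events(log_text):
    """Yield each record from one CloudTrail log document."""
    for record in json.loads(log_text).get("Records", []):
        yield record


def find_events(log_text, event_source, event_name):
    """Return the records matching one service and API call."""
    return [
        record
        for record in iter_events(log_text)
        if record.get("eventSource") == event_source
        and record.get("eventName") == event_name
    ]


if __name__ == "__main__":
    for event in find_events(SAMPLE_LOG, "ec2.amazonaws.com", "TerminateInstances"):
        print(event["eventTime"], event["awsRegion"])
```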

APP304: AWS CloudFormation Best Practices

  • Organize CF stacks by layers and environment
    • e.g.: identity, base networking, shared, backend, frontend
  • Use input/output parameters to express dependencies
  • Use nested stacks for reusability
  • CF now lets you strongly type parameters (!!!)
  • Can signal completion back to CF if using user data (!!!)
  • Can use CloudWatch logs to stream logs out of an instance
  • Flow updates through CF only
  • Sounds like there's a WIP preview feature (!!!)
  • Can use CloudFormer to dump a CF template snippet for a resource to compare drift
  • Use ASGs to do rolling updates
  • Can extend CF with stack events (!!!)
  • Have CF send notifications to custom extensions using SNS
  • Custom resources must understand create, update, rollback, delete events
  • Custom extension signals CF when done
  • Use the 'NoEcho' parameter option to keep sensitive values from being shown by CF (!!!)
  • AWS Cost Explorer can slice things by tags
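
Several of the practices above (typed parameters, NoEcho, and outputs that feed other stacks) fit in one small template. The sketch below builds it as a Python dict. The logical names, the description text, and the choice of a security group as the shared resource are assumptions for illustration; `DbPassword` is declared only to show NoEcho and is not wired to a resource.

```python
import json


def build_backend_template():
    """Return a template dict showing typed parameters, NoEcho, and outputs."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Strongly typed: CF validates that this is an existing VPC ID.
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            # NoEcho masks the value wherever CF displays parameters.
            "DbPassword": {"Type": "String", "NoEcho": True, "MinLength": 8},
        },
        "Resources": {
            "BackendSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Backend tier (hypothetical)",
                    "VpcId": {"Ref": "VpcId"},
                },
            },
        },
        "Outputs": {
            # Outputs of one layer become input parameters of the next.
            "BackendSecurityGroupId": {
                "Value": {"Fn::GetAtt": ["BackendSecurityGroup", "GroupId"]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_backend_template(), indent=2))
```

A frontend stack would then take `BackendSecurityGroupId` as one of its own typed parameters, which is how the layers express their dependencies.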