@nikkaroraa
Created October 2, 2020 09:53
AWS Monitoring

Why is monitoring important?

  • Once our applications are deployed, our users don't care how we did it
  • Our users only care whether the application is working
    • Application latency: will it increase over time?
    • Application outages: customer experience should not be degraded

Monitoring in AWS

  • AWS CloudWatch
    • Metrics: Collect and track key metrics
    • Logs: Collect, monitor, analyze and store log files
    • Events: Send notifications when certain events happen in your AWS account
    • Alarms: React in real-time to metrics / events
  • AWS X-Ray
    • Troubleshooting application performance and errors
    • Distributed tracing of microservices
  • AWS CloudTrail
    • Internal monitoring of API calls being made
    • Audit changes to AWS resources by your users

AWS CloudWatch

AWS CloudWatch Metrics

  • CloudWatch provides metrics for almost all the services in AWS
  • "Metric" is a variable to monitor (CPUUtilization, NetworkIn, ...)
  • Metrics belong to "namespaces"
  • "Dimension" is an attribute of a metric (instance id, environment, etc...)
  • Up to 30 dimensions per metric (previously 10)
  • Metrics have "timestamps"
  • Can create CloudWatch dashboards of metrics
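
To make the namespace / metric / dimension / timestamp vocabulary concrete, here is a minimal in-memory sketch. It is not the CloudWatch API: `store`, `metric_key`, and `record` are hypothetical names, and the instance ids are made up. It only illustrates that the same metric name with different dimension values is tracked as a separate metric.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for a metric store; this is NOT the CloudWatch API.
store = defaultdict(list)

def metric_key(namespace, name, dimensions):
    """A metric is identified by namespace + name + its exact set of dimensions."""
    return (namespace, name, tuple(sorted(dimensions.items())))

def record(namespace, name, dimensions, value, timestamp=None):
    """Append one timestamped data point to the metric it belongs to."""
    timestamp = timestamp or datetime.now(timezone.utc)
    store[metric_key(namespace, name, dimensions)].append((timestamp, value))

# Same namespace and metric name, but different dimension values:
# these end up as two separate metrics (instance ids are made up).
record("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0123456789abcdef0"}, 42.0)
record("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0fedcba9876543210"}, 7.5)
record("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0123456789abcdef0"}, 55.0)
```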

AWS CloudWatch EC2 Detailed monitoring

  • By default, EC2 instance metrics are reported "every 5 mins"

  • With detailed monitoring (for a cost), you get data "every 1 min"

  • Use detailed monitoring if you want your ASG to scale more promptly!

  • Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)

AWS CloudWatch Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch
  • Ability to use dimensions (attributes) to segment metrics
    • Instance.id
    • Environment.name
  • Metric resolution
    • Standard: 1 minute
    • High Resolution: up to 1 second (StorageResolution API Parameter) - Higher cost
  • Use API call "PutMetricData"
  • Use exponential back off in case of throttle errors
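
A rough sketch of the two ideas above: building a data point in the shape PutMetricData accepts, and retrying with exponential backoff when throttled. `build_datum`, `put_with_backoff`, and `ThrottledError` are hypothetical helpers, and `send` is any callable that accepts `Namespace` and `MetricData` keyword arguments, so the sketch runs without AWS credentials.

```python
import random
import time
from datetime import datetime, timezone

def build_datum(name, value, dimensions, high_resolution=False):
    """Build one MetricData entry in the shape PutMetricData expects."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": "Count",
        # 1 = high resolution (1 second), 60 = standard (1 minute)
        "StorageResolution": 1 if high_resolution else 60,
    }

class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error a real client raises."""

def put_with_backoff(send, namespace, data, max_attempts=5,
                     base_delay=0.1, sleep=time.sleep):
    """Call send(...) and retry throttles with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return send(Namespace=namespace, MetricData=data)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```

With boto3, `send` could be `boto3.client("cloudwatch").put_metric_data`; in that case the `except` clause would need to catch the client's own throttling error instead of the placeholder class used here.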

AWS CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric
  • Alarms can go to Auto Scaling, EC2 Actions, SNS notifications
  • Various options (sampling, %, max, min, etc...)
  • Alarm states:
    • OK
    • INSUFFICIENT_DATA
    • ALARM
  • Period:
    • Length of time in seconds to evaluate the metric
    • High resolution custom metrics: can choose 10 secs or 30 secs (or any multiple of 60 secs)
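
The three alarm states can be illustrated with a deliberately simplified evaluator. This is only a sketch: `evaluate_alarm` is a hypothetical function, it assumes a "greater than threshold" comparison over consecutive periods, and it ignores the configurable missing-data handling that real alarms offer.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return the alarm state from the most recent per-period values.

    datapoints: one aggregated value per period, oldest first; None means
    no data arrived in that period.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"

# CPU above 80% for the last two periods puts the alarm in ALARM.
state = evaluate_alarm([50, 85, 90], threshold=80, evaluation_periods=2)
```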

AWS CloudWatch Logs

  • Applications can send logs to CloudWatch using the SDK
  • CloudWatch can collect logs from:
    • Elastic Beanstalk: collection of logs from application
    • ECS: collection from containers
    • AWS Lambda: collection from function logs
    • VPC Flow Logs: VPC specific logs
    • API Gateway
    • CloudTrail based on filter
    • CloudWatch log agents: for example on EC2 machines
    • Route53: Log DNS queries
  • CloudWatch logs can go to:
    • Batch export to S3 for archival
    • Stream to Elasticsearch cluster for further analytics

CloudWatch Logs for EC2

  • By default, no logs from your EC2 machine will go to CloudWatch
  • You need to run a CloudWatch agent on EC2 to push the log files you want

CloudWatch Logs Agent & Unified Agent

  • Both are for virtual servers (EC2 instances, on-premises servers)
  • CloudWatch Logs Agent
    • Old version of the agent
    • Can only send to CloudWatch Logs
  • CloudWatch Unified Agent
    • Collect additional system-level metrics such as RAM, processes, etc
    • Collect logs to send to CloudWatch Logs
    • Centralized configuration using SSM Parameter Store

CloudWatch Logs Metric Filter

  • CloudWatch Logs can use filter expressions
    • For example, find a specific IP inside of a log
    • Or count occurrences of "ERROR" in your logs
    • Metric filters can then be used to trigger alarms
  • Filters do not retroactively filter data. Filters only publish the metric data points for events that happen after the filter was created.
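
A small sketch of the non-retroactive behaviour described above, assuming log events are simple (timestamp, message) pairs. `apply_metric_filter` and the sample events are hypothetical; a real metric filter is configured on a log group rather than called like a function.

```python
def apply_metric_filter(events, term, filter_created_at):
    """Count log events containing `term`, ignoring events older than the filter."""
    return sum(
        1
        for timestamp, message in events
        if timestamp >= filter_created_at and term in message
    )

# (timestamp in epoch seconds, message) pairs; values are made up.
events = [
    (100, "ERROR disk full"),
    (200, "INFO request served"),
    (300, "ERROR timeout calling upstream"),
    (400, "ERROR timeout calling upstream"),
]

# The filter was created at t=250, so the ERROR logged at t=100 is not counted.
error_count = apply_metric_filter(events, "ERROR", filter_created_at=250)
```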

AWS CloudWatch Events

  • Schedule: Cron jobs
  • Event Pattern: Event rules to react to a service doing something
    • Example: CodePipeline state changes!
  • Triggers to Lambda functions, SQS/SNS/Kinesis Messages
  • CloudWatch Event creates a small JSON document to give information about the change
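
To show how an event pattern selects from those JSON documents, here is a minimal matcher. It is a sketch covering exact-value matching only: `matches` is a hypothetical function, the pipeline name is made up, and real event patterns support more operators than this.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event.

    Pattern leaves are lists of accepted values; nested dicts descend into
    the event. Only exact-value matching is sketched here.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# A trimmed-down CodePipeline state change event (pipeline name is made up).
event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}

# React only to failed pipeline executions.
pattern = {"source": ["aws.codepipeline"], "detail": {"state": ["FAILED"]}}
is_match = matches(pattern, event)
```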

Amazon EventBridge

  • EventBridge is the next evolution of CloudWatch Events

  • Default event bus: generated by AWS services (CloudWatch Events)

  • Partner event bus: receive events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0, ...)

  • Custom event buses: for your own applications

  • Event buses can be accessed by other AWS accounts

  • Rules: how to process the events (similar to CloudWatch Events)

Amazon EventBridge Schema Registry

  • EventBridge can analyze events in your bus and infer the schema
  • The Schema Registry allows you to generate code for your application that will know in advance how data is structured in the event bus
  • Schema can be versioned
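
As a loose illustration of what "inferring the schema" from sample events means, the sketch below derives a type description from one event. `infer_schema` is a hypothetical function and its output format is invented for this example; it is not the format the Schema Registry produces.

```python
def infer_schema(value):
    """Infer a simple type description from one sample event."""
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assume all items share the type of the first one.
        return [infer_schema(value[0])] if value else []
    if isinstance(value, bool):  # checked before numbers: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"

# A made-up custom application event.
sample = {"source": "my.app", "detail": {"orderId": 17, "paid": True, "items": ["a"]}}
schema = infer_schema(sample)
```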

Amazon EventBridge vs CloudWatch Events

  • Amazon EventBridge builds upon and extends CloudWatch Events

  • It uses the same service API and endpoint, and the same underlying service infrastructure

  • EventBridge allows extension to add event buses for your custom applications and your third-party SaaS apps

  • EventBridge has the Schema Registry capability

  • EventBridge has a different name to mark the new capabilities

  • Over time, the CloudWatch Events name will be replaced with EventBridge

AWS X-Ray

  • Debugging in Production, the good old way:
    • Test locally
    • Add log statements everywhere
    • Re-deploy in production
  • Log formats differ across applications, so analytics using CloudWatch is hard
  • Debugging: monolith "easy", distributed services "hard"
  • No common views of your entire architecture!

.....Enter AWS X-Ray.....

AWS X-Ray advantages

  • Troubleshooting performance (bottlenecks)
  • Understand dependencies in a microservice architecture
  • Pinpoint service issues
  • Review request behaviour
  • Find errors and exceptions
  • Are we meeting time SLA?
  • Where am I throttled?
  • Identify users that are impacted

AWS X-Ray leverages "Tracing"

  • Tracing is an end-to-end way to follow a "request"
  • Each component dealing with the request adds its own "trace"
  • Tracing is made of segments (+ sub segments)
  • Annotations can be added to traces to provide extra information
  • Ability to trace:
    • Every request
    • A sample of requests (as a % for example, or a rate per minute)
  • X-Ray Security
    • IAM for authorization
    • KMS for encryption at rest

How to enable AWS X-Ray?

  • Your code must import the AWS X-Ray SDK
    • Very little modification needed
    • The application SDK will then capture:
      • Calls to AWS services
      • HTTP / HTTPS requests
      • Database calls (MySQL, PostgreSQL, DynamoDB)
      • Queue calls (SQS)
  • Install the X-Ray daemon or enable X-Ray AWS Integration
    • X-Ray daemon works as a low-level UDP packet interceptor (Linux, Windows, Mac)
    • AWS Lambda / other AWS services already run the X-Ray daemon for you
    • Each application must have the IAM rights to write data to X-Ray
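
The SDK-to-daemon hand-off can be sketched by hand. This assumes the daemon's default UDP listener on 127.0.0.1:2000 and a datagram made of a small JSON header line followed by the segment document. `build_datagram`, `new_trace_id`, and `send_segment` are hypothetical helpers; in practice the X-Ray SDK does this for you.

```python
import json
import os
import socket
import time

DAEMON_ADDRESS = ("127.0.0.1", 2000)  # assumed default UDP listener of the daemon
HEADER = '{"format": "json", "version": 1}'

def new_trace_id(now=None):
    """Trace ID: version, epoch seconds in hex, then 96 random bits in hex."""
    now = int(now if now is not None else time.time())
    return f"1-{now:08x}-{os.urandom(12).hex()}"

def build_datagram(name, start_time, end_time):
    """Serialize a minimal segment as header line + JSON document."""
    segment = {
        "name": name,
        "id": os.urandom(8).hex(),  # 16 hex digits
        "trace_id": new_trace_id(start_time),
        "start_time": start_time,
        "end_time": end_time,
    }
    return (HEADER + "\n" + json.dumps(segment)).encode("utf-8")

def send_segment(datagram, address=DAEMON_ADDRESS):
    """Fire-and-forget UDP send; nothing is reported if no daemon is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, address)
```

Because UDP is connectionless, the application is not blocked or failed by a missing daemon, which is also why a silent lack of traces usually points at the daemon or at IAM permissions.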

AWS X-Ray Troubleshooting

  • If X-Ray is not working on EC2

    • Ensure the EC2 IAM Role has the proper permissions
    • Ensure the EC2 instance is running the X-Ray Daemon
  • To enable on AWS Lambda:

    • Ensure it has an IAM execution role with proper policy (AWSXrayWriteOnlyAccess)
    • Ensure that X-Ray is imported in the code

X-Ray Instrumentation in your code

  • Instrumentation means measuring a product's performance, diagnosing errors, and writing trace information

X-Ray Concepts

  • Segments: Each application / service will send them

  • Sub-segments: If you need more details in your segment

  • Trace: segments collected together to form an end-to-end trace

  • Sampling: decrease the amount of requests sent to X-Ray, reduce cost

  • Annotations: Key-value pairs used to index traces and use with filters

  • Metadata: Key-value pairs, not indexed, not used for searching

  • The X-Ray daemon / agent has a config to send traces cross account

    • Make sure the IAM permissions are correct - the agent will assume the role
    • This lets you have a central account for all your application tracing
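
The difference between annotations (indexed, searchable) and metadata (stored, not searchable) can be sketched with plain dictionaries. `find_traces` is a hypothetical helper, and the trace ids and field values are made up; real filtering happens in the X-Ray service, not in your code.

```python
def find_traces(segments, key, value):
    """Return trace IDs whose segments carry the given annotation.

    Only annotations are consulted; metadata is stored but never searched.
    """
    return sorted({
        seg["trace_id"]
        for seg in segments
        if seg.get("annotations", {}).get(key) == value
    })

# Segments from two traces; ids and values are made up for illustration.
segments = [
    {"trace_id": "trace-a", "name": "api",
     "annotations": {"customer_tier": "gold"},
     "metadata": {"payload": {"cart_size": 3}}},
    {"trace_id": "trace-a", "name": "db",
     "subsegments": [{"name": "SELECT orders"}]},
    {"trace_id": "trace-b", "name": "api",
     "annotations": {"customer_tier": "free"},
     "metadata": {"customer_tier": "gold"}},  # metadata does not affect search
]

gold_traces = find_traces(segments, "customer_tier", "gold")
```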

X-Ray Sampling Rules

  • With sampling rules, you control the amount of data that you record

  • You can modify sampling rules without changing your code

  • By default, the X-Ray SDK records the first request "each second", and "five percent" of any additional requests

  • One request per second is the "reservoir", which ensures that at least one trace is recorded each second as long as the service is serving requests

  • Five percent is the "rate", at which additional requests beyond the reservoir size are sampled
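
The reservoir-plus-rate behaviour can be sketched as follows. `Sampler` is a hypothetical class written for this note, not the SDK's implementation; the random source is injectable so the behaviour can be checked deterministically.

```python
import random

class Sampler:
    """Reservoir of N requests per second, then a fixed rate for the rest."""

    def __init__(self, reservoir=1, rate=0.05, rng=random.random):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng
        self.current_second = None
        self.used = 0

    def should_sample(self, now):
        """Decide whether the request arriving at `now` (epoch seconds) is traced."""
        second = int(now)
        if second != self.current_second:
            # A new second started: refill the reservoir.
            self.current_second = second
            self.used = 0
        if self.used < self.reservoir:
            self.used += 1
            return True
        # Beyond the reservoir, sample at the fixed rate.
        return self.rng() < self.rate
```

With the defaults, a service handling 100 requests in one second would record the first request plus, on average, about five percent of the remaining 99.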

AWS CloudTrail

  • Provides governance, compliance and audit for your AWS account
  • CloudTrail is enabled by default
  • Get a history of events / API calls made within your AWS account by:
    • Console
    • SDK
    • CLI
    • AWS Services
  • Can put logs from CloudTrail into CloudWatch Logs
  • If a resource is deleted in AWS, look into CloudTrail first.
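
A sketch of that "look into CloudTrail first" step, assuming the event records have already been loaded as dictionaries with the usual `eventName`, `eventTime`, `userIdentity`, and `requestParameters` fields. `find_deletions` is a hypothetical helper, and the ARNs and bucket name are made up.

```python
def find_deletions(records, resource_hint=""):
    """Return (time, user, event) for API calls whose name starts with 'Delete'."""
    hits = []
    for record in records:
        if not record["eventName"].startswith("Delete"):
            continue
        params = str(record.get("requestParameters", ""))
        if resource_hint and resource_hint not in params:
            continue
        user = record.get("userIdentity", {}).get("arn", "unknown")
        hits.append((record["eventTime"], user, record["eventName"]))
    return sorted(hits)

# Made-up records in the general shape of CloudTrail events.
records = [
    {"eventTime": "2020-10-02T09:00:00Z", "eventName": "CreateBucket",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
     "requestParameters": {"bucketName": "reports"}},
    {"eventTime": "2020-10-02T09:30:00Z", "eventName": "DeleteBucket",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
     "requestParameters": {"bucketName": "reports"}},
]

who_deleted = find_deletions(records, resource_hint="reports")
```

Not every destructive call starts with "Delete" (terminating an EC2 instance, for example), so the name filter here is only a starting point.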