

@loftwah
Last active May 13, 2024 02:55
RDS scaling

AWS RDS Performance Metrics and Scaling Guide

This guide outlines performance thresholds, scaling recommendations, and alert setup guidance for AWS RDS instances in the US, Europe, and Australia regions. Use these guidelines to maintain performance and plan scaling before load outgrows an instance.

Key Metrics

Latency

  • Good: <= 10 ms - Efficient query performance.
  • Moderate: 10-20 ms - Monitor for potential optimizations.
  • Needs Attention: > 20 ms - Investigate queries and consider index optimizations or instance scaling.
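CloudWatch reports the RDS `ReadLatency` and `WriteLatency` metrics in seconds, so a datapoint has to be converted to milliseconds before it can be compared with the bands above. A minimal sketch; the function name and the choice to put exactly 10 ms in "good" and exactly 20 ms in "moderate" are assumptions:

```python
def classify_latency(latency_seconds: float) -> str:
    """Map a ReadLatency/WriteLatency datapoint (in seconds) to a band from this guide."""
    latency_ms = latency_seconds * 1000.0  # CloudWatch reports these metrics in seconds
    if latency_ms <= 10:
        return "good"
    if latency_ms <= 20:
        return "moderate"
    return "needs attention"


print(classify_latency(0.004))  # 4 ms  -> good
print(classify_latency(0.015))  # 15 ms -> moderate
print(classify_latency(0.032))  # 32 ms -> needs attention
```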

CPU Usage

  • Good: <= 40-50% - Ample headroom for demand spikes.
  • Moderate: 50-70% - Close monitoring required; consider minor scaling or optimization.
  • Needs Attention: > 70-80% - Likely under-provisioned; scale up or optimize the workload.
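The CPU bands above are given as ranges (40-50% and 70-80%), so an automated check has to pick concrete cutoffs. This sketch assumes 50% and 70% as the boundaries and classifies the average `CPUUtilization` over a window of samples rather than a single datapoint; both choices are assumptions to tune per workload:

```python
from statistics import mean


def classify_cpu(samples_percent: list[float], good_max: float = 50.0,
                 moderate_max: float = 70.0) -> str:
    """Classify the average CPUUtilization (%) over a window of samples."""
    if not samples_percent:
        raise ValueError("need at least one CPU sample")
    avg = mean(samples_percent)
    if avg <= good_max:
        return "good"
    if avg <= moderate_max:
        return "moderate"
    return "needs attention"


print(classify_cpu([35, 42, 48]))  # average ~41.7% -> good
print(classify_cpu([55, 60, 68]))  # average ~61.0% -> moderate
print(classify_cpu([72, 78, 85]))  # average ~78.3% -> needs attention
```

Averaging smooths out brief spikes, which matches the intent of "headroom for demand spikes"; a high percentile of the samples would be a stricter alternative.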

Freeable Memory

  • Threshold: Freeable memory below 20% of total allocated memory is considered low and indicates memory pressure.
  • Action: If freeable memory frequently drops below this threshold, consider scaling up your instance or optimizing memory-intensive queries.
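CloudWatch reports `FreeableMemory` in bytes but does not report the instance's total memory, so computing the 20% threshold requires the memory size of the instance class from the AWS specifications. A sketch, assuming "frequently" means more than 25% of the samples in the window (a hypothetical cutoff) and a hypothetical 16 GiB instance:

```python
GIB = 1024 ** 3


def low_memory_fraction(freeable_bytes: list[float], total_bytes: float,
                        threshold: float = 0.20) -> float:
    """Fraction of FreeableMemory samples that fall below `threshold` of total memory."""
    if not freeable_bytes:
        return 0.0
    limit = threshold * total_bytes
    low = sum(1 for b in freeable_bytes if b < limit)
    return low / len(freeable_bytes)


def has_memory_pressure(freeable_bytes: list[float], total_bytes: float,
                        frequent: float = 0.25) -> bool:
    """True if low-memory samples make up more than `frequent` of the window."""
    return low_memory_fraction(freeable_bytes, total_bytes) > frequent


# Hypothetical 16 GiB instance: 20% of total is about 3.2 GiB.
samples = [5 * GIB, 3 * GIB, 2.5 * GIB, 4 * GIB]
print(low_memory_fraction(samples, 16 * GIB))  # 0.5
print(has_memory_pressure(samples, 16 * GIB))  # True
```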

Disk I/O and Throughput

  • Indicator: Persistent disk IOPS usage at or above 75% of the provisioned IOPS limit suggests a bottleneck.
  • Example: If your provisioned IOPS is 1000 and you consistently see 750 IOPS or higher, consider provisioning more IOPS or moving to a different storage type.
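The 75% rule is a ratio of observed IOPS to provisioned IOPS. CloudWatch exposes `ReadIOPS` and `WriteIOPS` as separate metrics, so this sketch sums them; treating "persistent" as every sample in the window being at or above the limit is an assumption:

```python
def iops_utilization(read_iops: float, write_iops: float, provisioned_iops: float) -> float:
    """Combined ReadIOPS + WriteIOPS as a fraction of provisioned IOPS."""
    return (read_iops + write_iops) / provisioned_iops


def is_iops_bottleneck(samples: list[tuple[float, float]], provisioned_iops: float,
                       limit: float = 0.75) -> bool:
    """True if every (read, write) sample is at or above `limit` of provisioned IOPS."""
    return bool(samples) and all(
        iops_utilization(r, w, provisioned_iops) >= limit for r, w in samples
    )


# The guide's example: 1000 provisioned IOPS with sustained usage of 750 or more.
print(iops_utilization(500, 250, 1000))                   # 0.75
print(is_iops_bottleneck([(500, 260), (480, 300)], 1000))  # True
print(is_iops_bottleneck([(500, 260), (200, 100)], 1000))  # False
```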

Connection Counts

  • Normal: Levels vary significantly by workload, so monitor for spikes more than 50% above the average peak.
  • Example: If your average peak is 200 connections, an unexpected spike to 300 or more should trigger an investigation.
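The spike rule is a multiplier on the average peak: 50% above 200 connections is 200 × 1.5 = 300. This sketch assumes "average peak" means the mean of daily maximum `DatabaseConnections` values; that definition and the function names are assumptions:

```python
from statistics import mean


def average_peak(daily_max_connections: list[int]) -> float:
    """Mean of daily maximum DatabaseConnections values (e.g. over the last 30 days)."""
    return mean(daily_max_connections)


def spike_threshold(avg_peak: float, margin: float = 0.50) -> float:
    """Connection count that sits `margin` (50% by default) above the average peak."""
    return avg_peak * (1 + margin)


def is_connection_spike(current: int, avg_peak: float, margin: float = 0.50) -> bool:
    """True if the current count reaches the spike threshold ("300 or more")."""
    return current >= spike_threshold(avg_peak, margin)


peak = average_peak([180, 200, 220])   # hypothetical daily maxima
print(spike_threshold(peak))            # 300.0
print(is_connection_spike(310, peak))   # True
print(is_connection_spike(250, peak))   # False
```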

When to Consider Changing Instance Types

  • Performance Plateaus: CPU consistently remains over 80%, or latency stays above 20 ms, even after optimization efforts.
  • Consistently Increased Load: For example, moving from m7g.medium to m7i.large because CPU usage stays above 80% and memory pressure persists.
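  • Cost-Effectiveness: Consider switching instance families or generations when another type offers a better cost-performance ratio, for example newer-generation instances that deliver higher performance at a lower or similar cost, or memory-optimized r6i instances when memory rather than CPU is the constraint.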
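The plateau criterion above can be checked mechanically once "consistently" is defined. This sketch assumes it means at least 90% of the samples in the review window exceed the limit; that share, the sample values, and the function names are hypothetical:

```python
def consistently_above(samples: list[float], limit: float, share: float = 0.9) -> bool:
    """True if at least `share` of the samples exceed `limit`."""
    if not samples:
        return False
    return sum(1 for s in samples if s > limit) / len(samples) >= share


def should_consider_new_instance(cpu_percent: list[float], latency_ms: list[float]) -> bool:
    """Plateau check: CPU over 80% or latency above 20 ms, sustained after tuning."""
    return consistently_above(cpu_percent, 80.0) or consistently_above(latency_ms, 20.0)


cpu = [82, 85, 88, 90, 79, 84, 86, 91, 83, 87]   # 9 of 10 samples above 80%
lat = [12, 14, 11, 13, 15, 12, 16, 14, 13, 12]   # none above 20 ms
print(should_consider_new_instance(cpu, lat))     # True
```

The check only flags a candidate for review; whether to scale up, change family, or keep optimizing still depends on the cost comparison described in the list.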

Monitoring and Alerts

  • CPU Usage Alert: Set an alert for CPU usage over 75% to anticipate scaling needs before hitting critical thresholds.
  • Memory Pressure Alert: Alert when freeable memory goes below 20% of total allocated memory.
  • Latency Alert: Alert when read or write latency exceeds 20 ms, as this can indicate performance issues that need immediate attention.
  • Connection Spike Alert: Set up an alert for sudden increases in connection counts, e.g., spikes more than 50% above the average peak observed over the past 30 days.
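If the alerts are implemented as CloudWatch alarms (Datadog monitors, mentioned in the conclusion, are configured differently), each one maps to a set of `put_metric_alarm` parameters. The sketch below only builds those parameter dictionaries, so it runs without AWS credentials; the alarm names, the 5-minute period, the evaluation counts, and the instance identifier `my-db` are assumptions, and total memory must come from the instance class specifications:

```python
def rds_alarm_specs(db_instance_id: str, total_memory_bytes: int,
                    avg_peak_connections: float) -> list[dict]:
    """Build keyword arguments for CloudWatch put_metric_alarm, one dict per alert."""
    dimensions = [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}]

    def alarm(name: str, metric: str, threshold: float, operator: str,
              statistic: str = "Average", periods: int = 3) -> dict:
        return {
            "AlarmName": f"{db_instance_id}-{name}",
            "Namespace": "AWS/RDS",
            "MetricName": metric,
            "Dimensions": dimensions,
            "Statistic": statistic,
            "Period": 300,                 # 5-minute datapoints (assumption)
            "EvaluationPeriods": periods,  # consecutive breaching periods required
            "Threshold": threshold,
            "ComparisonOperator": operator,
        }

    return [
        # CPU above 75%, sustained for three periods.
        alarm("cpu-high", "CPUUtilization", 75.0, "GreaterThanThreshold"),
        # FreeableMemory is in bytes, so 20% is derived from the instance's total memory.
        alarm("memory-low", "FreeableMemory", 0.20 * total_memory_bytes, "LessThanThreshold"),
        # Latency metrics are in seconds (20 ms = 0.020 s); alert on a single period.
        alarm("read-latency-high", "ReadLatency", 0.020, "GreaterThanThreshold", periods=1),
        alarm("write-latency-high", "WriteLatency", 0.020, "GreaterThanThreshold", periods=1),
        # Connection spike: 50% above the average peak, on a single Maximum datapoint.
        alarm("connections-spike", "DatabaseConnections", 1.5 * avg_peak_connections,
              "GreaterThanOrEqualToThreshold", statistic="Maximum", periods=1),
    ]


for spec in rds_alarm_specs("my-db", 16 * 1024**3, 200):
    print(spec["AlarmName"], spec["MetricName"], spec["Threshold"])
```

Each dictionary could then be passed as `boto3.client("cloudwatch").put_metric_alarm(**spec)`. A real alarm would also need `AlarmActions` (for example an SNS topic ARN) to notify anyone. Because the connection threshold is static, it has to be recomputed as the 30-day average peak changes; CloudWatch anomaly detection alarms are an alternative.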

Scaling and Optimization Tips

  • Vertical Scaling: Upgrade to a larger instance, for example from m7g.medium to m7i.large, for more CPU and memory capacity.
  • Horizontal Scaling: Implement read replicas to distribute read load and reduce pressure on the primary instance, especially useful if read latency or connection counts are high.
  • Performance Tuning: Regularly review query performance and indexing, especially after significant application updates or growth.
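Read replicas only reduce load on the primary if the application actually sends reads to them. A minimal sketch of application-side routing, assuming hypothetical endpoint names and a naive rule that treats statements starting with `SELECT` as reads:

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary endpoint and spread reads across read replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if none are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint for a statement (naive: only plain SELECTs count as reads)."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = ReadWriteRouter("primary.example.internal",
                         ["replica-1.example.internal", "replica-2.example.internal"])
print(router.endpoint_for("SELECT * FROM orders"))        # replica-1.example.internal
print(router.endpoint_for("SELECT count(*) FROM users"))  # replica-2.example.internal
print(router.endpoint_for("UPDATE orders SET paid = 1"))  # primary.example.internal
```

Replication to RDS read replicas is asynchronous, so reads that must see a just-committed write should still go to the primary. The prefix check also misroutes cases such as `SELECT ... FOR UPDATE`, so a production router would need an explicit read/write flag from the caller.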

Conclusion

Effective monitoring, timely alerts, and proactive scaling are crucial for maintaining the health and performance of AWS RDS instances. Use these guidelines together with a monitoring tool such as Datadog to keep instances running smoothly across all regions.
