Zach Swanson zswanson

  • Wayfair
  • Boston, MA

Kubernetes defaults to scaling by CPU usage, because that metric is always available. But it is not a great metric to scale on. Most backend services are not CPU-bound: they spend most of their time waiting for responses from other services, caches, or databases. If one of those dependencies gets slow, or worse, the network path to one of them gets slow, CPU-based scaling will tear down resources rather than scaling out, because all it sees is "idle" instances. This is especially bad when the contended resource is concurrency on those network requests: if many requests are waiting to check out a connection from the database connection pool, scaling down is the last thing you want.
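For reference, this is roughly what that default looks like as an `autoscaling/v2` HorizontalPodAutoscaler. The workload name and the numbers here are hypothetical; only the `Resource`/`cpu`/`Utilization` shape is the point:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend            # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization   # scale to hold average CPU utilization at 70%
        averageUtilization: 70
```

When pods sit "idle" waiting on a slow dependency, the measured utilization drops below the target and the HPA scales in, which is exactly the failure mode described above.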

My favorite metric to scale on is how many requests are currently in flight (per instance). There's a relationship (Little's law) between latency, request rate, and this number of in-flight requests, which means that a 20% increase in latency at the same request rate, or a 20% increase in requests at the same latency, both result in the same 20% increase in in-flight requests.
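The relationship is just arithmetic: in-flight requests = request rate × latency. A tiny sketch (hypothetical numbers, latency kept in integer milliseconds so the math is exact) shows why the metric reacts the same way to either kind of pressure:

```python
# Little's law: in-flight requests L = arrival rate λ × latency W.
def in_flight(rate_per_sec: int, latency_ms: int) -> float:
    """Average number of requests in flight on one instance."""
    return rate_per_sec * latency_ms / 1000

baseline = in_flight(100, 200)   # 100 req/s at 200 ms -> 20 in flight
slower   = in_flight(100, 240)   # latency +20%        -> 24 in flight
busier   = in_flight(120, 200)   # request rate +20%   -> 24 in flight
assert slower == busier == 24.0
```

Either way the instance gets 20% busier, and an autoscaler watching this number scales out, regardless of whether the cause is more traffic or a slower dependency.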