Zach Swanson zswanson

  • Wayfair
  • Boston, MA

Kubernetes defaults to scaling by CPU usage, because that metric is always available. But it is not a great metric to scale on. Most backend services are not CPU-bound: they spend most of their time waiting for responses from other services, caches, or databases. If one of those dependencies gets slow, or worse, the network path to one of them gets slow, CPU-based scaling will tear down resources rather than scaling out, because all it sees is "idle" instances. This is especially bad when the contended resource is concurrency on those network requests: if many requests are waiting to check out a connection from the database connection pool, scaling down is the last thing you want.
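For reference, this is roughly what that default looks like as an `autoscaling/v2` HorizontalPodAutoscaler. The workload name and the numbers here are hypothetical; only the `Resource`/`cpu`/`Utilization` shape is the point:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend            # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization   # scale to hold average CPU utilization at 70%
        averageUtilization: 70
```

When pods sit "idle" waiting on a slow dependency, the measured utilization drops below the target and the HPA scales in, which is exactly the failure mode described above.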

My favorite metric to scale on is how many requests are currently in flight (per instance). There's a relationship (Little's law) between latency, request rate, and this number of in-flight requests, which means that a 20% increase in latency at the same request rate, or a 20% increase in requests at the same latency, both result in the same 20% increase in in-flight requests.
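The relationship is just arithmetic: in-flight requests = request rate × latency. A tiny sketch (hypothetical numbers, latency kept in integer milliseconds so the math is exact) shows why the metric reacts the same way to either kind of pressure:

```python
# Little's law: in-flight requests L = arrival rate λ × latency W.
def in_flight(rate_per_sec: int, latency_ms: int) -> float:
    """Average number of requests in flight on one instance."""
    return rate_per_sec * latency_ms / 1000

baseline = in_flight(100, 200)   # 100 req/s at 200 ms -> 20 in flight
slower   = in_flight(100, 240)   # latency +20%        -> 24 in flight
busier   = in_flight(120, 200)   # request rate +20%   -> 24 in flight
assert slower == busier == 24.0
```

Either way the instance gets 20% busier, and an autoscaler watching this number scales out, regardless of whether the cause is more traffic or a slower dependency.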