Routing behavior regarding backlog.

Puma and unicorn both have a "backlog". This is really a fancy way of saying that the socket they're listening on can queue connections faster than the workers can process them. Backlog is a good thing: without it we could not "burst", i.e. a request could only be handled if the web server had spare capacity at the exact moment the request arrived.
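For reference, both servers let you tune this value in their config files. A minimal sketch, assuming TCP binds; the port and backlog numbers here are made up:

```ruby
# config/puma.rb -- Puma accepts a backlog query param on its bind URI
bind 'tcp://0.0.0.0:3000?backlog=1024'

# config/unicorn.rb -- unicorn takes a :backlog option on listen
listen 3000, backlog: 1024
```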

We want to avoid the "Rap Genius" queuing scenario of load, where a really fast request is queued behind a really slow one. The best way to avoid this is to run a concurrent server (unicorn/puma); in the most famous case, a single-threaded, non-concurrent web server was used. With a concurrent server, if you have 4 workers that can process requests, you would need 4 slow requests to land on the same dyno ahead of the fast one to get the same effect. Since routing is randomized, the probability of this approaches 0 as the concurrency on each dyno grows.
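To make that concrete, here is a back-of-the-envelope sketch (my numbers, not from the original post): if a fraction p of requests are slow and arrivals are independent, a fast request only gets stuck behind slow ones when all n workers on the dyno are busy with slow requests at once, which is roughly p**n.

```ruby
# Rough odds that all n workers on a dyno are tied up with slow requests,
# assuming 5% of requests are slow and routing is randomized and independent.
p_slow = 0.05
[1, 2, 4, 8].each do |n|
  puts format("workers=%d  chance of a full slow-request pileup ~ %.8f", n, p_slow**n)
end
# workers=1 -> 0.05, workers=4 -> ~0.00000625
```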

Coming back to backlog: what matters is the load on the app relative to the capacity of the app. Here are what I perceive as the three states of load:

Under loaded

Your app is getting fewer requests than it has capacity for. Let's say your app can handle 100 requests per second (made-up number) and you are only getting 50 requests per second. This means your app works on each request right away. It also means you have too much capacity and are paying us too much money.

Decreasing backlog in this scenario can help: you have idle dynos ready for requests while a few dynos are stuck with long-running requests, so rejecting instead of queuing lets those requests be retried elsewhere. However, with any kind of concurrent server, a dyno only gets stuck when multiple long-running requests land on it at the same time, and that chance drops dramatically as the concurrency of your app goes up.

Over loaded (burst)

Your app is temporarily getting more requests than it can handle. Your app can handle 100 requests per second, and normally you only get 50, but sometimes you get 150 for brief periods. This is the ideal case: you're not over-provisioned (paying too much), but you can still absorb the load.

If you decrease backlog here, your requests under the normal scenario will be faster, but when you need to burst, all your dynos will be busy: a request will be routed to a dyno, find it busy, get retried on another dyno, and so on until it is returned as undeliverable. Your backlog is what lets you queue requests above and beyond what your server can handle on a regular basis.
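Here's a toy model of that burst (again, all numbers are mine): a capacity of 100 req/s, a 10-second burst at 150 req/s, then back to 50 req/s. The backlog absorbs the spike and drains once load drops:

```ruby
capacity = 100   # requests the app can process per second (made up)
queued   = 0     # requests waiting in the backlog

[[150, 10], [50, 10]].each do |rate, seconds|
  seconds.times do
    queued += rate                    # arrivals this second
    queued -= [queued, capacity].min  # what we manage to process
  end
  puts "after #{seconds}s at #{rate} req/s, queued: #{queued}"
end
# after 10s at 150 req/s, queued: 500
# after 10s at 50 req/s,  queued: 0
```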

Over loaded (constantly)

Here you need to scale up, period. You can handle 100 requests per second, regularly receive 150, and burst up to 200.

Adjusting backlog here does basically nothing; the only thing you can do is provision more capacity until you are under loaded, or operating just slightly at capacity with some burst headroom.
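The arithmetic (using the same made-up numbers) shows why: when arrivals exceed capacity indefinitely, the queue grows linearly, so any finite backlog overflows; it's only a question of when.

```ruby
arrival  = 150    # steady req/s (made up)
capacity = 100    # req/s the app can process (made up)
backlog  = 1024   # roughly the default queue depth

# The queue grows by (arrival - capacity) every second until the backlog is full.
puts "backlog fills in ~#{backlog / (arrival - capacity)}s, then requests are rejected"
# => backlog fills in ~20s, then requests are rejected
```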

Considerations

Backlog is a really, really big number by default: typically somewhere around 1024 requests can be queued, and this is per dyno. This is intentionally large; if you're overflowing this value, you really need more capacity. On Heroku, rejected requests will be retried https://devcenter.heroku.com/articles/http-routing#dyno-connection-behavior (so it seems from the docs). It seems like decreasing backlog would improve performance, but as we've seen from the above scenarios it doesn't ever really buy us anything.

As always, each app is different, and YMMV. If you find a good candidate for decreasing backlog, please let me know. If you're running an insanely low-latency service where speed matters way more than cost, it might make sense to run 2 or 3x the capacity you need and keep backlog really low, essentially 0 (though apparently the lowest the OS will let it go is 16). Other than that, I can't imagine a scenario where it would be beneficial.
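If you were to try that, it's the same knob from the first example, just turned all the way down (hypothetical values; as noted, the OS may round tiny backlogs up):

```ruby
# config/puma.rb -- fail fast so the router retries another, over-provisioned dyno
bind 'tcp://0.0.0.0:3000?backlog=1'
```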
