I wanted to talk about this so badly on the show, but it hadn't been released yet. Yesterday we launched performance dynos (https://blog.heroku.com/archives/2014/2/3/heroku-xl). A performance dyno gives you a dedicated 6 GB of RAM and 8 cores. The idea is that if you really want to drop your tail latencies, there's no getting around the need for high concurrency. On a dyno like this you could easily run 12x the number of Unicorn or Puma workers, or more if you're using a copy-on-write friendly Ruby like 2.1.0. You can still scale out horizontally with more "performance" dynos, but this gives you a way to scale vertically as well. Ask me your performance dyno questions and I'll do my best to answer them here!
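To make the concurrency point a bit more concrete, here is a rough toy simulation in plain Ruby. It is only a sketch, not how the Heroku router or Unicorn actually schedule work: it assumes a single FIFO queue, a request every 40 ms, a 50 ms "fast" request, and one 2000 ms "slow" request out of every 200. The method names `simulate_latencies` and `percentile` and all of those numbers are made up for illustration.

```ruby
# Toy model: one FIFO queue feeding `workers` identical workers.
# All timings are illustrative assumptions, not measurements.
def simulate_latencies(workers:, requests: 1000, interval_ms: 40.0)
  free_at = Array.new(workers, 0.0) # time at which each worker becomes idle

  (0...requests).map do |i|
    arrival = i * interval_ms
    # Assumed workload: every 200th request is slow, the rest are fast.
    service = (i % 200 == 199) ? 2000.0 : 50.0

    # Hand the request to whichever worker frees up first.
    idx   = free_at.index(free_at.min)
    start = [arrival, free_at[idx]].max
    free_at[idx] = start + service

    free_at[idx] - arrival # latency = time queued + time being served
  end
end

# Nearest-rank style percentile over a list of latencies.
def percentile(values, pct)
  sorted = values.sort
  sorted[((pct / 100.0) * (sorted.length - 1)).round]
end

p99_few  = percentile(simulate_latencies(workers: 2), 99)
p99_many = percentile(simulate_latencies(workers: 12), 99)

puts "p99 with 2 workers:  #{p99_few} ms"
puts "p99 with 12 workers: #{p99_many} ms"
```

In this toy model a slow request ties up one of the two workers, so the fast requests behind it pile up and the p99 climbs. With 12 workers there is always an idle one, so the fast requests never wait. Real traffic is messier, but the shape of the effect is the same.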
Could you explain the "tail latencies" bit? I saw a bunch of people post stats from their dashboards showing the switch to PX dynos and the difference it made was really impressive, but I guess I don't understand why it made a difference. You mention you cou