Last updated on Jan 15, 2019
This article assumes that you are running Ruby 2.5 or higher, Rails 5.0 or higher, and Ubuntu 18.04 or higher.
Adding Puma to your application
Puma Config
Workers
Preload app
Rackup
Port
Environment
Timeout
Slow clients
Database connections
Backlog
Thread safety
Process count value
Thread count value
Conclusion
Puma is a web server that competes with Unicorn and allows you to handle concurrent requests. It uses threads, in addition to worker processes, to make better use of available CPU. You can only utilize threads in Puma if your entire codebase is thread-safe (which will be discussed later in this article). Otherwise, you can still use Puma, but must scale out through worker processes only.
This article will walk you through deploying a new Rails application to Heroku using the Puma web server. Always test your new deployments in a staging environment before you deploy to your production environment.
-
First, add Puma to your app's `Gemfile`:

```ruby
gem 'puma'
```
-
If you're using a `Procfile`, set Puma as the server for your web process in the `Procfile` of your application. You can set most values inline:

```
web: bundle exec puma -t 5:5 -p ${PORT:-3000} -e ${RACK_ENV:-production}
```
However, we recommend using a config file if you do not have one yet. Check your `my-app/config` directory for the `puma.rb` config file. If none exists, create one (as shown below) and point your `Procfile` at it:

```
web: bundle exec puma -C config/puma.rb
```
Make sure the `Procfile` is appropriately capitalized and checked into Git.
Create a configuration file for Puma at `config/puma.rb`, or at a path of your choosing, if none already exists. For a simple Rails application, the following basic configuration is recommended:
```ruby
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads threads_count, threads_count
port ENV.fetch("PORT") { 3000 }
environment ENV.fetch("RAILS_ENV") { "production" }
workers ENV.fetch("WEB_CONCURRENCY") { 2 }
preload_app!
plugin :tmp_restart
```
You must also ensure that your Rails application has enough database connections available in the pool for all threads and workers. (This will be covered later).
Each of these configuration settings is explained below:
-
```ruby
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
threads threads_count, threads_count
```
Puma can serve each request in a thread from an internal thread pool. This behavior allows Puma to provide additional concurrency for your web application. Loosely speaking, workers consume more RAM and threads consume more CPU, and both offer more concurrency.
On MRI, there is a Global Interpreter Lock (GIL) that ensures only one thread can run Ruby code at any time. IO operations such as database calls, interacting with the file system, or making external HTTP calls do not hold the GIL. Most Rails applications make heavy use of IO, so adding additional threads allows Puma to process multiple requests concurrently, gaining you more throughput. JRuby and Rubinius also benefit from using Puma. These Ruby implementations do not have a GIL and will run all threads in parallel regardless of what is happening in them.
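The effect of IO releasing the GIL can be demonstrated with a small, self-contained sketch. Here `sleep` stands in for an IO-bound operation such as a database query or an external HTTP call:

```ruby
require 'benchmark'

# Stand-in for an IO-bound operation (database query, HTTP call, etc.).
# On MRI, sleep releases the GIL just as real IO does.
def fake_io
  sleep 0.1
end

# Run four operations one after another.
serial = Benchmark.realtime { 4.times { fake_io } }

# Run the same four operations on four threads.
threaded = Benchmark.realtime do
  4.times.map { Thread.new { fake_io } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
# The threaded version finishes in roughly one sleep interval instead
# of four, because the threads overlap their "IO" waits.
```

This is why extra threads raise throughput for IO-heavy Rails apps even on MRI: the waits overlap, while CPU-bound Ruby code would still serialize on the GIL.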
Puma allows you to configure your thread pool with `min` and `max` settings, controlling the number of threads each Puma instance uses. The `min` threads setting allows your application to spin down resources when not under load. This feature is not needed on Heroku, as your application can consume all of the resources on a given dyno. The recommended setting is for `min` to equal `max`. Each Puma worker will be able to spawn up to the maximum number of threads you specify.
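For completeness, a pool that spins down when idle would set differing values. The numbers below are illustrative; on Heroku the recommendation above (`min` equal to `max`) applies instead:

```ruby
# config/puma.rb
# Keep at least 1 thread when idle, grow to 5 under load.
threads 1, 5

# Equivalent to the recommended Heroku setting shown earlier:
# threads 5, 5
```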
-
```ruby
port ENV.fetch("PORT") { 3000 }
```

Heroku sets `ENV['PORT']` when the web process boots up. Locally, this defaults to 3000 to match the Rails default.
-
```ruby
environment ENV.fetch("RAILS_ENV") { "production" }
```

Set the environment of Puma. In production, this should be set to `production`; otherwise, set it to `development` or `test` based on the environment you are working in.
-
```ruby
workers ENV.fetch("WEB_CONCURRENCY") { 2 }
```

Puma forks multiple OS processes within each dyno to allow a Rails app to support multiple concurrent requests. In Puma terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos). Worker processes are isolated from one another at the OS level, and therefore do not need to be thread-safe.

Each worker process consumes additional memory. This behavior limits how many processes you can run in a single dyno. With a typical Rails memory footprint, you can expect to run 2-4 Puma worker processes on a `free`, `hobby`, or `standard-1x` dyno. Your application may allow for more or fewer, depending on your specific memory footprint. We recommend specifying this number in a config var to allow for faster application tuning.

Note: Multi-process mode does not work on JRuby or Windows because neither the JVM nor Windows supports forking processes. Omit this line from your config if you are using JRuby or Windows.
-
```ruby
preload_app!
```

Preloading your application reduces the startup time of individual Puma worker processes and allows you to manage each worker's external connections using the `on_worker_boot` hook.
-
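With `preload_app!`, connections opened before the fork would be shared by all workers, so a common pattern is to re-establish them per worker in `on_worker_boot`. This is a sketch; recent Rails versions reconnect automatically after a fork, so it mainly applies to older setups:

```ruby
# config/puma.rb
preload_app!

on_worker_boot do
  # Each forked worker re-establishes its own database connection,
  # since sockets opened in the master process are not safe to share.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
```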
There is no request timeout mechanism inside of Puma. The Heroku router will time out all requests that exceed 30 seconds. Although an error is returned to the client, Puma will continue to work on the request, as there is no way for the router to notify Puma that the request terminated early. To avoid clogging your processing capacity, we recommend using `Rack::Timeout` to terminate long-running requests and locate their source.

Add the Rack Timeout gem to your project, then set your timeout value via an environment variable:

```
RACK_TIMEOUT_SERVICE_TIMEOUT=20
```

Any request that runs for 20 seconds will now be terminated, with a stack trace output to your logs. The stack trace should help you determine what part of your application is causing the timeout so you can fix it.
-
A slow client is one that sends and receives data slowly. For example, an app that receives images uploaded by users from mobile phones that are not on WiFi, 4G, or other fast networks. This type of connection can cause a denial of service for some servers, such as Unicorn, as workers must sit idle while they wait for the request to finish. To protect your application, either move to a server with built-in slow client protection, such as Puma, or run behind a proxy server such as NGINX that handles slow clients. The Unicorn web server must run behind NGINX, or it is vulnerable to slow client attacks.
Puma can allow multiple slow clients to connect without requiring a worker to be blocked on the request transaction. Because of this, Puma handles slow clients gracefully. Heroku recommends Puma for use in scenarios where you expect slow clients.
-
As you add more concurrency to your application, it will need more connections to your database. A good formula for determining the number of connections each dyno will consume is to multiply `RAILS_MAX_THREADS` by `WEB_CONCURRENCY`.

Rails maintains its own database connection pool, with a new pool created for each worker process. Threads within a worker operate on the same pool. Make sure there are enough connections inside your Rails database connection pool so that `RAILS_MAX_THREADS` connections can be used. If you see this error:

```
ActiveRecord::ConnectionTimeoutError - could not obtain a database connection within 5 seconds
```

it is an indication that your Rails connection pool is too small. For an in-depth look at these topics, please read the Dev Center article Concurrency and Database Connections.
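As a quick sketch of the formula, using the default values from the `puma.rb` config shown earlier:

```ruby
# Illustrative values matching the defaults in the config above.
rails_max_threads = 5  # RAILS_MAX_THREADS: threads per worker process
web_concurrency   = 2  # WEB_CONCURRENCY: worker processes per dyno

# Each thread in each worker may hold its own connection, so one dyno
# can consume up to threads * workers connections.
connections_per_dyno = rails_max_threads * web_concurrency
puts connections_per_dyno  # => 10
```

Note that each worker maintains its own pool, so the `pool` value in your Rails configuration only needs to cover `RAILS_MAX_THREADS`, while your database plan must allow the full product across all dynos.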
-
It is possible to set a "backlog" value for Puma. This setting is the number of requests that will be queued at the socket before Puma begins rejecting HTTP requests. The default value is 1024. We recommend not modifying this value, and especially not decreasing it. It may seem like a good idea to reduce this value so that when a dyno is busy, a request can be sent to a less busy dyno. However, when Heroku re-routes a bounced request, it assumes your entire app is saturated. Each connection gets delayed by 5 seconds, so you're automatically being penalized 5 seconds per request. You can read more about this routing behavior. In addition, when one of your dynos starts bouncing requests, it's likely due to an increase in load, and all of your dynos will be bouncing requests. Repeatedly bouncing the same request will result in higher error rates for your customers.

An arbitrarily high backlog value allows your dyno to handle a spike in requests. Lowering this value does little to speed up your app and will actively cause more failed requests for your customers. Heroku recommends NOT setting the backlog value and instead using the default.
-
Thread-safe code can run across multiple threads without error. Not all Ruby code is thread-safe, and it can be challenging to determine whether your code and all of the libraries you are using can run across multiple threads.

Until Rails 4, there was a thread-safe compatibility mode that could be toggled. However, just because Rails is thread-safe doesn't guarantee your code will be.
If you haven't run your application in a threaded environment such as Sidekiq or Puma before, you can first try using Puma with the `Rack::Lock` middleware, which wraps each request in a mutex so that every request is effectively run synchronously:

```ruby
# config/initializers/rack_lock.rb
Rails.application.config.middleware.insert_before 0, Rack::Lock
```
While `Rack::Lock` will ensure that there are no thread-safety issues with your application, the synchronous nature of the middleware means that your application will respond more slowly than if you were using threads. You can still gain concurrency by adding workers: since each worker runs in a different process and does not share memory, code that is not thread-safe can run across multiple worker processes. However, for maximum efficiency, we recommend being able to run with both processes and threads.
-
If you would like to start using a threaded worker or web server such as Puma, how can you know if your app is thread-safe? Unfortunately, there is no accurate litmus test for thread-safety, but there are some common areas you can check:
-
Ensure thread-safe dependencies: Make sure all of your gems are thread-safe. Most (if not all) gems that are reasonably popular and have had a release within the past year should be thread-safe.
-
Don't mutate globals: In general, you want to make sure that you're not mutating any globally accessible values. For example, if you used `Kernel.const_set` in a request, that would affect all requests on all threads, not just the current one. You can get an idea of some other areas that are not thread-safe from this Stack Overflow answer.
-
Use rack-freeze: This gem prevents you from accidentally mutating your middleware. `rack-freeze` is different from `Rack::Lock` and won't slow down your app.
-
Stage and deploy: Once you're ready to move forward, remove `Rack::Lock` from your project. You can make sure that it's gone by running:

```
RAILS_ENV=production rake middleware
```
First, deploy to a staging app. Increase your thread count above one. We recommend a default thread count of five per worker, but you may want to start with a lower value and work your way up:
Once you have your application running on staging, have several co-workers access the site simultaneously.
You need to monitor exceptions and look for errors such as `deadlock detected (fatal)`. Concurrency bugs can be challenging to identify and fix, so make sure to test your application thoroughly before deploying to production. If you can make your application thread-safe, the benefit is significant, as scaling out with Puma threads and workers provides more throughput than using workers alone.

Once you are confident that your application behaves as expected, you can deploy to production and increase your thread count.
-
Increasing the process count increases RAM utilization, which can be a limiting factor. Another factor for setting this value is the number of physical cores on the system. Due to the GVL (the GIL discussed earlier), the Ruby interpreter (MRI) can only run one thread executing Ruby code at a time. Due to this limitation, to fully make use of multiple cores, your application should have a process count that matches the number of physical cores on the system.
If you go above the physical core count, then the processes will contend for limited resources. Due to this contention, they will spend extra time context switching that could have been spent executing code.
You can find the number of cores by running `nproc` on a system via a terminal. For example:

```
$ nproc
8
```
The value returned by `nproc` includes "hyperthreads" in addition to physical cores. To get the "true" number of physical cores, divide by two. For example, `performance-l` dynos have four physical cores and four hyperthreads; such a dyno can only physically execute instructions from four processes at a time.

The value of `nproc` on `free`, `hobby`, `standard-1x`, and `standard-2x` dynos is correct, but these cores are shared between multiple applications running in containers. While `nproc` for these dynos will return 8, it is best to assume only one process can execute at a time.

While the number of physical cores dictates the maximum number of processes that can execute at a given time, there are cases where you want to tune the process count above the physical core count. Multiple processes can provide redundancy in case one process crashes. When a Puma worker process crashes, it will be restarted, but this is not instantaneous. While the master process is replacing a worker process, having redundancy means a second process can still serve requests. For this reason, we typically recommend a minimum of 2 processes, if possible.
The other reason to have multiple processes that exceed physical core count is if your application is not thread-safe and cannot run multiple threads. If you are only running one process in this scenario, then your core will be idle while your application makes IO calls, such as network requests or database queries. In this scenario, having an extra process would allow it to work on another request while waiting on IO.
The final consideration when setting the process count is memory use. Scaling out through processes typically uses more memory than using more threads. For more information, see: What is a Thread. If your application uses so much memory that it starts to swap to disk, performance will be dramatically reduced. We highly recommend tuning your process count so that your application does not encounter an `R14 - Memory Quota Exceeded` error.
-
Once you’ve found an optimal value for your process count, you can further tune the system’s thread count. Threads in a Ruby (MRI) process allow your app to work on multiple requests at a time when there is IO involved (database calls, network calls, etc.). It is “cheaper” for an operating system to context switch between threads, and they also generally consume less memory overall than processes since they share memory. Adding more threads to a system will increase the overall memory use of the Ruby app over time, but it’s not usually the main concern for tuning the thread number.
We recommend the same number of threads for each process type. We picked five in part because it is the default value for the Active Record connection pool, but also in part because research has shown the value to be “good enough”:
> What you're seeing there, with 5-10 threads being optimal for an I/O-heavy workload, is pretty typical of CRuby. ([Appfolio: Threads, Processes, and Fibers](https://engineering.appfolio.com/appfolio-engineering/2019/9/4/benchmark-results-threads-processes-and-fibers))
The open-source codetriage.com project uses Puma, and you can see the Puma config file in the repo.