Dial back instance counts
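Measure each instance's true steady-state memory usage (after it has warmed up and served real traffic for a while), then run only as many instances as fit in your available memory. Aim for 300MB per instance.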
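The arithmetic behind dialing back instance counts can be sketched like this. It is a minimal sketch: `max_instances` and the 10% headroom are hypothetical choices, not part of any tool, and the per-instance figure should be the steady-state number you actually measured.

```ruby
# Hypothetical helper: how many app instances fit in a given amount of RAM,
# given the measured steady-state memory of one instance (both in MB).
def max_instances(available_mb, per_instance_mb, headroom: 0.10)
  raise ArgumentError, "per_instance_mb must be positive" unless per_instance_mb.positive?

  # Reserve a fraction of memory for the OS, deploys and spikes (assumed 10%).
  usable_mb = available_mb * (1 - headroom)
  (usable_mb / per_instance_mb).floor
end

# A 2048MB server at the 300MB-per-instance target:
puts max_instances(2048, 300)
```

If your instances settle well above 300MB, the same formula shows how quickly the count you can safely run drops.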
Avoid major allocations
A good goal is a maximum of 300k allocations per controller action, with an average of fewer than 20k allocations per action. Minimize both the number and the size of allocations.
- Use an APM. Skylight and Scout are strongest in tracking memory allocation.
- If you don't want to pay for an APM, look at
- Build your own profiling with
- If all else fails, move heavy allocations to Rake tasks.
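For a quick check of one code path against that budget without an APM, Ruby's built-in `GC.stat(:total_allocated_objects)` counter can be read before and after a block. This is a rough sketch: the counter counts object slots (not bytes) across the whole process, so in a threaded server other threads' allocations are included, and the "controller action" below is a hypothetical stand-in.

```ruby
# Count objects allocated while a block runs, using Ruby's built-in GC stats.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical per-action budget taken from the 300k maximum above.
MAX_ALLOCATIONS_PER_ACTION = 300_000

allocated = count_allocations do
  # Stand-in for a controller action: builds 1,000 small strings.
  Array.new(1_000) { |i| "item #{i}" }
end

puts "allocated #{allocated} objects"
puts "over budget!" if allocated > MAX_ALLOCATIONS_PER_ACTION
```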
Audit Gemfile with
Run bundle exec derailed bundle:mem and you'll see how much memory each gem costs when it is required. Remember that every gem in your Gemfile is required immediately at startup, so look for opportunities to use "require: false". Sprockets will require most asset gems itself if it needs them, so you don't have to. Marking them "require: false" reduces production memory usage, because asset gems aren't needed once your assets are precompiled:
gem 'sass', require: false
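To get a feel for what a single `require` costs at boot, you can count the objects allocated while requiring a library. This is only a rough proxy sketched under assumptions: derailed's bundle:mem measures process memory instead, `require_cost` is a hypothetical helper, and `optparse` is just an arbitrary standard-library example.

```ruby
# Rough proxy for the boot-time cost of one library: objects allocated
# while requiring it (derailed's bundle:mem measures memory, not objects).
def require_cost(lib)
  before = GC.stat(:total_allocated_objects)
  newly_loaded = require(lib) # false if the library was already loaded
  { lib: lib, newly_loaded: newly_loaded,
    objects: GC.stat(:total_allocated_objects) - before }
end

p require_cost("optparse")
# A second require is a no-op, so its cost is close to zero.
p require_cost("optparse")
```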
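Use jemalloc
jemalloc is just a better malloc with better fragmentation avoidance. I prefer to compile Ruby with jemalloc, but you can also load it at runtime with the LD_PRELOAD environment variable on Linux (on macOS, the equivalent variable is DYLD_INSERT_LIBRARIES):
brew install jemalloc
DYLD_INSERT_LIBRARIES=/usr/local/Cellar/jemalloc/4.2.0/lib/libjemalloc.dylib ruby myscript.rb
To compile Ruby with jemalloc instead, run these from the Ruby source directory:
./configure --with-jemalloc
make
make install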
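To confirm jemalloc is actually in play, a heuristic check like the following can help. It assumes that a Ruby built with --with-jemalloc lists jemalloc in its link flags (usually `RbConfig::CONFIG["MAINLIBS"]`, or `"LIBS"` on older Rubies) and that a preloaded copy is visible in the injecting environment variable; `jemalloc_in_use?` is a hypothetical name, and a false result is not proof jemalloc is absent.

```ruby
require "rbconfig"

# Heuristic: true if jemalloc appears in Ruby's link flags (compiled in)
# or in the environment variable used to preload it at runtime.
def jemalloc_in_use?
  linked = %w[MAINLIBS LIBS].any? { |key| RbConfig::CONFIG[key].to_s.include?("jemalloc") }
  preloaded = %w[LD_PRELOAD DYLD_INSERT_LIBRARIES].any? { |var| ENV[var].to_s.include?("jemalloc") }
  linked || preloaded
end

puts jemalloc_in_use? ? "jemalloc detected" : "jemalloc not detected"
```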
Use a forking webserver
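Puma, Unicorn, and Passenger all work. Be sure to enable whatever "preload" option your server offers (for example, preload_app! in Puma or preload_app true in Unicorn). Loading the app before forking lets workers share memory pages with the master process through copy-on-write, which increases shared memory and decreases overall memory usage. Remember that you may not see any improvement in RSS, because many tools count shared memory in each process's resident set size.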
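The "preload, then fork" pattern that these servers use can be sketched in plain Ruby. This is an illustration, not any server's actual code: `PRELOADED` and `run_workers` are hypothetical, and `Process.fork` is unavailable on Windows and JRuby.

```ruby
# Data loaded in the parent *before* forking. Children share these memory
# pages with the parent via copy-on-write until one side writes to them.
PRELOADED = Array.new(100_000) { |i| "row #{i}".freeze }.freeze

# Fork `count` workers that only read the preloaded data, then wait for them.
def run_workers(count)
  pids = Array.new(count) do
    Process.fork do
      # Reading shared objects generally leaves the pages shared.
      exit!(PRELOADED.size == 100_000 ? 0 : 1)
    end
  end
  # Process.wait2 returns [pid, status]; collect each child's exit code.
  pids.map { |pid| Process.wait2(pid).last.exitstatus }
end

p run_workers(2)
```

Preloading in the master means the boot-time allocations happen once instead of once per worker.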
Use a threaded webserver
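Threads are lighter on memory than processes, because all threads in a process share one heap. Many webapps can benefit from threads, especially those that do lots of database I/O or call external web services: CRuby releases its global VM lock while a thread waits on I/O, so other threads can run in the meantime. Puma and Passenger Enterprise are threaded webservers.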
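A small demonstration of why I/O-heavy apps benefit: the waits of several threads overlap instead of adding up. This sketch assumes `sleep` is a fair stand-in for a blocking database or HTTP call; CPU-bound work would not overlap this way under CRuby's global VM lock. `simulate_requests` is a hypothetical name.

```ruby
# Run `count` threads that each wait `wait_seconds` (standing in for I/O)
# and return the total wall-clock time taken.
def simulate_requests(count, wait_seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(count) { Thread.new { sleep(wait_seconds) } }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

elapsed = simulate_requests(4, 0.2)
puts format("4 waits of 0.2s finished in %.2fs", elapsed)
```

Run serially, the four waits would take about 0.8s; with threads they finish in roughly the time of one.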
Keep Ruby and Rails up-to-date
Ruby 2.2 and Rails 4.2 include important performance improvements. Look out for Ruby 2.4, which looks like it will include a faster Hash, faster regular expressions, and better control over free slots.
Limit malloc arenas
When using a threaded webserver, you may experience major growth in memory usage. This is due to glibc malloc's arena implementation: to reduce contention between threads, malloc creates extra memory pools (arenas), by default up to 8 per CPU core on 64-bit systems, and it can get pretty greedy. If you see huge memory bloat with threads, try setting the MALLOC_ARENA_MAX environment variable to a low number like 2 or 3. This will slow down your program slightly, so be sure to benchmark.
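One detail worth knowing: glibc reads MALLOC_ARENA_MAX when malloc initializes, so the variable has to be in the environment when the process starts; setting `ENV` inside an already-running Ruby is too late. The sketch below (hypothetical helper names) shows launching a child Ruby with the variable set, and the default ceiling the text mentions. Non-glibc allocators such as jemalloc, or macOS's malloc, ignore this variable.

```ruby
require "etc"
require "rbconfig"

# glibc's default arena ceiling on 64-bit systems: 8 arenas per CPU core.
def default_arena_cap(cores = Etc.nprocessors)
  8 * cores
end

# Start a separate Ruby process with MALLOC_ARENA_MAX set at launch time
# and return whatever the script prints.
def spawn_with_arena_max(max, script)
  IO.popen({ "MALLOC_ARENA_MAX" => max.to_s }, [RbConfig.ruby, "-e", script], &:read)
end

puts "default arena cap here: #{default_arena_cap}"
puts spawn_with_arena_max(2, 'print ENV["MALLOC_ARENA_MAX"]')
```

In production you would set the variable in your process manager or Procfile rather than spawning Ruby by hand.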
For more environment variables to tune malloc behavior, see
mallopt() option    Env var                   Default value   Notes
M_TRIM_THRESHOLD    MALLOC_TRIM_THRESHOLD_    128KB
M_TOP_PAD           MALLOC_TOP_PAD_           0
M_MMAP_THRESHOLD    MALLOC_MMAP_THRESHOLD_    128KB           0 disables
M_MMAP_MAX          MALLOC_MMAP_MAX_          64              0 disables
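Tune the garbage collector
If you can't read gc.c and understand Ruby's GC environment variables (such as RUBY_GC_HEAP_GROWTH_FACTOR and RUBY_GC_MALLOC_LIMIT) yourself, don't touch them (yet).
GC tuning can fix: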
- Too many free slots
- Slow startup
- Too many or too few GCs
Be careful. Fix one problem and you may make another worse.
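Before changing any GC setting, it helps to measure the symptoms listed above. This sketch reads the relevant CRuby `GC.stat` counters (key names as in CRuby 2.2 and later) around a stand-in workload; `gc_snapshot` is a hypothetical helper, and real numbers should come from your app under production-like load.

```ruby
# Snapshot the GC counters that relate to free slots and GC frequency.
def gc_snapshot
  stat = GC.stat
  {
    gc_count: stat[:count],
    minor_gcs: stat[:minor_gc_count],
    major_gcs: stat[:major_gc_count],
    free_slots: stat[:heap_free_slots],
    live_slots: stat[:heap_live_slots],
  }
end

before = gc_snapshot
200_000.times { Object.new } # stand-in workload that creates garbage
after = gc_snapshot

puts "GCs during workload: #{after[:gc_count] - before[:gc_count]}"
puts "free slots now: #{after[:free_slots]}"
```

Comparing snapshots before and after a change is one way to check that fixing one problem didn't make another worse.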