@nateberkopec
Last active October 30, 2021 02:01
# Inspired by:
# http://stackoverflow.com/questions/32923610/why-doesnt-this-ruby-program-return-off-heap-memory-to-the-operating-system
require 'objspace'
# Strings of up to 23 bytes are stored entirely inside a single heap slot (one
# RVALUE) on the system measured below. Try changing this to 24 or more to see
# what happens when the string data has to be allocated outside the Ruby heap.
STRING_SIZE = 23
# p ObjectSpace.memsize_of("a"*1) #=> 40
# p ObjectSpace.memsize_of("a"*23) #=> 40
# p ObjectSpace.memsize_of("a"*24) #=> 65
# p ObjectSpace.memsize_of("a"*250) #=> 291
# Note that the exact values are system-dependent.
# See: http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
# This is always 408 slots per heap page
SLOTS_PER_HEAP_PAGE = GC::INTERNAL_CONSTANTS[:HEAP_PAGE_OBJ_LIMIT]
# Depends on 32 or 64 bit address space
RVALUE_SIZE = ObjectSpace.memsize_of("a"*1)
def print_stats(msg)
  puts '-------------------'
  puts msg
  puts '-------------------'
  # Ask ps for this process only; grepping the full process list for the pid
  # can also match other lines that happen to contain the same digits.
  puts "RSS: #{`ps -o rss= -p #{Process.pid}`.strip} KB"
  # This is not actually the *size* of the heap in memory, I need to fix this!
  puts "HEAP SIZE: #{(GC.stat[:heap_sorted_length] * SLOTS_PER_HEAP_PAGE * RVALUE_SIZE)/1024} KB"
  puts "SIZE OF ALL OBJECTS: #{ObjectSpace.memsize_of_all/1024} KB"
end
def run
  print_stats('START WORK')
  @data = []
  600_000.times do
    @data << " " * STRING_SIZE
  end
  print_stats('END WORK')
  @data = nil
end
run
GC.start
print_stats('AFTER FORCED MAJOR GC')
10.times { GC.start }
print_stats('AFTER 11 FORCED MAJOR GC')
puts GC.stat

Dial back instance counts

Discover the true steady-state memory usage of each instance. Aim for about 300MB per instance.

Avoid major allocations

A good goal would be 300k allocations maximum per controller action. Average allocations per action should be less than 20k. Minimize the number and size of allocations.

  • Use an APM. Skylight and Scout are strongest in tracking memory allocation.
  • If you don't want to pay for an APM, look at memory_profiler or oink.
  • Build your own profiling with ObjectSpace and GC.stat from stdlib.
  • If all else fails, move heavy allocations to Rake tasks.
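
The "build your own" option above can start very small. This is a minimal sketch, not a replacement for an APM: `count_allocations` is a hypothetical helper name, and it assumes MRI, where `GC.stat(:total_allocated_objects)` is a running count of every object allocated since the process started.

```ruby
# Hypothetical helper: counts objects allocated while the block runs.
# GC.stat(:total_allocated_objects) only ever increases, so the difference
# between two readings is the number of allocations made in between.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocated = count_allocations do
  1_000.times { Object.new }
end
puts "allocated #{allocated} objects"
```

Wrapped around a controller action (for example in an around filter), a number like this can be logged and compared against the 20k and 300k targets above.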

Audit Gemfile with derailed

Run bundle exec derailed bundle:mem and you're done! Remember that every gem in your Gemfile is required immediately at startup, so look for opportunities to use "require: false". Sprockets will require most asset gems itself if it needs them, so you don't have to. This reduces production memory usage, because asset gems aren't needed once you have precompiled your assets.

gem 'sass', require: false
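
To get a feel for why a require is not free, here is a rough sketch of measuring one by hand. `require_cost` is a hypothetical helper, `json` is just an arbitrary example library, and `ObjectSpace.memsize_of_all` is only an estimate, so treat the number as a hint; derailed does this job far more thoroughly.

```ruby
require 'objspace'

# Hypothetical helper: rough memory cost, in bytes, of requiring one library.
# A library that is already loaded will report roughly zero (or even a
# slightly negative number if the GC happened to free something).
def require_cost(lib)
  GC.start
  before = ObjectSpace.memsize_of_all
  require lib
  GC.start
  ObjectSpace.memsize_of_all - before
end

cost = require_cost('json')
puts "json: #{(cost / 1024.0).round(1)} KiB"
```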

Use jemalloc

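It's just a better malloc with better fragmentation avoidance. I prefer to compile Ruby with jemalloc, but you can also preload it with the LD_PRELOAD environment variable. LD_PRELOAD is honored by the Linux dynamic linker; macOS uses DYLD_INSERT_LIBRARIES for the same purpose, so adapt the Homebrew example below to your platform.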

brew install jemalloc
LD_PRELOAD=/usr/local/Cellar/jemalloc/4.2.0/lib/libjemalloc.dylib ruby myscript.rb

or

./configure --with-jemalloc
make
make install

Use a forking webserver

Puma, Unicorn and Passenger all work. Be sure to use whatever "preload" options are available, so the application is loaded once before the workers fork. Copy-on-write increases shared memory usage, which decreases overall memory usage. Remember that you may not see any improvement in RSS, because many tools count shared memory in each process's resident set.
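
What "preload, then fork" means can be sketched in plain Ruby. This is an illustration under stated assumptions, not how any of these servers is implemented: `APP_DATA` stands in for a loaded application, `spawn_worker` is a hypothetical helper, and it needs a platform with `fork` (so not Windows).

```ruby
# Stand-in for a preloaded application: built once, in the parent process.
APP_DATA = Array.new(100_000) { |i| "row #{i}" }.freeze

# Hypothetical helper: forks a worker that reads the preloaded data and
# reports back over a pipe. The child starts out sharing the parent's memory
# pages; the OS copies a page only when one side writes to it.
def spawn_worker(id)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts "worker #{id} sees #{APP_DATA.size} rows"
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  [pid, reader]
end

workers = 2.times.map { |id| spawn_worker(id) }
workers.each do |pid, reader|
  puts reader.read
  reader.close
  Process.wait(pid)
end
```

Neither worker rebuilt the array; both read the copy the parent created before forking.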

Use a threaded webserver

Threads are lighter memory-wise than processes. Many webapps can benefit from threads, especially those that have lots of database I/O or interact with external webservices. Puma and Passenger Enterprise are threaded webservers.
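
A small sketch of the idea, assuming MRI: `handle_request` and `SHARED_CONFIG` are made-up names, and `sleep` stands in for a database query or an external HTTP call. All four "requests" run inside one process and read the same single copy of the application state.

```ruby
# Stand-in for application state; every thread sees this one copy.
SHARED_CONFIG = { "greeting" => "hello" }.freeze

# Hypothetical request handler: sleep stands in for database or network I/O,
# during which MRI lets other threads run.
def handle_request(id)
  sleep 0.05
  "#{SHARED_CONFIG['greeting']} from request #{id}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = 4.times.map { |id| Thread.new { handle_request(id) } }
responses = threads.map(&:value) # value waits for the thread to finish
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts responses
puts format("4 requests in %.2fs", elapsed)
```

Serving the same four requests from four processes would mean four copies of everything that copy-on-write cannot share.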

Keep Ruby and Rails up-to-date

Ruby 2.2 and Rails 4.2 include very important performance improvements. Watch out for Ruby 2.4, which looks like it will include a faster Hash, faster Regex and better control over free slots.

Tune malloc

When using a threaded webserver, you may experience major growth in memory usage. This is due to malloc's arena implementation, which can get pretty greedy in an effort to reduce thread contention for memory. If you see huge memory bloat with threads, try setting the MALLOC_ARENA_MAX environment variable to a number like 2 or 3. This will slow down your program slightly, so be sure to benchmark.

For more environment variables to tune malloc behavior, see mallopt.

| mallopt() option | Env var | Default value | Notes |
|---|---|---|---|
| M_TRIM_THRESHOLD | MALLOC_TRIM_THRESHOLD_ | 128KB | |
| M_TOP_PAD | MALLOC_TOP_PAD_ | 0 | |
| M_MMAP_THRESHOLD | MALLOC_MMAP_THRESHOLD_ | 128KB | 0 disables |
| M_MMAP_MAX | MALLOC_MMAP_MAX_ | 64 | 0 disables |

Tune GC

If you can't read gc.c and understand these variables yourself, don't touch them (yet)

GC Tuning can fix:

  • Too many free slots
  • Slow startup
  • Too many or too few GCs

Be careful. Fix one problem and you may make another worse.
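
Before touching any GC variable, measure. The sketch below condenses `GC.stat` into the numbers that relate to the three problems listed above. `gc_summary` is a hypothetical helper, and the key names are the ones MRI has used since Ruby 2.2; it is a starting point for logging, not a tuning recipe.

```ruby
# Hypothetical helper: a few GC.stat values worth logging before and after
# any tuning change.
def gc_summary
  stat = GC.stat
  available = stat[:heap_available_slots]
  free = stat[:heap_free_slots]
  {
    # Share of heap slots currently unused ("too many free slots").
    free_slot_ratio: available.zero? ? 0.0 : free.fdiv(available).round(3),
    # How often the GC has run ("too many or too few GCs").
    minor_gc_count: stat[:minor_gc_count],
    major_gc_count: stat[:major_gc_count],
    total_gc_count: stat[:count]
  }
end

p gc_summary
```

Logging this right after boot and again under steady traffic shows whether a change fixed one problem by making another worse.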

# [fit] Halve Your Memory Usage
# With These
# [fit] 12 Weird
# [fit] Tricks
Heroku and AWS hate him!
@nateberkopec
---
![speedshop](speedshop_s.png)
---
---
![](cats.gif)
---
# Ruby is a garbage collected language.
# But my memory is growing.
# Therefore, memory leak.
---
![fit](pumaleak.png)
---
# Solution 1:
# [fit] Dial Back
# [fit] The Instance Counts
## Discover true steady-state memory usage per instance
---
![autoplay](sinking.mp4)
---
# [fit] Myth:
# Memory usage should look like a long, flat line
---
![fit](flat.jpeg)
---
![fit](log.jpeg)
---
![fit](workerkiller.jpeg)
---
# [fit] Aim for
# [fit] 300MB
# [fit] per instance
## This also applies to Sidekiq
---
# Solution 2:
# [fit] Stop allocating
# [fit] so many
# [fit] objects at once
---
# [fit] Myth:
# "Shouldn't GC clean the unused objects up after the job completes?"
---
# [fit] Translated:
# [fit] Memory goes down,
# [fit] right?
---
![](lazy.gif)
---
# [fit] Thresholds
# [fit] Heap frag
# [fit] free isn't free
---
# [fit] Thresholds,
# [fit] not timers.
---
# [fit] Slots run out
# [fit] oldmalloc
# [fit] malloc
---
# [fit] Heap
# [fit] fragmentation
---
![](ark.gif)
---
![fit](objspace.jpeg)
---
![fit](heapall.jpeg)
---
# [fit] GC::INTERNAL_CONSTANTS
```ruby
{
:RVALUE_SIZE=>40,
:HEAP_PAGE_OBJ_LIMIT=>408,
:HEAP_PAGE_BITMAP_SIZE=>56,
:HEAP_PAGE_BITMAP_PLANES=>4
}
```
---
# [fit] Malloc and free
# [fit] are suggestions,
# [fit] not commands
---
![fit](heapfrag.jpeg)
---
![fit](frenchheap.jpg)
---
# [fit] Heap fragmentation
# [fit] can cause long-term
# [fit] slow "leaks"
---
![](smallleak.jpg)
---
# End result:
# Ruby Memory usage =
# Maximum Memory Pressure
---
![fit](mempress.jpeg)
---
# [fit] Allocating
# [fit] less
Or, fix your god damn N+1s
---
# Solution 2a: Use an APM - Scout, Skylight, New Relic
---
# Solution 2b: Use
## `memory_profiler`
# or `oink`
---
![autoplay](pig.mp4)
---
# oink
```
-- SUMMARY --
Worst Requests:
1. Feb 02 16:26:06, 157524 KB, SportsController#show
2. Feb 02 20:11:54, 134972 KB, DashboardsController#show
3. Feb 02 19:06:13, 131912 KB, DashboardsController#show
4. Feb 02 08:07:46, 115448 KB, GroupsController#show
5. Feb 02 12:19:53, 112924 KB, GroupsController#show
6. Feb 02 13:03:00, 112064 KB, ColorSchemesController#show
7. Feb 02 13:01:59, 109148 KB, SessionsController#create
8. Feb 02 06:11:17, 108456 KB, PublicPagesController#join
9. Feb 02 08:43:06, 94468 KB, CommentsController#create
10. Feb 02 20:49:44, 82340 KB, DashboardsController#show
```
---
# memory_profiler
```
allocated memory by gem
-----------------------------------
rubygems x 305879
allocated memory by file
-----------------------------------
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb x 285433
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems/basic_specification.rb x 18597
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems.rb x 2218
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems/specification.rb x 1169
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems/defaults.rb x 520
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems/core_ext/kernel_gem.rb x 80
/home/sam/.rbenv/versions/2.1.0-github/lib/ruby/2.1.0/rubygems/version.rb x 80
```
---
# [fit] Make your own:
# [fit] `ObjectSpace` &
# [fit] `GC.stat`
---
# GC.stat - log it!
```ruby
{
:count => 9,
:heap_allocated_pages => 74,
:heap_sorted_length => 75,
:heap_allocatable_pages => 0,
:heap_available_slots => 30164,
:heap_live_slots => 29863,
:heap_free_slots => 301,
:heap_final_slots => 0,
# etc etc
}
```
---
# ObjectSpace.count_objects
```ruby
{
:TOTAL => 30164,
:FREE => 235,
:T_OBJECT => 297,
:T_CLASS => 944,
:T_MODULE => 45,
:T_FLOAT => 4,
# etc etc
}
```
---
# Solution 2c: If all else fails, move to Rake tasks
## Throwaway VMs are better than bloated VMs
---
![autoplay](trash.mp4)
---
# [fit] Solution 3:
# [fit] Gemfile audit
# [fit] with `derailed`
---
```
$ bundle exec derailed bundle:mem
TOP: 54.1836 MiB
mail: 18.9688 MiB
mime/types: 17.4453 MiB
mail/field: 0.4023 MiB
mail/message: 0.3906 MiB
action_view/view_paths: 0.4453 MiB
action_view/base: 0.4336 MiB
```
---
# [fit] Myth:
# [fit] Dependencies
# [fit] are free!
---
# [fit] require false
# [fit] for assets!
```ruby
gem 'sass', require: false
```
---
# sprockets/lib/sprockets/autoload.rb
```ruby
module Sprockets
  module Autoload
    autoload :Babel, 'sprockets/autoload/babel'
    autoload :Closure, 'sprockets/autoload/closure'
    autoload :CoffeeScript, 'sprockets/autoload/coffee_script'
    autoload :Eco, 'sprockets/autoload/eco'
    # etc etc etc
  end
end
```
---
# [fit] Solution 4:
# [fit] jemalloc
---
> emphasizes fragmentation avoidance and scalable concurrency support.
---
# [fit] LD_PRELOAD
## [fit] **or**
# [fit] --with-jemalloc
---
# [fit] Solution 5:
# [fit] Use copy-on-write
## Puma, Unicorn or Passenger w/preloading
---
# [fit] Copy-on-write
# [fit] increases shared
# [fit] memory
---
![fit](forking.jpeg)
---
# [fit] Myth:
# [fit] Memory usage =
# [fit] sum of RSS
Memory is surprisingly difficult to measure
---
# [fit] memory can be
# virtual/real
# shared/private
# resident/swapped
---
# [fit] It isn't perfect
# [fit] but it's a start
---
# [fit] Solution 6:
# [fit] Use a threaded
# [fit] webserver
Puma, Passenger Enterprise.
Increase concurrency w/lighter methods
---
# [fit] Mini-Myth:
# [fit] My application
# [fit] isn't thread-safe
---
![autoplay loop](dog.mp4)
---
# [fit] minitest/hell
---
# [fit] Solution 7:
# [fit] Keep Ruby
# [fit] and gems up-to-date
---
# [fit] Ruby 2.2+
# [fit] Rails 4.2+
# [fit] Watch out for Ruby 2.4
---
# [fit] Solution 8:
# [fit] Tune malloc
---
# [fit] `MALLOC_ARENA_MAX`
---
# [fit] `mallopt`
---
# [fit] Solution 9:
# [fit] Tune GC
---
# If you can't read gc.c and understand these variables yourself, don't touch them (yet)
---
# GC Tuning can fix:
# Too many free slots
# Slow startup
# Too many or too few GCs
---
# Performance
# Birds of a Feather
# Tomorrow @ 1:15pm
---
# 350+ pages, 18+ hours of video
# [fit] railsspeed.com
---
![fit](blog.png)
---
# 10kb
# [fit] speedshop.co
---
# [fit] Thanks!
## Slides and notes on Twitter
## @nateberkopec
## speedshop.co & railsspeed.com
Tell me your problems!