BuonOmo/Tracking a ruby memory leak in 2021.md Secret

## Tracking a ruby memory leak in 2021.md

      
    Tracking a Ruby memory leak in 2021

This article walks through tracking down a memory leak with recent and efficient tools. Its goal
is to be an up-to-date and as-simple-as-it-can-be reference on the main steps towards
finding a leak. If you want to get the most out of it, I've added (IMHO) very useful links
all along the article.
If you want to follow along but do not have a leak, you can create one.
And if your app is still leaking, or if you found another way around your leak, comment below!

1. Use Jemalloc, or MALLOC_ARENA_MAX=2

DISCLAIMER: If you are not using Jemalloc, please use it.
Or at least set MALLOC_ARENA_MAX=2. Come back to this article if your memory is still
ever-growing.
Another quick note about Jemalloc: sometimes LD_PRELOAD can be overridden, preventing
Jemalloc from being used. We had that exact problem using bin/qgtunnel.
You can make sure that Jemalloc is running using MALLOC_CONF=stats_print:true ruby -e '1 + 1' 2>&1 | grep jemalloc (Jemalloc prints its statistics on stderr, hence the 2>&1).

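If you would rather check from inside Ruby, here is a minimal sketch. It assumes a Ruby recent enough to list its link libraries in RbConfig::CONFIG["MAINLIBS"], and the second check only works on Linux since it reads /proc. Both method names are mine, not part of any library.

```ruby
require "rbconfig"

# True when the Ruby binary itself was built against jemalloc
# (for instance with ./configure --with-jemalloc).
def jemalloc_linked?(mainlibs = RbConfig::CONFIG["MAINLIBS"].to_s)
  mainlibs.include?("jemalloc")
end

# True when a jemalloc shared object is mapped into the process, which is
# what a working LD_PRELOAD looks like. Linux only: relies on /proc/<pid>/maps.
def jemalloc_mapped?(maps_path = "/proc/self/maps")
  return false unless File.readable?(maps_path)

  File.foreach(maps_path).any? { |line| line.include?("jemalloc") }
end
```

Calling jemalloc_mapped? from a console started the same way as your app (same wrapper scripts, same environment) should tell you whether LD_PRELOAD survived.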
2. Is your app really leaking?

Ok, now let's start talking about real memory leaks. Maybe. If you are reading this article,
chances are that you are experiencing OOM issues, or that your app starts swapping at some
point and you don't understand why.
The first thing to do in that case is to ask yourself: what does that memory evolution mean? I know three possible answers to that question:

My app is too big for my available RAM
I have a memory bloat
I have a memory leak

Fortunately, those 3 answers can be distinguished fairly easily. First, increase your available RAM and then take a look at how memory evolves.
NOTE: if you cannot or do not want to do that in production, you can use derailed exec perf:mem_over_time. See Using derailed benchmarks below.
If memory keeps growing slowly, you can be fairly sure you have a leak. If it ends up plateauing, this gives you an indication of how much your app really consumes, and if it is too much, you will have to drop useless gems
(and maybe more). If memory spikes suddenly at some point, you have a bloat.
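To make the three cases concrete, here is a rough sketch that classifies a series of memory samples. The method name and both thresholds are arbitrary assumptions of mine, and it expects samples taken at regular intervals after the app has booted. Treat it as an illustration of the reasoning, not a diagnostic tool.

```ruby
# Classify RSS samples (in MB) as :plateau, :leak or :bloat.
# spike_ratio and growth_ratio are hypothetical thresholds, tune them.
def classify_memory(samples, spike_ratio: 1.5, growth_ratio: 1.05)
  return :unknown if samples.size < 4

  # A bloat shows up as a sudden jump between two consecutive samples.
  return :bloat if samples.each_cons(2).any? { |a, b| b > a * spike_ratio }

  # Otherwise compare the average of the first half with the last half:
  # a steady climb points to a leak, a flat line to a plateau.
  half = samples.size / 2
  first_avg = samples.first(half).sum / half.to_f
  last_avg = samples.last(half).sum / half.to_f
  last_avg > first_avg * growth_ratio ? :leak : :plateau
end
```

For instance, classify_memory([300, 310, 320, 330, 340, 350]) returns :leak, while a series that doubles between two samples returns :bloat.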

3. Let others find the leak for you

gem install bundler-leak
bundle leak update
bundle leak
That's it. If the last command shows something other than No leaks found, you
are lucky: here's the leak. Fix it, ship it and let's go! See rubymem/bundler-leak
for more information.
Unfortunately, this tool (and maybe others that I don't know about) is not enough
for two reasons:

Not all leaks are referenced here (hence if you find one, please advise @memruby).
Sorry to say this, but more often than not, the issue is in our codebase, not the lib we use.

4. Finding the leak by yourself

There are already some articles telling you how to do that. Hence I'll focus on finding the leak
locally (not on your production machine) and using existing tools rather than coding them ourselves.
So, once you are sure your issue is a leak, you can start searching for it. This task
is harder than searching for a memory bloat: memory grows slowly over time, so you cannot
just measure before and after a method call to see the difference. You wouldn't know whether an object
was still there because no GC had occurred yet, or because it was leaking.
Fortunately, there is one really great Ruby tool: ObjectSpace.dump_all (and a bit more),
which lets us analyze our application's memory at a given time. We can take advantage of that to take
a few snapshots, and then compare them.
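As a minimal sketch of what taking one snapshot looks like by hand (dump_heap is a helper name I made up, and the path is up to you):

```ruby
require "objspace"
require "json"

# Record the file and line of every allocation from now on,
# so that dumped objects carry their allocation site.
ObjectSpace.trace_object_allocations_start

# Write the objects currently on the heap to `path`, one JSON document per line.
def dump_heap(path)
  GC.start # run a full GC first so the dump holds mostly live objects
  File.open(path, "w") { |file| ObjectSpace.dump_all(output: file) }
  path
end
```

Each line of the resulting file describes one object (its type, and when tracing is on, where it was allocated). Objects allocated before trace_object_allocations_start have no file or line, so enable tracing as early as possible.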

Using derailed benchmarks

Now is a good time to talk about derailed_benchmarks. If you don't
know about it yet, it is a tool for benchmarking the performance of Rails (and Rack) applications,
for either time or memory issues. You may dig through the README for more information; I'll
stick with a simple and essential way here:

gem install derailed_benchmarks
RAILS_ENV=production RACK_ENV=production derailed exec <cmd>

Analyzing memory heaps

The idea here is to generate three consecutive dumps (A, B, and C for instance) separated by a few
requests to your application, and then compare them to detect what is retained and what is not. I
will not detail the content of those dumps here, just how to exploit it (here for content).
First, we check which objects are in B and not in A, meaning objects that
were created during the requests received between those two snapshots.
Then we keep only the objects that are in both B and C, meaning objects that were present at B
and retained afterward.
For instance, say you have three dumps with the following objects in memory:


| A    | B    | C    |
|------|------|------|
| 0x01 | 0x01 | 0x01 |
|      | 0x02 |      |
|      | 0x03 | 0x03 |

We can say that our object at 0x01 is a long-term object, part of the usual memory growth at boot.
0x02 is ALLOCATED at B, hence if it is big it could be the cause of a bloat. However, it cannot be the cause
of our leak since it has been garbage collected by C. Finally, 0x03 is RETAINED since it appears after
some time (B) and stays there (C): it is a likely cause of our leak.
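The same reasoning can be sketched with plain sets of addresses. three_way_diff is a name of my own, and this only mirrors the idea, not how any tool implements it. Keep in mind that Ruby can reuse an address after an object is collected, so a matching address is a strong hint rather than proof.

```ruby
require "set"

# a, b and c are the object addresses found in three consecutive dumps.
def three_way_diff(a, b, c)
  a, b, c = [a, b, c].map(&:to_set)
  created = b - a # objects that appeared between A and B
  {
    allocated: (created - c).to_a, # gone by C: garbage collected
    retained: (created & c).to_a   # still there at C: leak candidates
  }
end

diff = three_way_diff(%w[0x01], %w[0x01 0x02 0x03], %w[0x01 0x03])
puts "allocated: #{diff[:allocated].join(', ')}"
puts "retained: #{diff[:retained].join(', ')}"
```

With the dumps above, allocated holds only 0x02 and retained only 0x03, while 0x01 appears in neither since it already existed at A.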
To do that on your app, you just need to run derailed exec perf:heap_diff. This will save three dumps for you, and
run a three-way diff against them. You can then just check which retained lines concern your
application.
$ TEST_COUNT=2000 RACK_ENV=production PATH_TO_HIT=/v1/status derailed exec perf:heap_diff
$ heapy diff tmp/2021-05-06T11:15:43+02:00-heap-0.ndjson tmp/2021-05-06T11:15:52+02:00-heap-1.ndjson tmp/2021-05-06T11:16:04+02:00-heap-2.ndjson \
  | grep klaxit-via
Retained DATA 1984 objects of size 158720/13482787 (in bytes) at: /Users/ulysse/Dev/klaxit/klaxit-via/app.rb:56
Retained IMEMO 1982 objects of size 142704/13482787 (in bytes) at: /Users/ulysse/Dev/klaxit/klaxit-via/app.rb:56
If you get a few lines with a huge size, bingo! It is now a matter of understanding why the code is
leaking, and fixing that. Otherwise, consider testing another endpoint by changing PATH_TO_HIT. I
suggest always starting with a simple endpoint, as it will tell you whether the leak lies in your whole
stack or only in a single endpoint.

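When the diff prints many Retained lines, summing them per location makes the huge ones stand out. Here is a small sketch that assumes the output looks exactly like the two lines shown above. That format may differ between heapy versions, and both method names are mine.

```ruby
# Matches lines such as:
# Retained DATA 1984 objects of size 158720/13482787 (in bytes) at: app.rb:56
RETAINED_LINE = %r{\ARetained (?<type>\S+) (?<count>\d+) objects of size (?<size>\d+)/(?<total>\d+) \(in bytes\) at: (?<location>.+)\z}

# Turn the diff output into hashes, skipping lines that do not match.
def parse_retained(output)
  output.each_line.filter_map do |line|
    match = RETAINED_LINE.match(line.strip)
    next unless match

    { type: match[:type], count: match[:count].to_i,
      size: match[:size].to_i, location: match[:location] }
  end
end

# Sum retained bytes per file:line, biggest first.
def retained_bytes_by_location(output)
  parse_retained(output)
    .group_by { |entry| entry[:location] }
    .map { |location, entries| [location, entries.sum { |e| e[:size] }] }
    .sort_by { |_location, bytes| -bytes }
end
```

You could feed it the diff output with something like retained_bytes_by_location($stdin.read) at the end of the pipe.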
5. Wrapping up

It is now time to commit, run derailed exec perf:mem_over_time on the formerly leaking endpoint, and
see that nice plateau after a few requests, indicating that you've solved the leak (congrats!).

References


[SLIDES] Why ruby 2.1 excites me? (slide 27-35)
[ARTICLE] Hunting for leaks in ruby
[ARTICLE] Ruby 2.1: objspace.so

More resources


[VIDEO] Visualising application memory
[ARTICLE] Pragmatic steps to find a leak from gems


## Z-article-oom-graph-leak-vs-bloat.png

      