Tracking a Ruby memory leak in 2021

We are going to see how you can track down a memory leak using the most recent and performant tools. This article's goal is to give an up-to-date and as-simple-as-it-can-be reference on the main steps toward tracking a memory leak. If you want to get the most out of it, I've added (IMHO) very useful links all along the article.

If you want to enjoy the read but do not have a leak, you can create one.
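A minimal leak is easy to write yourself (this is my own illustrative sketch, not code from any library): any globally reachable collection that only ever grows will do.

```ruby
# A deliberately leaky sketch: every "request" appends to a globally
# reachable constant, so the objects can never be garbage collected
# and memory grows for the lifetime of the process.
LEAKY_BUCKET = []

def handle_request(payload)
  result = payload.to_s * 100   # simulate some per-request work
  LEAKY_BUCKET << result        # the leak: retained forever
  result
end

1_000.times { |i| handle_request(i) }
puts LEAKY_BUCKET.size
```

Drop something like this into an endpoint and you will have a perfectly reproducible leak to practice on.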

And if your app is still leaking, or if you found another way around your leak, comment below!

1. Use Jemalloc, or MALLOC_ARENA_MAX=2

DISCLAIMER: if you are not using Jemalloc, please use it. Or at least set MALLOC_ARENA_MAX=2. Come back to this article if you still have ever-growing memory.

Another quick note about Jemalloc: sometimes LD_PRELOAD can be overridden, preventing Jemalloc from being used. We had that exact problem using bin/qgtunnel. You can make sure that Jemalloc is running with MALLOC_CONF=stats_print:true ruby -e '1 + 1' | grep jemalloc.

2. Is your app really leaking?

OK, now let's start talking about real memory leaks. Maybe. If you are reading this article, chances are that you are experiencing OOM issues, or that your app starts swapping at some point and you don't understand why.

The first thing to do in that case should be to ask yourself: what does that memory evolution mean? A question to which I know three answers:

  1. My app is too big for my available RAM
  2. I have a memory bloat
  3. I have a memory leak

Fortunately, these three answers can be distinguished fairly easily. First, increase your available RAM and then take a look at the resulting memory evolution.

NOTE: if you cannot do that in production, or do not want to, you can use derailed exec perf:mem_over_time. See Using derailed benchmarks below.

If it keeps growing slowly, you can be fairly sure you have a leak. If it stops growing at some point, this gives you an indication of how much your app really consumes, and if it is too much, you will have to drop useless gems (and maybe more). If memory spikes suddenly at some point, you have a bloat.

[Figure: memory curves for ok vs leak vs bloat]

3. Let others find the leak for you

gem install bundler-leak
bundle leak update
bundle leak

That's it! If the last command shows something other than No leaks found, you are lucky: here's the leak. Fix it, ship it and let's go! See rubymem/bundler-leak for more information.

Unfortunately, this tool (and maybe others that I don't know about) is not enough for two reasons:

  1. Not all leaks are referenced there (hence if you find one, please tell @memruby).
  2. Sorry to say this, but more often than not, the issue is in our codebase, not in the libs we use.

4. Finding the leak by yourself

There are already some articles telling you how to do that, so I'll focus on finding the leak locally (not on your production machine) and on using existing tools rather than coding them ourselves.

So, once you are sure your issue is a leak, you can start searching for it. This task is harder than searching for a memory bloat: since memory slowly grows over time, you cannot just benchmark before and after a method to see the evolution; you wouldn't know whether memory was retained because no GC occurred yet, or because it was leaking.

Fortunately, there exists one really great Ruby tool: ObjectSpace.dump_all (and a bit more), which lets us analyze our application's memory at a given time. We can take advantage of that to take a few snapshots, and then compare them.
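As a quick sketch of what derailed does for you under the hood (the file path and the allocated array are my own examples), taking one snapshot by hand looks like this:

```ruby
require "objspace"

# Record allocation file/line for every new object, so the dump
# contains useful provenance information.
ObjectSpace.trace_object_allocations_start

suspects = Array.new(1_000) { |i| "object-#{i}" }  # something to see in the dump

GC.start  # collect garbage first, so the dump mostly shows live objects
File.open("/tmp/heap-demo.ndjson", "w") do |io|
  ObjectSpace.dump_all(output: io)  # one JSON object per line (ndjson)
end

puts suspects.size  # keep the array referenced until after the dump
```

Each line of the resulting file is a JSON document describing one heap object, including its address and, thanks to allocation tracing, the file and line that created it.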

Using derailed benchmarks

Now is a good time to talk about derailed_benchmarks. If you don't know about it yet, it is a tool for benchmarking Rails (and Rack) applications' performance, for either time or memory issues. You may dig through the README for more information; I'll stick with a simple and essential usage here:

  1. gem install derailed_benchmarks
  2. RAILS_ENV=production RACK_ENV=production derailed exec <cmd>

Analyzing memory heaps

The idea here is to generate three consecutive dumps (A, B, and C for instance) separated by a few requests to your application, and then compare them to detect what is retained and what is not. I will not detail the content of those dumps here, just how to exploit them.

So what we are going to do is check which objects are in B and not in A: objects that have been created during requests received between both memory snapshots.

Then we'll keep only objects that are in both B and C: objects that were present at B and retained afterward.

For instance, if you have three dumps with the next objects in memory:

  A      B      C
 0x01   0x01   0x01
        0x02
        0x03   0x03

We can say that our object at 0x01 is a long-term object, due to our usual memory growth at boot. 0x02 is ALLOCATED at B, hence if it is big it could be the cause of a bloat; however, it cannot be the cause of our leak since it has been garbage collected by C. Finally, 0x03 is RETAINED: since it appears after some time (B) and stays there (C), it is very likely the reason for our leak.
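The bookkeeping above boils down to plain set operations on object addresses: the leak suspects are (B ∩ C) − A. A toy sketch using the addresses from the table (real dumps would give you one JSON line per object, with an address field):

```ruby
require "set"

# Addresses seen in each of the three dumps, mirroring the table above.
a = Set["0x01"]
b = Set["0x01", "0x02", "0x03"]
c = Set["0x01", "0x03"]

retained  = (b & c) - a  # in B, still in C, absent at A: leak suspects
allocated = (b - a) - c  # born at B but gone by C: bloat suspects at most

p retained.to_a   # => ["0x03"]
p allocated.to_a  # => ["0x02"]
```

This is exactly the distinction that matters: allocated-only objects can explain a bloat, but only retained objects can explain a leak.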

To do so, you just need to run derailed exec perf:heap_diff. This will save three dumps for you, and run a three-way diff against those. You can then just check which retained lines concern your application.

$ TEST_COUNT=2000 RACK_ENV=production PATH_TO_HIT=/v1/status derailed exec perf:heap_diff
$ heapy diff tmp/2021-05-06T11:15:43+02:00-heap-0.ndjson tmp/2021-05-06T11:15:52+02:00-heap-1.ndjson tmp/2021-05-06T11:16:04+02:00-heap-2.ndjson \
  | grep klaxit-via
Retained DATA 1984 objects of size 158720/13482787 (in bytes) at: /Users/ulysse/Dev/klaxit/klaxit-via/app.rb:56
Retained IMEMO 1982 objects of size 142704/13482787 (in bytes) at: /Users/ulysse/Dev/klaxit/klaxit-via/app.rb:56

If you get a few lines with a huge size, bingo! It is now a matter of understanding why the code is leaking, and fixing it. Otherwise, consider testing another endpoint by changing PATH_TO_HIT. I suggest always starting with a simple endpoint, as it will tell you whether the leak lies in your whole stack or only in a single endpoint.

5. Wrapping up

It is now time to commit, run derailed exec perf:mem_over_time on the formerly leaking endpoint, and see that nice plateau after a few requests, indicating that you've solved the leak (congrats!).


@Quiwin commented Jun 29, 2021

First pass, I'll reread again a bit later.

The first thing to do is that case shall be to ask

=> The first thing to do in that case shall be to ask (should or shall, by the way?)

There are already three articles telling you how to do that

=> You should point to your references here, because it isn't very clear.

QUESTION: Not sure it should be there, or where it should be)

=> It seems fine here to me.

So what we are going to do is checking which objects are in B, and not in A. Meaning, objects that have been created during requests between both.
Then we'll keep only objects that are both in B and C. Meaning, objects that where present at B and retained afterward.

=> Reading this, I imagine you mean (B ∩ C) − (A ∩ B); there should be a way to clarify the − (A ∩ B). (Maybe with a small set diagram?)

@BuonOmo (Owner, Author) commented Jun 29, 2021

Actually it's even B & C - A, the rest is kept!

@Quiwin commented Jun 30, 2021

If it grows slowly, ever and ever. It has great chances to be a leak. If it stops growing at some point, you know how much your app really consumes, and if it is too much, you will have to drop useless gems (and maybe more). If at some point memory is suddenly spiking, you have a bloat.

=> Not sure about this, but I would change "If it grows slowly, ever and ever. It has great chances to be a leak." to something like "If it grows slowly, you can be fairly sure you have a leak" to keep the same structure between the points of the paragraph.

which lets us analyze

=> which let us analyze

To do that in your code, you just need to derailed exec perf:heap_diff. This will save three dumps for you, and run a three-way diff against those. You can then just check which retained line concern your application.
To do so, you just need to derailed exec perf:heap_diff. This will save three dumps for you, and run a three-way diff against those. You can then just check which retained line concern your application.

=> Copy-paste mistake ^^, I prefer the second version.

Otherwise it looks good to me; it needs other pairs of eyes!

@BuonOmo (Owner, Author) commented Jun 30, 2021

[image]

Not sure about the "let".

Otherwise:

If it keep growing slowly, you can be fairly sure you have a leak

@Quiwin commented Jun 30, 2021

Ah indeed, it is "lets".

@teckwan commented Jul 6, 2021

or that your app start swapping at some point and you don't understand why.

your app starts swapping

what is that memory evolution?

what does that memory evolution mean?

then take a look at the resulting memory evolution.

If it keep growing slowly, you can be fairly sure you have a leak. If it stops growing at some point, you know how much your app really consumes, and if it is too much, you will have to drop useless gems (and maybe more). If at some point memory is suddenly spiking, you have a bloat.

If it keeps growing slowly, you can be fairly sure to have a leak. If it stops growing at some point, this gives you an indication of how much your app really consumes....

If memory spikes suddenly at some point, you have a bloat.

something else than No leaks found, you are lucky here's the leak

something other than No leaks found, you are lucky, here's the leak.

Sorry to say that

Sorry to say this

there come one really great ruby tool:

there exists one really great ruby tool

The idea here is to generate three consecutive dumps (A, B, and C for instance) separated by a few requests to your application.

.... to your application, and then compare them to detect what is retained and what is not.

Meaning, objects that have been created during requests between both.

... during requests received between both memory snapshots.

Then we'll keep only objects that are both in B and C. Meaning, objects that were present at B and retained afterward.

I suggest you always start with a simple endpoint since it will tell you if the leak lies in your whole stack or only a single endpoint.

I suggest to always start .... endpoint as it will tell you ..... whole stack or only in a ....

Content fairly easy to understand, and straightforward. The suggestions above are based on my gut feeling, to be taken with a grain of salt ;)

@BuonOmo (Owner, Author) commented Jul 6, 2021

@teckwan thanks, done 🙂

@AntoineGirard commented Jul 7, 2021

It seems rather clear and concise to me!

I have nothing to add!

I think the comments from readers who put it into practice will be interesting :)
