
A case study in debugging a memory leak

Our CS team have been getting reports from customers that Geckoboard was crashing the browser window on their TVs after displaying a dashboard loop for a long period of time. I spent a week looking at the issue and found we were leaking memory in at least four places. I figured that sharing my process might help in the future when we inevitably find more leaks.

Finding a pattern

I knew this was an ongoing issue and that we had investigated memory leaks in dashboard loops before. The first thing I did was to talk to some of our engineers who had worked on the problem previously. I spoke to Mike H and Klara who sent me some Clubhouse cards with detailed user reports.

From what I could tell the leak was happening for:

  • Dashboard loops only, not single dashboards
  • Using both sharing URLs and our Send to TV feature
  • Full Fat Geckoboard, not Geckoboard Lite
  • Chromebits, Chrome browsers on laptops and several other devices
  • All widgets, no specific types of widgets that we could identify

Recreating the conditions

I wanted to see if I could reproduce the leak on my computer. That meant re-creating the conditions our customers had. From the customer reports I found the simplest loop that exhibited a memory leak. It was easy to see which widgets each dashboard had, but to check the kind of layout each one used (fixieboard or classic) I had to open the Elements panel in Chrome DevTools: classic dashboards wrap the dashboard in a div with id #dashboard-wrapper, fixieboards do not (a quick console check for this is shown after the list below). The loop contained:

1 classic dashboard with: 3 column charts, 3 leaderboards

1 fixieboard with: 5 number widgets with comparison sparklines, 1 leaderboard, 2 static text widgets, 1 paginated text widget with 3 panels
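As an aside, here's a quick way to run that layout check from the DevTools console. It assumes the #dashboard-wrapper selector described above is a reliable signal:

// Run in the DevTools console on a shared dashboard.
// Assumption: only classic dashboards render #dashboard-wrapper.
document.querySelector('#dashboard-wrapper') ? 'classic' : 'fixieboard';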

I created 2 dashboards that looked exactly like the customer's dashboards and put them into a loop. I suspected that the leak happened when switching between dashboards in a loop so in my database I manually set the loop time to 5 seconds. That would give each dashboard just enough time to render the widgets before it was switched out. That way I could reproduce the leak without having to wait 30 seconds or more each time.

Performance testing

In a new Chrome window I went to the sharing URL of the dashboard loop I'd just created and opened Chrome DevTools. The Performance tab allows you to record certain metrics over time, including the size of the JavaScript heap, the number of DOM nodes and the number of active event listeners.

With the loop running I recorded a 60 second profile.

[Image: 60 second performance profile showing the JS heap, DOM node count and event listener count steadily climbing]

What you're seeing is the JS heap (blue line), the number of DOM nodes in memory (green line) and the number of active event listeners (orange line). What we would like to see is a flat horizontal line with a trough every 5 seconds as the DOM nodes from one dashboard are unmounted and new ones are mounted for the next. But the profile suggests that when the dashboards switch, the new DOM nodes are added without the old DOM nodes being removed, leading to a compounding total. This is a classic indicator of a memory leak.
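As a rough cross-check, you can also sample memory programmatically. This is just a sketch using Chrome's non-standard performance.memory API (values are coarse unless Chrome is launched with --enable-precise-memory-info), and note it counts mounted nodes, not detached ones:

const samples = [];

const timer = setInterval(() => {
  samples.push({
    time: Date.now(),
    usedJSHeapSize: performance.memory.usedJSHeapSize, // Chrome-only, non-standard API
    mountedNodes: document.getElementsByTagName('*').length, // nodes in the document, not detached ones
  });
}, 1000);

// After a few loop cycles, stop sampling and look for an upward trend:
// clearInterval(timer); console.table(samples);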

Memory snapshots

It's important to note that the green line represents the number of DOM nodes still in memory, not the number of DOM nodes mounted on the document. If a DOM node isn't in the document you won't be able to see it in the DevTools Elements panel, so how can you find it? That's where memory snapshots can help.

Performance profiles help you see performance trends over time at a high level, but memory snapshots give you a high-fidelity view of what's in memory at a single point in time.
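To make "detached" concrete, here's a minimal sketch: a node removed from the document can't be garbage collected while any JavaScript reference to it survives, and it shows up in heap snapshots as a detached DOM node.

// A node is created and mounted...
const widget = document.createElement('div');
document.body.appendChild(widget);

// ...something keeps a reference to it (a cache, a closure, a view property)...
const retained = widget;

// ...and later it's removed from the document.
widget.remove();

// The node is gone from the Elements panel, but `retained` still points at it,
// so it lingers in memory as a "detached" DOM node.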

With the loop still running I took a snapshot every time the loop completed one full cycle.

[Image: a series of heap snapshots in Chrome DevTools, each over 1MB larger than the previous]

The first thing to notice in the left sidebar is that each snapshot is over 1MB larger than the previous. That's another good indicator of a memory leak. In the main window you can see all of the JS objects currently in memory (grouped by type) with a count of the objects in each group. It even groups "detached" DOM nodes separately. When I searched for "detached" in all of the snapshots I noticed that the count increased at a steady rate between snapshots. This was confirmation that, when the loop moved from one dashboard to another, at least some of the DOM nodes from that dashboard were retained in memory.

The next task was to find what these DOM nodes were and why they weren't being garbage collected.

Inspecting objects in memory

When you select an object you can see more detail about it in the "Object" panel, including a few clues to the definition and source of the object. Scrolling through individual detached DOM nodes, the first thing I noticed was that a large number of them contained references to interact.js.

[Image: a detached DOM node whose retaining references point to interact.js]

Interact.js is the third-party library we use for drag and drop on classic dashboards. A quick Google of "interact.js memory leak" led me to a GitHub issue on the interact.js repo, dated Nov 2015.

[Image: GitHub issue on the interact.js repo describing a memory leak]

That felt pretty relevant: we use interact.js inside an iframe. I checked the interact.js version we were using and it predated this issue (which has since been fixed). I removed the Backbone view that uses interact.js, reloaded the loop and took another 60 second performance profile.

[Image: performance profile with interact.js removed, no longer steadily climbing]

As you can see, the loop is no longer showing the same steadily increasing memory profile. It appears that, by removing interact.js, we fixed a memory leak!
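Deleting the view was only a diagnostic step. A longer-term fix would be to upgrade interact.js and make sure every view tears down its interactables when it's removed. A sketch of what that might look like (the view and options here are illustrative, not our actual code; unset() is interact.js's documented teardown method):

var DraggableWidgetView = Backbone.View.extend({
  render: function () {
    // Bind drag behaviour to this view's element.
    this.interactable = interact(this.el).draggable({});
    return this;
  },

  remove: function () {
    // Detach interact.js's listeners so the element can be garbage collected.
    if (this.interactable) {
      this.interactable.unset();
    }
    return Backbone.View.prototype.remove.call(this);
  },
});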

Unfortunately, when running the profile for a longer period of time the memory still trended upwards. It was at a much shallower rate than before, but we were still leaking memory.

Chipping away

I kept scrolling through all the detached DOM nodes in the memory snapshots. A lot of them didn't contain very useful hints about their origin, but occasionally I'd hit one I recognised. For example, something in visualisation-wrapper-component.js was being retained in memory.

[Image: a detached DOM node originating from visualisation-wrapper-component.js]

And a dashboardFooterElement.

[Image: a detached dashboardFooterElement in the heap snapshot]

Each time I found a detached DOM node that I recognised, I tugged at the thread, trying to find where in the code that element gets created. As it turns out, both of these elements are instantiated in Backbone views.

The manual lifecycle of a Backbone view

One of the great things we take for granted in React is that when a component is unmounted, all of its children are automatically unmounted as well. That's because we instantiate components as a tree structure in JSX.

<Dashboard>
  <Widget {...widget1} />
  <Widget {...widget2} />
  <Widget {...widget3} />
</Dashboard>

But in Backbone we tend to instantiate all of a view's children and then manually build up the structure.

this.widget1View = new WidgetView(widget1);
this.widget1View.render();

this.el.append(this.widget1View.el);

This approach means it's down to us to manually unmount child views when the parent is unmounted. Unfortunately, this is quite easy to miss. For example, in the Backbone dashboard view we were compiling a list of widget views and never calling view.remove() on them when the dashboard view was unmounted.
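The shape of the fix is for the parent view to keep track of its children and tear them down in its own remove(). A sketch, with illustrative view names:

var DashboardView = Backbone.View.extend({
  initialize: function () {
    this.widgetViews = [];
  },

  render: function () {
    this.collection.each(function (widget) {
      var view = new WidgetView({ model: widget });
      this.widgetViews.push(view);
      this.$el.append(view.render().el);
    }, this);
    return this;
  },

  remove: function () {
    // Tear down every child before removing ourselves, otherwise the
    // child elements and their event listeners stay in memory.
    this.widgetViews.forEach(function (view) {
      view.remove();
    });
    this.widgetViews = [];
    return Backbone.View.prototype.remove.call(this);
  },
});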

Wrapping up

After fixing the four most obvious leaks I took a final performance profile.

[Image: final 60 second performance profile showing memory rising and then being garbage collected in cycles]

The results show a much more promising cycle of memory increase and garbage collection. However, on a long enough timeline the memory still trends upwards. My investigation showed that there wasn't a single leak; the memory increase we are seeing is probably the result of lots of small leaks. The most common causes are likely to be third-party libraries and bad view maintenance in Backbone.

The memory leaks I found were visible on this one loop that I reproduced. We now have data from our endurance tests showing that different loops have different memory profiles, which suggests there are a lot more leaks to find. Going forward, we're unlikely to find a silver bullet that stops all memory leaks, so I suggest we view this as an ongoing maintenance project: every cycle we should dedicate a small amount of time to memory leak investigation.

Thanks for reading.
