@chriseckhardt
Last active September 12, 2016 04:46

Netflix JavaScript Talks - Debugging Node.js in Production

Notes from https://www.youtube.com/watch?v=O1YP8QP9gLA

"Let's work the problem, people. Let's not make things any worse by guessing."

Apply the Scientific Method

  1. Construct a hypothesis of what's happening
  2. Collect data
  3. Analyze data and draw a conclusion
  4. Repeat

Three Types of Problems

  • runtime perf
  • runtime crashes
  • memory leaks

Runtime Performance

http://restify.com

restify records a timer for each middleware handler, which tells Netflix how long each individual handler took.

req.timers: {
  "parseBody": 700123, # on cpu
  "apiRPC": 301911,
  "render": 400031     # also on cpu
}
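As a rough sketch of how such timer data could be used, the snippet below ranks the handlers by elapsed time. The helper name `rankTimers` is hypothetical, the object shape is copied from the notes above, and the unit of the values is not stated in the notes, so only the relative shares are meaningful here.

```javascript
// Hypothetical helper: rank handlers by elapsed time, largest first.
// "share" is the fraction of the summed handler time spent in that handler.
function rankTimers(timers) {
  const total = Object.values(timers).reduce((sum, t) => sum + t, 0);
  return Object.entries(timers)
    .map(([name, elapsed]) => ({ name, elapsed, share: elapsed / total }))
    .sort((a, b) => b.elapsed - a.elapsed);
}

// Values copied from the notes; the unit is an open assumption.
const timers = { parseBody: 700123, apiRPC: 301911, render: 400031 };
for (const { name, elapsed, share } of rankTimers(timers)) {
  console.log(`${name}: ${elapsed} (${(share * 100).toFixed(1)}%)`);
}
```

Handlers at the top of this ranking are the first candidates for the CPU hypothesis discussed next.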

Node is essentially single-threaded. CPU is crucial.

Even one bad actor on the CPU can have a cascading effect on every other request.

"We think this is spending time on the CPU. If we think it's code, let's analyse the code."

Lots of code...

Statistically Sample Stack Traces

Snapshot what's currently executing.

A Stack Trace is a report of the active stack frames at a certain point in time during the execution of a program.

We want to sample stack traces from our production process.

Two problems with sampling in production:

  1. How do you sample stack traces from a running process?
  2. How do you do that without affecting the performance of the process?

  • Linux Perf Events (perf(1))

perf record -F 99 -p $(pgrep -n node) -g -- sleep 30

This gathers the data, but the JavaScript stack frames are missing.

V8 compiles JavaScript just in time (JIT), so perf has no symbols for the generated code.

To solve this problem, V8 has a flag, node --perf_basic_prof_only_functions, that outputs a map file which translates native memory addresses to JavaScript function names, files, and line numbers as code is compiled over time. Completely safe, negligible impact on production performance, available in Node.js 4+.

3c793e446880 22c LazyCompile:-baseCallback /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:1654
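To make the map file format concrete, here is a minimal sketch that parses a line like the one above, assuming the usual perf map layout of a start address in hex, a size in hex, and the symbol name. The function names `parsePerfMapLine` and `resolveAddress` are made up for illustration, and addresses are assumed to fit in a JavaScript number (below 2^53).

```javascript
// Parse one perf map line: "<start-hex> <size-hex> <symbol name>".
// Returns null when the line does not match that layout.
function parsePerfMapLine(line) {
  const match = /^([0-9a-f]+) ([0-9a-f]+) (.*)$/i.exec(line.trim());
  if (!match) return null;
  return {
    start: parseInt(match[1], 16),
    size: parseInt(match[2], 16),
    symbol: match[3],
  };
}

// Find the entry whose [start, start + size) range contains the address.
function resolveAddress(entries, address) {
  return entries.find((e) => address >= e.start && address < e.start + e.size) || null;
}

const entry = parsePerfMapLine(
  '3c793e446880 22c LazyCompile:-baseCallback /apps/node/webapp/node_modules/restify-errors/node_modules/lodash/index.js:1654'
);
console.log(entry.symbol);
console.log(resolveAddress([entry], entry.start + 0x10) === entry);
```

This is the lookup that lets a sampled native address be reported as a JavaScript function and source location.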

Now we want to analyze them. But too many traces!

We want to visualize how each of those stack traces shows up in the sample.

Flame Graphs

Lets you find one line of code out of 6 million.

We want to first tackle stack frames that spend a lot of time on CPU (widest).
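The idea behind "widest" can be sketched with a toy model: fold identical stacks into counts, then count in how many samples each frame appears. This is a simplified illustration, not how the flame graph tooling itself is implemented, and the frame names below are invented.

```javascript
// Each sample is one stack, listed root-first, e.g. ['main', 'handle', 'render'].
// Fold identical stacks into "a;b;c" -> number of samples.
function foldStacks(samples) {
  const folded = new Map();
  for (const stack of samples) {
    const key = stack.join(';');
    folded.set(key, (folded.get(key) || 0) + 1);
  }
  return folded;
}

// Inclusive sample count per frame name: roughly what box width stands for.
// A Set is used so a recursive frame is counted once per sample.
function frameWidths(samples) {
  const widths = new Map();
  for (const stack of samples) {
    for (const frame of new Set(stack)) {
      widths.set(frame, (widths.get(frame) || 0) + 1);
    }
  }
  return widths;
}

// Invented samples for illustration only.
const samples = [
  ['main', 'handle', 'parseBody'],
  ['main', 'handle', 'parseBody'],
  ['main', 'handle', 'render'],
  ['main', 'gc'],
];
console.log(foldStacks(samples));
console.log(frameWidths(samples));
```

With a fixed sampling rate, a frame that appears in more samples was on the CPU for proportionally more time, which is why the widest frames are tackled first.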

We want to start in userland javascript (code we've written).

It's generally something we've done ourselves, rather than the framework or runtime.

  1. sample via perf
  2. visualize with flame graphs
  3. identify candidate code paths for performance improvement
  4. Repeat

Runtime Crashes

What is a Core Dump?

Magnetic core memory used tiny magnetic toruses to store 1 Kb (1024 bits) of data. Back in the day, when engineers wanted to debug their process, they would dump the core memory to paper.

Production Constraints

  • uptime critical
  • not easily reproducible
  • can't simulate prod environment
  • resume normal ops ASAP
                 restart app -> continue serving traffic
take core dump < 
                 load core dump elsewhere -> debug -> engineer fix

Node can be configured to abort and dump core to disk on an uncaught exception:

node --abort_on_uncaught_exception throw.js
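The notes name throw.js but do not show its contents. A hypothetical stand-in could look like the sketch below; the function `loadUser` and its validation rule are assumptions made purely for illustration.

```javascript
// Hypothetical throw.js: the original file contents are not in the notes.
// A named function that throws on bad input.
function loadUser(record) {
  if (!record || typeof record.id !== 'number') {
    throw new Error('loadUser: record.id must be a number');
  }
  return { id: record.id };
}

// Valid input works as usual.
console.log(loadUser({ id: 42 }).id);

// Calling loadUser with bad input outside any try/catch leaves the exception
// uncaught. Under --abort_on_uncaught_exception the process then aborts rather
// than exiting with an error message.
// loadUser({});  // uncomment to trigger the abort
```

Whether a core file is actually written on abort also depends on operating system settings such as the core file size limit (`ulimit -c` on Linux).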

Node Post Mortem Tooling

Linux

Solaris

Set up a debug Solaris instance

Where: Inspect the stacktrace

Why: inspect the heap and stack variable state

mdb(1) JS commands

::help <cmd>
::jsstack
::jsprint
::jssource
::jsconstructor
::findjsobjects
::jsfunctions

mdb ./node-v4.2.2-linux/node-v4.2.2.2-linux.x64/bin/node ./core.7186
::jsstack -v

ALWAYS NAME YOUR JavaScript FUNCTIONS!!
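A small sketch of why names matter, using ordinary `Error` stacks rather than mdb output: the named callback shows up by name in the trace, while the anonymous one does not. `captureStack` and `parseItem` are names invented for this example.

```javascript
// Run a callback that throws and return the resulting stack trace text.
function captureStack(callback) {
  try {
    [1].forEach(callback);
  } catch (err) {
    return err.stack;
  }
  return '';
}

// Anonymous callback: the frame carries no function name.
const anonymousStack = captureStack(function () {
  throw new Error('boom');
});

// Named callback: the frame is reported as "at parseItem (...)".
const namedStack = captureStack(function parseItem() {
  throw new Error('boom');
});

console.log(anonymousStack.includes('at parseItem'));
console.log(namedStack.includes('at parseItem'));
```

The same reasoning applies to post-mortem tools: a frame or closure with a name is much easier to locate in a dump than an anonymous one.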

Core dumps give you the complete process state, including heap objects.

Memory Leaks

Purify: Fast Detection of Memory Leaks and Access Errors by Reed Hastings and Bob Joyce

gcore -- completely prod-safe way to generate a core file for a running process.

::findjsobjects

part of the mdb tools.

If objects are inconspicuous, take successive core dumps and compare their objects.

How to look at deltas between successive core dumps.
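One way to picture the comparison is the sketch below, which assumes the object counts from two dumps have already been extracted into plain objects keyed by constructor name. The helper `growingCounts` and the sample numbers are hypothetical, not output from a real dump.

```javascript
// before/after: plain objects mapping a constructor name to an object count.
// Returns only the entries that grew, largest growth first.
function growingCounts(before, after) {
  return Object.keys(after)
    .map((name) => ({ name, delta: after[name] - (before[name] || 0) }))
    .filter((row) => row.delta > 0)
    .sort((a, b) => b.delta - a.delta);
}

// Invented counts for two successive dumps.
const firstDump = { Module: 120, Buffer: 40, Socket: 8 };
const secondDump = { Module: 5120, Buffer: 42, Socket: 8 };
console.log(growingCounts(firstDump, secondDump));
```

Constructors whose counts keep growing across several dumps are the ones worth inspecting further.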

Memory Leak Strategy

  • look at objects on heap for suspicious objects
  • Take successive core dumps and compare object counts
  • Growing object counts are likely leaking
  • Inspect object for more context
  • walk reverse references to find root object

::findjsobjects

The number of objects should increase if there's a memory leak.

Every time you require a module in Node.js, it caches a metadata object.
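The cached metadata can be observed from inside a CommonJS process through `require.cache`. The sketch below writes a throwaway module to a temporary directory and checks that requiring it adds one cache entry; the function name `demoModuleCache` and the file name are assumptions for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Require a freshly written module and report what landed in require.cache.
function demoModuleCache() {
  // Throwaway module so the example is self-contained.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
  const file = path.join(dir, 'hello.js');
  fs.writeFileSync(file, 'module.exports = "hello";\n');

  const before = Object.keys(require.cache).length;
  const exported = require(file);
  const after = Object.keys(require.cache).length;

  // The cache is keyed by the resolved filename of the module.
  const resolved = require.resolve(file);
  const entry = require.cache[resolved];
  return { added: after - before, exported, entry, resolved };
}

const result = demoModuleCache();
console.log(result.added);
console.log(result.entry.filename === result.resolved);
```

Each cache entry carries a `filename` property, which is the field the mdb pipeline in these notes greps for.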

8f1a04d39::findjsobjects | ::jsprint ! grep filename | sort | uniq -c

What's holding onto these modules?

  • find the object leaking
  • find the root holding onto the object
  • find what is actually leaking

Walk Reverse References with ::findjsobjects -r

::findjsobjects -r
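Conceptually, walking reverse references means inverting the "who points to whom" graph and following it upward from the leaked object. The sketch below does that on a toy heap; it illustrates the idea only and says nothing about how mdb implements it. The object ids are invented.

```javascript
// Toy heap: each key is an object id, each value lists the ids it references.
// Build the inverse map: object id -> ids of the objects that reference it.
function buildReverseIndex(heap) {
  const referrers = new Map();
  for (const [from, targets] of Object.entries(heap)) {
    for (const to of targets) {
      if (!referrers.has(to)) referrers.set(to, []);
      referrers.get(to).push(from);
    }
  }
  return referrers;
}

// Breadth-first walk up the referrers until an object nobody references is found.
// Returns the path from the start object to that root, or null if only cycles exist.
function pathToRoot(heap, start) {
  const referrers = buildReverseIndex(heap);
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const current = path[path.length - 1];
    const parents = referrers.get(current) || [];
    if (parents.length === 0) return path;
    for (const parent of parents) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push([...path, parent]);
      }
    }
  }
  return null;
}

// Invented heap: a cache holds two modules, one of which holds the leaked object.
const heap = {
  moduleCache: ['moduleA', 'moduleB'],
  moduleA: ['exportsA'],
  moduleB: [],
  exportsA: [],
};
console.log(pathToRoot(heap, 'exportsA'));
```

The last element of the returned path plays the role of the root object that is holding on to the leak.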

Root cause: Node caches metadata for each module, and if loading the module throws an exception, that metadata gets leaked.

Methodology for tracking down memory leaks:

  • Take successive core dumps (gcore(1))
  • Compare object counts (::findjsobjects)
  • Growing object counts are likely leaking
  • Inspect object for more context (::jsprint)
  • Walk reverse references to find root object (::findjsobjects -r)

More State than Just Logs

  • Detailed stack trace ::jsstack
  • Function args for each frame ::jsstack -vn0
  • Get state of any object and its provenance ::jsprint, ::jsconstructor
  • Get source code of any function ::jssource
  • Find arbitrary JS objects ::findjsobjects
  • Use an unmodified Node binary!

Join the Post Mortem Working Group

Make mdb_v8 cross platform https://github.com/joyent/mdb_v8

Contribute to https://github.com/tjfontaine/lldb-v8 and https://github.com/indutny/llnode
