Flamegraphing in Rust can now be done with a new cargo subcommand. Please check this out before embarking on the legacy journey below: https://github.com/flamegraph-rs/flamegraph

flamegraphing rust binaries' cpu usage with perf

One-time setup

  1. Install perf, using Brendan Gregg's guide: http://www.brendangregg.com/perf.html#Prerequisites
  2. Install flamegraph from repo:
    1. Clone the repo locally: git clone https://github.com/brendangregg/FlameGraph
    2. Add the main directory with all the *.pl Perl scripts to your PATH, substituting the actual location of your clone (a quick sanity check for the whole setup is sketched right after this list):
    cd
    echo "PATH=/path/to/FlameGraph:$PATH" >> .profile
    source .profile
    
  3. If you are running an older version of perf (i.e. from any Linux kernel version before v4.8-rc1), also install rust-unmangle; it resolves some further mangled names on top of the c++filt unmangling:
    1. Clone rust-unmangle:
    git clone https://github.com/Yamakaky/rust-unmangle.git
    
    2. Make rust-unmangle executable:
    cd rust-unmangle
    chmod u+x rust-unmangle
    
    3. Add it to your path:
    cd
    echo "PATH=/path/to/rust-unmangle:$PATH" >> .profile
    source .profile
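
With the one-time setup in place, a quick sanity check along these lines confirms that the tools are reachable from your shell (a sketch, assuming the PATH entries from above):

# check that the required tools resolve to something on the PATH
command -v perf && perf --version
command -v stackcollapse-perf.pl
command -v flamegraph.pl
# only needed if you went through step 3 for older perf versions
command -v rust-unmangle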
    

compiling and flamegraphing a binary

To get actual function names in the flamegraph output, turn on debug information in the binary by temporarily adding this to Cargo.toml (you should remove it again for an actual release):

[profile.release]
debug = true

Then compile with the --release flag, so that cargo optimizes the resulting binary. Otherwise, any slowness may simply be due to a lack of compiler optimizations:

cargo build --release
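
To double-check that the debug information actually made it into the optimized binary, something like this should work (a sketch; name-of-binary is a placeholder for your own binary):

# a .debug_info section in the output indicates that DWARF debug info is present
readelf -S target/release/name-of-binary | grep debug_info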

Run the cpu sampling with:

perf record --call-graph dwarf,16384 -e cpu-clock -F 997 target/release/name-of-binary <command-line-arguments>

Options here are:

  • --call-graph dwarf,16384: dwarf ensures correct stack traces, as the default frame pointer unwinding gave me incorrect stacks; the ,16384 doubles the stack dump size from the default value, which helped me avoid split stacks. (My assumption is that the smaller dump size did not suffice for deeper stacks, so they were cut off at the bottom; with the bottom blocks missing, correct merging on those lower levels became impossible. If you get oddly split stacks that differ in their number of lower levels, try increasing this value further.)
  • -e cpu-clock: this selects the cpu-clock event for perf sampling -- without it, the following -F argument did not really have an effect in my environment
  • -F 997: this samples at 997 Hz; the value slightly off from a round 1000 avoids lockstep sampling (see e.g. Brendan Gregg's blog post from 2014)
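
As a side note, if your binary is long-running, perf can also attach to an already running process and sample it for a bounded time window (a sketch; <pid-of-binary> and the 30 second window are placeholders):

# attach to a running process and sample it for roughly 30 seconds
perf record --call-graph dwarf,16384 -e cpu-clock -F 997 -p <pid-of-binary> -- sleep 30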

Should perf give you errors regarding sysctl settings, you can inspect the current values with, e.g.:

sysctl -n kernel.perf_event_paranoid

And write new values into them (these last until the next reboot) with:

sudo sysctl -w kernel.perf_event_paranoid=-1
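
If you want the setting to survive reboots, you can additionally persist it via a sysctl drop-in file, along these lines (a sketch; the file name 99-perf.conf is an arbitrary choice):

# persist the setting across reboots and reload all sysctl configuration
echo 'kernel.perf_event_paranoid = -1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl --system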

The resulting perf.data file can be rather large, depending on the length of your example run and the selected sampling frequency, easily going into the GBs -- so make sure you have the space available. Based on it, generate the flamegraph with:

perf script | stackcollapse-perf.pl | stackcollapse-recursive.pl | c++filt | rust-unmangle | flamegraph.pl > flame.svg

Tools here are:

  • stackcollapse-perf.pl: The stackcollapse Perl script by Brendan Gregg, which groups identical levels in stacks together. For installation instructions see above.
  • stackcollapse-recursive.pl: This further collapses some recursive calls, improving readability. @tomtung reported this useful addition.
  • c++filt: This is a C++ demangler that takes care of a lot of the Rust name demangling, as Rust's name mangling is based on the C++ scheme. It should be available in a standard Linux installation.
  • rust-unmangle: This script unmangles some remaining names that Rust mangles differently. It is optional and should not be necessary with perf from Linux kernel v4.8-rc1 onwards, as those versions include Rust unmangling code (and it worked without rust-unmangle for @tomtung). I haven't tested the newer versions myself and needed it for my older one, so it may still be useful for others.
  • flamegraph.pl: This Perl script by Brendan Gregg takes the collapsed stacks and renders them into the (interactive) .svg format.
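
If you do this repeatedly, the post-processing pipeline above can be wrapped in a small helper script, sketched here under the assumption that perf.data sits in the current directory and that all tools from the one-time setup are on your PATH (flame.sh is a hypothetical name, not part of the original workflow):

#!/usr/bin/env bash
# flame.sh: turn an existing perf.data in the current directory into flame.svg
set -euo pipefail

if [ ! -f perf.data ]; then
    echo "no perf.data found in the current directory" >&2
    exit 1
fi

perf script \
    | stackcollapse-perf.pl \
    | stackcollapse-recursive.pl \
    | c++filt \
    | rust-unmangle \
    | flamegraph.pl > flame.svg

echo "wrote flame.svg"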

Inspect the flame.svg file by opening it in a browser and hovering over individual bars to get the respective function names displayed. You can also search for bars containing certain expressions (top right), click on bars to zoom in on them, and reset the view again (top left).
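
On a typical Linux desktop, a quick way to open it in the default application is (a sketch; explicitly passing the file to your browser works just as well):

xdg-open flame.svg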

Or inspect the non-collapsed report interactively by issuing:

perf report
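
If you are working on a remote machine without a browser, the report can also be printed non-interactively; a sketch:

# plain-text version of the report, showing only the first lines
perf report --stdio | head -n 40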

But really, you would rather want to look at the flamegraph... ;)

use SVG files in GitHub issues and comments

It is not possible to directly include .svg files in GitHub issues and comments. However, I found a reasonable work-around:

  1. Create a new GitHub gist while logged in. You'll need to create a new gist for each image, but you can easily drag-and-drop the file in there.
  2. Use the link to the gist in your GitHub comments, advising users to follow that link and then properly inspect it by right-clicking on the preview and selecting View Image (tested in current Firefox).

For an example, have a look at this Pull Request: varlociraptor/varlociraptor#48

