@jvns
Last active August 29, 2015 14:01

perf!

Yesterday we talked about perf. Let's start using perf!

I learned how to make flame graphs with perf today and it is THE BEST. I found this because Graydon Hoare pointed me to Brendan Gregg's excellent page on how to use perf.

Wait up! What's perf? I've talked about strace a lot before (in Debug your programs like they're closed source). strace lets you see which system calls a program makes. But what if you wanted to know:

  • how many CPU instructions it ran?
  • how many L1 cache misses there were?
  • profiling information for each assembly instruction?

strace only does system calls, and none of those things are system calls. So it can't tell you any of those things!

perf is a Linux tool that can tell you all of these things, and more! Let's run a quick example on the bytesum program from yesterday.

bork@kiwi ~/w/howcomputer> perf stat ./bytesum_mmap *.mp4
 Performance counter stats for './bytesum_mmap The Newsroom S01E04.mp4':

        158.141639 task-clock                #    0.994 CPUs utilized          
                22 context-switches          #    0.139 K/sec                  
                 9 CPU-migrations            #    0.057 K/sec                  
               133 page-faults               #    0.841 K/sec                  
       438,662,273 cycles                    #    2.774 GHz                     [82.43%]
       269,916,782 stalled-cycles-frontend   #   61.53% frontend cycles idle    [82.38%]
       131,557,379 stalled-cycles-backend    #   29.99% backend  cycles idle    [66.66%]
       681,518,403 instructions              #    1.55  insns per cycle        
                                             #    0.40  stalled cycles per insn [84.88%]
       130,568,804 branches                  #  825.645 M/sec                   [84.85%]
            20,756 branch-misses             #    0.02% of all branches         [83.68%]

       0.159154389 seconds time elapsed
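
The numbers after each `#` are derived from the raw counters on the left. Here is a minimal Python sketch that recomputes a few of them from the values printed above. The formulas are my reading of what the columns mean, not something taken from the perf source, and the "stalled cycles per insn" line assumes the frontend stall count is the one being used (it is the larger of the two here).

```python
# Raw counter values copied from the perf stat output above.
task_clock_ms = 158.141639
elapsed_s = 0.159154389
cycles = 438_662_273
stalled_frontend = 269_916_782
instructions = 681_518_403
branches = 130_568_804
branch_misses = 20_756


def derived_metrics():
    """Recompute some of the ratios perf stat prints after the '#'."""
    task_clock_s = task_clock_ms / 1000.0
    return {
        # CPU time used divided by wall-clock time
        "cpus_utilized": task_clock_s / elapsed_s,
        # cycles per second of CPU time, in GHz
        "ghz": cycles / task_clock_s / 1e9,
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        "insns_per_cycle": instructions / cycles,
        # assumption: uses the frontend stall count
        "stalled_cycles_per_insn": stalled_frontend / instructions,
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:24s} {value:.3f}")
```

So "1.55 insns per cycle" is just instructions divided by cycles, and "2.774 GHz" is cycles divided by the CPU time from the task-clock line.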

This is super neat information. But we can do even more fun things!

Flame graphs with perf

So let's say I wanted to profile my program!

sudo perf record -g ./bytesum_mmap *.mp4
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg

Then I get an SVG! Here it is:

{%img /images/flamegraph.svg %}

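This is AMAZING. But what does it mean? Basically perf periodically interrupts the program and finds out where in the stack it is. The width of each part of this graph shows how many of those samples had that function on the stack, so wider parts are where the program spent more of its time.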
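
To make the sampling idea concrete, here is a rough Python sketch of how a pile of sampled stacks can be turned into box widths. It is not the actual code of stackcollapse-perf.pl or flamegraph.pl. The sample data and the function name `sum_bytes` are made up for illustration, and the real scripts handle many more details.

```python
from collections import Counter

# Hypothetical samples: each one is the call stack (outermost frame first)
# that was active when the profiler interrupted the program.
samples = [
    ["main", "sum_bytes"],
    ["main", "sum_bytes"],
    ["main", "sum_bytes"],
    ["main", "mmap"],
]


def fold(stacks):
    """Count identical stacks, joining the frames of each stack with ';'."""
    return Counter(";".join(stack) for stack in stacks)


def box_widths(stacks):
    """Fraction of all samples that fall under each call path.

    Every prefix of a stack is one box in the graph, so a sample of
    main -> sum_bytes counts towards both 'main' and 'main;sum_bytes'.
    """
    total = len(stacks)
    counts = Counter()
    for stack in stacks:
        for depth in range(1, len(stack) + 1):
            counts[";".join(stack[:depth])] += 1
    return {path: n / total for path, n in counts.items()}


if __name__ == "__main__":
    for path, count in fold(samples).items():
        print(path, count)
    for path, width in box_widths(samples).items():
        print(f"{path:20s} {width:.0%} of the graph's width")
```

With these made-up samples, `main` spans the full width, and the box for `sum_bytes` sitting on top of it is three times as wide as the one for `mmap`.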

We can see that there are 3 big parts -- there's the mmap call (on the left), the main program execution (in the middle), and the
