Yesterday we talked about perf. Let's start using perf!
I learned how to make flame graphs with perf today and it is THE BEST. I found out about this because Graydon Hoare pointed me to Brendan Gregg's excellent page on how to use perf.
Wait up! What's perf? I've talked about strace a lot before (in Debug your programs like they're closed source). strace lets you see which system calls a program is calling. But what if you wanted to know:
- how many CPU instructions it ran?
- how many L1 cache misses there were?
- profiling information for each assembly instruction?
strace only does system calls, and none of those things are system calls. So it can't tell you any of those things!
perf is a Linux tool that can tell you all of these things, and more! Let's run a quick example on the bytesum program from yesterday.
    bork@kiwi ~/w/howcomputer> perf stat ./bytesum_mmap *.mp4

     Performance counter stats for './bytesum_mmap The Newsroom S01E04.mp4':

        158.141639 task-clock                #    0.994 CPUs utilized
                22 context-switches          #    0.139 K/sec
                 9 CPU-migrations            #    0.057 K/sec
               133 page-faults               #    0.841 K/sec
       438,662,273 cycles                    #    2.774 GHz                     [82.43%]
       269,916,782 stalled-cycles-frontend   #   61.53% frontend cycles idle    [82.38%]
       131,557,379 stalled-cycles-backend    #   29.99% backend cycles idle     [66.66%]
       681,518,403 instructions              #    1.55 insns per cycle
                                             #    0.40 stalled cycles per insn  [84.88%]
       130,568,804 branches                  #  825.645 M/sec                   [84.85%]
            20,756 branch-misses             #    0.02% of all branches         [83.68%]

       0.159154389 seconds time elapsed
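The numbers after each `#` in that output are derived from the raw counters, and it can be fun to recompute them by hand. Here is a rough Python sketch that does so using the values copied from the run above. The function name `derived_metrics` is made up for this example, and the choice of which counters get divided by which is my reading of the output (in particular, I'm assuming "stalled cycles per insn" uses the larger of the two stall counters), not something perf's documentation is being quoted on.

```python
# Raw counters copied from the perf stat output above.
task_clock_ms = 158.141639
cycles = 438_662_273
stalled_frontend = 269_916_782
stalled_backend = 131_557_379
instructions = 681_518_403
branches = 130_568_804
branch_misses = 20_756


def derived_metrics():
    """Recompute the '#' columns of perf stat from the raw counters."""
    seconds_on_cpu = task_clock_ms / 1000.0
    return {
        # cycles per second of CPU time, expressed in GHz
        "ghz": cycles / seconds_on_cpu / 1e9,
        # instructions retired per cycle
        "insns_per_cycle": instructions / cycles,
        # assumption: perf uses the larger stall counter here
        "stalled_per_insn": max(stalled_frontend, stalled_backend) / instructions,
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        "backend_idle_pct": 100.0 * stalled_backend / cycles,
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:>18}: {value:.3f}")
```

Rounded to the precision perf prints, these come out the same as the 2.774 GHz, 1.55 insns per cycle, 0.40 stalled cycles per insn, 61.53%, 29.99% and 0.02% shown in the output.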
This is super neat information. But we can do even more fun things!
So let's say I wanted to profile my program!
    sudo perf record -g ./bytesum_mmap *.mp4
    sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
Then I get an SVG! Here it is:
{%img /images/flamegraph.svg %}
This is AMAZING. But what does it mean? Basically perf periodically interrupts the program and finds out where in the stack it is. The width of each part of this graph is proportional to the number of samples in which that function was on the stack.
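To make the sampling idea concrete, here is a toy Python sketch of what happens between taking stack samples and drawing boxes. It is not how stackcollapse-perf.pl or flamegraph.pl are actually implemented; the function names `fold_stacks` and `box_widths` and the sample stacks are invented for illustration. The one thing borrowed from the real tools is the "folded" form, where a stack is written as frames joined by semicolons with a count next to it.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical call stacks.

    Each sample is a list of frame names, outermost caller first.
    """
    folded = Counter()
    for frames in samples:
        folded[";".join(frames)] += 1
    return folded


def box_widths(folded):
    """Fraction of all samples that pass through each box of the graph.

    A box is identified by the path from the bottom of the stack up to it.
    """
    total = sum(folded.values())
    hits = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            hits[";".join(frames[:depth])] += count
    return {path: n / total for path, n in hits.items()}


if __name__ == "__main__":
    # Hypothetical samples, not taken from the real bytesum run.
    samples = (
        [["main", "sum_bytes"]] * 6
        + [["main", "mmap"]] * 3
        + [["main"]] * 1
    )
    folded = fold_stacks(samples)
    for path, width in sorted(box_widths(folded).items()):
        print(f"{path:<16} {width:.0%}")
```

With these made-up samples, `main` spans the full width because every sample passes through it, while the boxes stacked on top of it split that width in proportion to how often each one was caught running.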
We can see that there are 3 big parts -- there's the mmap call (on the left), the main program execution (in the middle), and the