@ReidAtcheson
Created November 2, 2023 21:05
gather two performance counters, split them into two stack files, and then combine into a new metric

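Sample for as long as you want and press Ctrl-C when you have had enough. You can also use --call-graph dwarf for more accurate stack traces, but it takes a lot more memory and storage. The -c 1000000 makes every sample stand for the same number of events (one million), so the sample counts for the two events can be compared directly later on.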

sudo perf record -e cycles,instructions -c 1000000 -a --call-graph fp

get the stacks

sudo perf script > stacks.out

run the following awk script on stacks.out to split out the different hardware counter types

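#!/usr/bin/awk -f

# Split `perf script` output into one file per event type.
# Sample header lines end with the event name and a colon, e.g. "... 1000000 cycles:"
/[[:space:]]+[0-9]+[[:space:]]+[^:]+:/ && /:[[:space:]]*$/ {
    # Use the event name as the file name: drop the trailing colon and
    # replace anything unsafe in a file name with an underscore
    counter_type = $NF
    sub(/:$/, "", counter_type)
    gsub(/[^[:alnum:]_.-]/, "_", counter_type)
    output_file = counter_type ".txt"
}

# Print every line to the file of the most recent header; skip anything
# that appears before the first header (output_file would still be empty)
output_file != "" {
    # ">" truncates each file once per run, so reruns do not append duplicates
    print $0 > output_file
}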
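As a quick sanity check on the split, it can help to count how many samples of each event type stacks.out contains before trusting the per-event files. The sketch below assumes the same header layout as the script above (the event name is the last field and ends with a colon); `count_events` is a hypothetical helper name, not part of perf or FlameGraph.

```shell
# count_events FILE: print "<event> <number of samples>" for each event type
# found in `perf script` output. Assumes sample header lines end with "<event>:".
count_events() {
    awk '
        /[[:space:]]+[0-9]+[[:space:]]+[^:]+:/ && /:[[:space:]]*$/ {
            ev = $NF
            sub(/:$/, "", ev)   # drop the trailing colon
            n[ev]++
        }
        END { for (ev in n) print ev, n[ev] }
    ' "$1" | LC_ALL=C sort
}

# Usage: count_events stacks.out
```

If one event shows far fewer samples than expected, or an unexpected event name appears, the header pattern probably does not match this perf version's output and needs adjusting.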

Now calculate the folded stacks. difffolded.pl is normally used for differential flamegraphs, but here I use it only because it conveniently places both counts next to each stack trace, in a form that sort and awk can process easily: each line of diff.folded is the folded stack followed by the count from the first file and the count from the second.

FlameGraph/stackcollapse-perf.pl cycles.txt > cycles.folded
FlameGraph/stackcollapse-perf.pl instructions.txt > instructions.folded
FlameGraph/difffolded.pl cycles.folded instructions.folded > diff.folded
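To make the three-column layout concrete, here is a small awk stand-in that joins two folded files on the stack. It is not how difffolded.pl is implemented, only a sketch that yields the same "stack count1 count2" shape; `join_folded` is a hypothetical helper name. It reads the count from the end of each line, so it also copes with frame names that contain spaces.

```shell
# join_folded A.folded B.folded: print "stack countA countB" for every stack
# that appears in either file; a stack missing from one file gets 0 there.
join_folded() {
    awk '
        NF >= 2 {
            count = $NF                              # the count is the last field
            stack = $0
            sub(/[[:space:]]+[0-9]+$/, "", stack)    # strip the trailing count
            if (FILENAME == ARGV[1]) a[stack] += count
            else                     b[stack] += count
            seen[stack] = 1
        }
        END { for (s in seen) printf "%s %d %d\n", s, a[s], b[s] }
    ' "$1" "$2"
}

# Usage: join_folded cycles.folded instructions.folded > diff.folded
```

The output order is unspecified, so pipe it through sort if a stable order matters.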

Finally, play with diff.folded to get the data you want. You will probably want to make a flamegraph from cycles.folded first and use it to guide you to a stack trace of interest. In my case I just wanted the stack traces with the highest CPU time, so I sorted by the cycles column. Then, to get "instructions per cycle", I divided one event count by the other. This works for any pair of events, not just these, e.g. a cache miss/hit ratio. The commands below assume the folded stack itself contains no spaces, so that the two counts are fields 2 and 3.

sort -t " " -k 2,2nr diff.folded > diff.sorted
awk '{if ($2 == 0) $4 = 0; else $4 = $3 / $2} 1' diff.sorted > diff.ipc
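If some frame names do contain spaces (C++ templates and operators are the usual suspects), fields 2 and 3 are no longer the counts. A minimal sketch that reads the two counts from the end of the line instead; `add_ratio` is a hypothetical helper name, and the ratio is last column divided by second-to-last, which is instructions per cycle for the file order used above.

```shell
# add_ratio FILE: append "<last count> / <second-to-last count>" to each line
# of a "stack count1 count2" file. A zero denominator gives a ratio of 0.
add_ratio() {
    awk 'NF >= 3 {
        denom = $(NF-1)
        numer = $NF
        ratio = (denom == 0) ? 0 : numer / denom
        printf "%s %.3f\n", $0, ratio
    }' "$1"
}

# Usage: add_ratio diff.sorted
```

Swapping the two input files given to difffolded.pl flips the ratio, so the same helper gives cycles per instruction if that is the number you prefer.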

Now you have the IPC not just for the whole program, but broken down by individual code path: the last column of each line is the IPC for that stack.
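One way to use the result is to list the stacks that are both hot and inefficient. The sketch below assumes lines of the form "stack cycles instructions ipc" as produced above; `low_ipc_hot_stacks` and its minimum-cycles threshold are hypothetical, and the right threshold depends on how long you sampled.

```shell
# low_ipc_hot_stacks FILE MIN_CYCLES: print stacks with at least MIN_CYCLES
# cycle samples, ordered from lowest to highest IPC (the last column).
low_ipc_hot_stacks() {
    tab="$(printf '\t')"
    awk -v min="$2" 'NF >= 4 && $(NF-2) >= min { print $NF "\t" $0 }' "$1" |
        sort -t "$tab" -k1,1n |   # sort on the IPC key prepended by awk
        cut -f2-                  # drop the key again
}

# Usage: low_ipc_hot_stacks RATIO_FILE 100
```

Stacks with very few cycle samples are filtered out because their ratios are mostly noise.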
