karimnaaji/valgrind.md

## valgrind.md

      
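Cachegrind manual: http://valgrind.org/docs/manual/cg-manual.html

```shell
# Run the program under Cachegrind with the I1/D1/LL cache simulation enabled.
# --cache-sim=yes matters on recent Valgrind releases, which turn cache
# simulation off by default and would otherwise report only Ir.
valgrind --tool=cachegrind --cache-sim=yes ./a.out

# Annotate the output (named cachegrind.out.<pid>); the glob assumes a single
# output file in the current directory.
cg_annotate cachegrind.out.* --auto=yes
```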
### Instruction cache counters

- `Ir`: I cache reads, i.e. the number of instructions executed
- `I1mr`: I1 cache read misses
- `ILmr`: LL (last-level) cache instruction read misses

### Data cache counters

- `Dr`: D cache reads, i.e. the number of memory reads
- `D1mr`: D1 cache read misses
- `DLmr`: LL (last-level) cache data read misses
- `Dw`: D cache writes, i.e. the number of memory writes
- `D1mw`: D1 cache write misses
- `DLmw`: LL (last-level) cache data write misses
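Dividing a miss counter by the matching access counter (`Ir`, `Dr`, `Dw`) gives a miss rate, which is usually easier to compare between runs than raw counts. A minimal sketch, assuming you copy the totals from the `cg_annotate` summary by hand; `miss_rate` is a hypothetical helper and the numbers in the example are made up, not from a real run.

```shell
#!/bin/sh
# Hypothetical helper: miss_rate MISSES ACCESSES prints MISSES/ACCESSES as a percentage.
miss_rate() {
  awk -v m="$1" -v a="$2" 'BEGIN {
    if (a == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.2f%%\n", 100 * m / a
  }'
}

# Made-up example totals.
# D1 miss rate = (D1mr + D1mw) / (Dr + Dw)
miss_rate $((40000 + 10000)) $((800000 + 200000))   # prints 5.00%
# LL data miss rate = (DLmr + DLmw) / (Dr + Dw)
miss_rate $((2000 + 500)) $((800000 + 200000))      # prints 0.25%
```

The same helper works for instructions: `I1mr / Ir` and `ILmr / Ir`.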


On a modern machine, an L1 miss will typically cost around 10 cycles, an LL miss can cost as much as 200 cycles, and a mispredicted branch costs in the region of 10 to 30 cycles. Detailed cache and branch profiling can be very useful for understanding how your program interacts with the machine and thus how to make it faster.
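The ballpark costs above can be turned into a rough cycle estimate from the Cachegrind totals. This is only a sketch under an assumed linear cost model (1 cycle per instruction, about 10 cycles per L1 miss that hits in LL, about 200 cycles per LL miss); it ignores overlapping misses, prefetching and out-of-order execution, so read the result as an order of magnitude. `estimate_cycles` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: rough cycle estimate from Cachegrind totals.
# Assumed cost model (ballpark, not measured):
#   1 cycle per instruction, ~10 cycles per L1 miss that hits in LL,
#   ~200 cycles per LL miss.
# Usage: estimate_cycles Ir I1mr D1mr D1mw ILmr DLmr DLmw
estimate_cycles() {
  l1_misses=$(($2 + $3 + $4))
  ll_misses=$(($5 + $6 + $7))
  # Every LL miss is also an L1 miss, so charge those only the LL cost.
  l1_only=$((l1_misses - ll_misses))
  echo $(($1 + 10 * l1_only + 200 * ll_misses))
}

estimate_cycles 1000000 500 20000 5000 100 1500 400   # prints 1635000
```

Even in this toy example the 2000 LL misses cost more than the 23500 L1-only misses, which is why LL misses are usually the first thing to look at.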

### Visualizing data

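```shell
# Homebrew no longer supports formula options such as --with-graphviz;
# install Graphviz (used for the call-graph view) alongside QCacheGrind.
brew install qcachegrind graphviz

# --cache-sim=yes is the current name of the older --simulate-cache=yes switch.
valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --cache-sim=yes ./a.out
```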

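Profile with `--dump-instr=yes` to record costs per machine instruction, which lets QCacheGrind annotate the assembly; `--collect-jumps=yes` adds the data it needs to draw jump arrows in that view. Open the resulting `callgrind.out.<pid>` file with `qcachegrind`.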
https://baptiste-wicht.com/posts/2011/09/profile-c-application-with-callgrind-kcachegrind.html
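A small wrapper can run the profile and open the viewer in one step. This is a sketch, not an official tool: `profile_and_view`, the `DRY_RUN` switch and the `CALLGRIND_OUT` variable are hypothetical names; `--callgrind-out-file` is Callgrind's option for choosing the output file name instead of `callgrind.out.<pid>`, and `--cache-sim=yes` is its cache-simulation switch.

```shell
#!/bin/sh
# Hypothetical wrapper: profile a program under Callgrind, write the profile to a
# predictable file name, then open it in QCacheGrind.
# With DRY_RUN=1 the two commands are printed instead of executed.
# Usage: profile_and_view ./a.out [args...]
profile_and_view() {
  out=${CALLGRIND_OUT:-callgrind.out.latest}   # assumed default file name
  if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; else run=; fi
  # An empty, unquoted $run expands to nothing, so the command runs directly.
  $run valgrind --tool=callgrind --callgrind-out-file="$out" \
    --dump-instr=yes --collect-jumps=yes --cache-sim=yes "$@"
  $run qcachegrind "$out"
}
```

Using a fixed output name avoids hunting for the newest `callgrind.out.<pid>` after repeated runs, at the cost of overwriting the previous profile.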