@sdhegde
Created March 20, 2019 07:17
**************************
USING SCORE-P & CUBE: (I was unable to get good results with a forked application)
**************************
rm -rf CMakeCache.txt CMakeFiles/ cmake_install.cmake Makefile
export SCOREP_ENABLE_PROFILING=1
export SCOREP_EXPERIMENT_DIRECTORY=sdh
export PATH=$PATH:/root/local/bin/
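CXX="scorep /usr/bin/g++" cmake3 . ..........configure with scorep wrapping the compiler
make ..........build the instrumented binary
./DisplayImage ..........run it; the profile is written to sdh/
scorep-score -r sdh/profile.cubex ..........for a per-region text summary
cube sdh/profile.cubex ..........to browse the same profile in the CUBE GUI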
**************************
USING VALGRIND, CALLGRIND AND KCACHEGRIND: (Takes a lot of time; slows down execution considerably)
**************************
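valgrind --tool=callgrind --dump-instr=yes --cache-sim=yes -v ./a.out ..........for callgraph and cache information (cache data is only collected with --cache-sim=yes)
kcachegrind callgrind.out.5* ..........the output file is named callgrind.out.<pid>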
g++ -g mem.c ..........build with debug info so reports show file and line
valgrind --tool=memcheck --leak-check=yes ./a.out ..........for mem leaks
valgrind --tool=massif ./a.out ..........for heap usage over time (writes massif.out.<pid>)
ms_print massif.out.931 ..........for command line display
massif-visualizer massif.out.931 ..........for visual display
**************************
USING GPROF: (easy to use and interpret, but the results are not always clear. Only profiles in "user space": it cannot see bottlenecks that occur in external shared libraries (like libc) or in the kernel)
**************************
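g++ -g -pg fork.cpp ..........-pg adds the instrumentation gprof needs
./a.out ..........writes gmon.out in the current directory on normal exit
gprof ./a.out gmon.out ..........prints the flat profile and the call graph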
**************************
USING PPROF: (easy to use, but cannot profile forked applications out of the box. The workaround is to call ProfilerStart() in the child after fork())
**************************
When building, link the executable against libprofiler (-lprofiler)
CPUPROFILE=sdh.prof ./DisplayImage ........will generate sdh.prof in current directory
pprof --text ./DisplayImage sdh.prof .........for text output
pprof --gv ./DisplayImage sdh.prof .........for graphical output
pprof --callgrind ./DisplayImage sdh.prof > sdh.callgrind .........for kcachegrind graphical output
kcachegrind sdh.callgrind
**************************
USING OPERF: (can be used for forked as well as multithreaded programs)
**************************
operf ./DisplayImage
OR
operf -e CPU_CLK_UNHALTED:500000,l2_requests:80000 ./DisplayImage
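opreport -l ..........per-symbol report; see the manpage for more options
ocount -e l2_requests:0x41 ./DisplayImage ..........will give the total count of L2 cache misses (0x41 is the unit mask; event names and masks are CPU-specific, see ophelp and man ocount)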