nvprof basically has two modes:
- Summary mode: all invocations of a kernel are bundled into one line.
- Trace mode: each invocation of a kernel is on a separate line.
Trace mode provides a timeline of GPU activities in chronological order. For each kernel launch or memory copy, detailed information such as kernel parameters, shared memory usage, and memory transfer throughput is shown. Good for checking how much DRAM traffic was generated (?)
nvprof --print-gpu-trace          # trace mode: one line per kernel launch / memory copy
nvprof --events <events>          # collect the listed hardware events
nvprof --log-file foo             # write profiler output to the file foo
nvprof <application>              # default: summary mode
nvprof --output-profile <file>    # save the profile for later import (e.g. into the Visual Profiler)
nvprof --analysis-metrics -o foo.log   # collect the metrics needed for guided analysis in the Visual Profiler
nvprof --metrics ipc              # collect the ipc (instructions per cycle) metric
nvprof --profile-api-trace none   # disable CUDA runtime/driver API tracing
nvprof --print-api-trace          # print the CUDA runtime/driver API trace
An event is a countable activity, action, or occurrence on a device.
nvprof --query-events             # list the events available on the device
A metric is a characteristic of an application that is calculated from one or more event values.
nvprof --query-metrics            # list the available metrics
For example,
nvprof --events warps_launched,local_load --metrics ipc <application>
nvprof --events all
nvprof --metrics all
Collecting all events/metrics is slow, since kernels are replayed multiple times to gather all the counters.
To control profiling from the code, include cuda_profiler_api.h and call the APIs cudaProfilerStart() and cudaProfilerStop() around the region of interest. While running nvprof, disable profiling from the start of the application:
nvprof --profile-from-start off
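A minimal sketch of using the profiler API (the kernel and sizes below are made up for illustration):

```cuda
// Run under `nvprof --profile-from-start off ./a.out`: only work between
// cudaProfilerStart() and cudaProfilerStop() is captured.
#include <cuda_profiler_api.h>

__global__ void scale(float *x, int n) {   // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(d_x, n);   // warm-up launch, not profiled

    cudaProfilerStart();                        // profiling begins here
    scale<<<(n + 255) / 256, 256>>>(d_x, n);    // this launch is captured
    cudaDeviceSynchronize();
    cudaProfilerStop();                         // profiling ends here

    cudaFree(d_x);
    return 0;
}
```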
You can name CPU and GPU resources (threads, streams, devices) via NVTX (nvToolsExt). This helps in reading the log files.
nvprof --export-profile foo.prof <app> <app args>
nvcc foo.cu -lnvToolsExt
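A hedged sketch of NVTX naming (the names are arbitrary; the nvtxNameCuda* helpers live in nvToolsExtCudaRt.h); compile with the nvcc line above:

```cuda
#include <cuda_runtime.h>
#include <nvToolsExt.h>
#include <nvToolsExtCudaRt.h>   // nvtxNameCudaDeviceA / nvtxNameCudaStreamA

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    nvtxNameCudaDeviceA(0, "primary GPU");       // name a GPU in the timeline
    nvtxNameCudaStreamA(stream, "copy stream");  // name a CUDA stream

    nvtxRangePushA("setup phase");               // annotate a CPU-side range
    // ... work to appear under "setup phase" in the timeline ...
    nvtxRangePop();

    cudaStreamDestroy(stream);
    return 0;
}
```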
-f forces overwriting the output file a.db, --export-profile dumps the profile as a database (SQL) file, and --profile-from-start off starts profiling only when cudaProfilerStart() is encountered:
nvprof --profile-from-start off -f --export-profile a.db ./a.out
LD_PRELOAD=foo.so nvprof --openmp-profiling off python a.py
This profiles a Python application; --openmp-profiling off disables OpenMP profiling, which can cause problems with some applications.