nvprof basically has two modes:
- Summary mode: all invocations of a kernel are bundled into one line.
- Trace mode: each invocation of a kernel is on a separate line.
Trace mode provides a timeline of GPU activities in chronological order. For each kernel launch or memory copy, detailed information such as kernel parameters, shared memory usage, and memory transfer throughput is shown. Good for checking how much DRAM traffic was generated (?)
nvprof --print-gpu-trace          # trace mode: one line per kernel launch / memory copy
nvprof --events <events>          # collect the listed hardware events
nvprof --log-file foo             # write profiler output to the file foo
nvprof <application>              # default: summary mode
nvprof --output-profile <file>    # save the profile for later import (e.g. into the Visual Profiler)
nvprof --analysis-metrics -o foo.log   # collect the metrics needed for guided analysis in the Visual Profiler
nvprof --metrics ipc              # collect the ipc (instructions per cycle) metric
nvprof --profile-api-trace none   # disable CUDA runtime/driver API tracing
nvprof --print-api-trace          # print the CUDA runtime/driver API trace
An event is a countable activity, action, or occurrence on a device.
nvprof --query-events             # list the events available on the device
A metric is a characteristic of an application that is calculated from one or more event values.
nvprof --query-metrics            # list the available metrics
For example,
nvprof --events warps_launched,local_load --metrics ipc <application>
nvprof --events all
nvprof --metrics all
Collecting all events/metrics is slow, since kernels are replayed multiple times to gather all the counters.
To control profiling from the code, include cuda_profiler_api.h and call the APIs cudaProfilerStart() and cudaProfilerStop() around the region of interest. While running nvprof, disable profiling from the start of the application:
nvprof --profile-from-start off
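A minimal sketch of using the profiler API (the kernel and sizes below are made up for illustration):

```cuda
// Run under `nvprof --profile-from-start off ./a.out`: only work between
// cudaProfilerStart() and cudaProfilerStop() is captured.
#include <cuda_profiler_api.h>

__global__ void scale(float *x, int n) {   // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(d_x, n);   // warm-up launch, not profiled

    cudaProfilerStart();                        // profiling begins here
    scale<<<(n + 255) / 256, 256>>>(d_x, n);    // this launch is captured
    cudaDeviceSynchronize();
    cudaProfilerStop();                         // profiling ends here

    cudaFree(d_x);
    return 0;
}
```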
You can name CPU and GPU resources (threads, streams, devices) via NVTX (nvToolsExt). This helps in reading the log files.
nvprof --export-profile foo.prof <app> <app args>
nvcc foo.cu -lnvToolsExt
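A hedged sketch of NVTX naming (the names are arbitrary; the nvtxNameCuda* helpers live in nvToolsExtCudaRt.h); compile with the nvcc line above:

```cuda
#include <cuda_runtime.h>
#include <nvToolsExt.h>
#include <nvToolsExtCudaRt.h>   // nvtxNameCudaDeviceA / nvtxNameCudaStreamA

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    nvtxNameCudaDeviceA(0, "primary GPU");       // name a GPU in the timeline
    nvtxNameCudaStreamA(stream, "copy stream");  // name a CUDA stream

    nvtxRangePushA("setup phase");               // annotate a CPU-side range
    // ... work to appear under "setup phase" in the timeline ...
    nvtxRangePop();

    cudaStreamDestroy(stream);
    return 0;
}
```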
-f forces overwriting the output file a.db, --export-profile dumps the profile as a database (SQL) file, and --profile-from-start off starts profiling only when cudaProfilerStart() is encountered:
nvprof --profile-from-start off -f --export-profile a.db ./a.out
LD_PRELOAD=foo.so nvprof --openmp-profiling off python a.py
This profiles a Python application; --openmp-profiling off disables OpenMP profiling, which can cause problems with some applications.