@adityaiitb
Last active October 27, 2020 16:48
NVprof commands

nvprof has two basic modes.

  • Summary mode: All invocations of a kernel are bundled into one line.
  • Trace mode: Each invocation of a kernel is on a separate line.

For GPU trace mode

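Provides a timeline of GPU activities in chronological order. For each kernel or memory copy it shows details such as kernel launch parameters, shared memory usage and memory transfer throughput. Useful for checking how much data each memory copy moved. DRAM traffic generated inside kernels is reported by metrics such as dram_read_transactions and dram_write_transactions instead.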

nvprof --print-gpu-trace

For collecting specific events (comma-separated list)

nvprof --events <event1>,<event2>

By default the output goes to stderr. To save it to a file:

nvprof --log-file foo

To get a summary (the default mode)

nvprof <application>

To save the profiling data for later visualization in the Visual Profiler (nvvp) or for re-import into nvprof

nvprof --output-profile

To capture all GPU metrics that the Visual Profiler needs for its guided analysis

nvprof --analysis-metrics -o foo.log

To capture a particular metric

nvprof --metrics ipc

To turn off API tracing

nvprof --profile-api-trace none

API-trace mode shows a timeline of all CUDA runtime and driver API calls in chronological order

nvprof --print-api-trace

To see the list of available events

An event is a countable activity, action, or occurrence on a device.

nvprof --query-events

To get the full list of metrics

A metric is a characteristic of an application that is calculated from one or more event values.

nvprof --query-metrics

For example, to collect two events and one metric:

nvprof --events warps_launched,local_load --metrics ipc <application>
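
You may want to repeat the same profiling run from a script. In that case the command line can be assembled programmatically. This is a minimal Python sketch that assumes nvprof is on the PATH. `build_nvprof_cmd` and `run_nvprof` are hypothetical helpers. The sketch uses only flags from these notes, and several events or metrics go into one comma-separated argument.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_nvprof_cmd(app: Sequence[str],
                     events: Sequence[str] = (),
                     metrics: Sequence[str] = (),
                     log_file: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build an nvprof argv list for `app`."""
    cmd = ["nvprof"]
    if events:
        # Several events are passed as one comma-separated value.
        cmd += ["--events", ",".join(events)]
    if metrics:
        cmd += ["--metrics", ",".join(metrics)]
    if log_file:
        # nvprof reports to stderr by default; --log-file sends it to a file.
        cmd += ["--log-file", log_file]
    return cmd + list(app)


def run_nvprof(cmd: List[str]) -> None:
    """Run the command; requires nvprof and a CUDA-capable GPU."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_nvprof_cmd(["./a.out"],
                           events=["warps_launched", "local_load"],
                           metrics=["ipc"])
    print(shlex.join(cmd))
    # prints: nvprof --events warps_launched,local_load --metrics ipc ./a.out
```

Building an argv list, rather than one shell string, avoids quoting problems when passing the command to subprocess.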

To collect all events on the device

nvprof --events all

To collect all metrics on the device

nvprof --metrics all

To profile only part of the code

In the code, include cuda_profiler_api.h and wrap the region of interest in cudaProfilerStart() and cudaProfilerStop(). When running nvprof, disable profiling from the start of the application:

nvprof --profile-from-start off
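
The start/stop pair can be wrapped so that the stop call always runs. This is a minimal Python sketch. `profiled_region` and `run_steps` are hypothetical helpers, and `start`/`stop` are passed in as callables. In a PyTorch program you could pass torch.cuda.profiler.start and torch.cuda.profiler.stop, which call cudaProfilerStart()/cudaProfilerStop(). Recording stubs stand in here so the sketch runs without a GPU.

```python
from contextlib import contextmanager, nullcontext
from typing import Callable, Iterator, List


@contextmanager
def profiled_region(start: Callable[[], None],
                    stop: Callable[[], None]) -> Iterator[None]:
    """Hypothetical helper: profile only the enclosed block.

    Intended for runs under `nvprof --profile-from-start off`, where nothing
    is recorded until start() (cudaProfilerStart) is called.
    """
    start()
    try:
        yield
    finally:
        stop()  # always stop, even if the profiled code raises


def run_steps(steps: int, profile_step: int,
              start: Callable[[], None], stop: Callable[[], None],
              log: List[str]) -> None:
    """Skip warm-up steps and capture only one steady-state step."""
    for step in range(steps):
        ctx = profiled_region(start, stop) if step == profile_step else nullcontext()
        with ctx:
            log.append(f"step {step}")  # stand-in for real GPU work


if __name__ == "__main__":
    calls: List[str] = []
    # Recording stubs stand in for cudaProfilerStart()/cudaProfilerStop().
    run_steps(4, 2, lambda: calls.append("start"),
              lambda: calls.append("stop"), calls)
    print(calls)
    # prints: ['step 0', 'step 1', 'start', 'step 2', 'stop', 'step 3']
```

Profiling a single step after warm-up keeps the trace small and leaves out one-time setup costs.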

You can give CPU and GPU resources (for example threads, devices and streams) readable names, e.g. through NVTX. This makes the log files easier to read.

To export the profile as an SQLite database (which can be opened in nvvp)

nvprof --export-profile foo.prof <app> <app args>

To compile code that uses NVTX, link against the NVTX library:

nvcc foo.cu -lnvToolsExt
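
NVTX ranges are meant to be pushed and popped in balanced, nested pairs. In C these are nvtxRangePushA() and nvtxRangePop() from nvToolsExt.h. This is a minimal Python sketch of a decorator that keeps them balanced. `nvtx_annotate` is hypothetical and takes the push/pop functions as parameters. In PyTorch you could pass torch.cuda.nvtx.range_push and torch.cuda.nvtx.range_pop. Recording stubs are used here so the sketch runs without a GPU.

```python
import functools
from typing import Callable, List


def nvtx_annotate(push: Callable[[str], None], pop: Callable[[], None]):
    """Hypothetical decorator factory: wrap each call in a named range."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            push(fn.__name__)  # the range is labeled with the function name
            try:
                return fn(*args, **kwargs)
            finally:
                pop()  # keep push/pop balanced even if fn raises
        return wrapper
    return decorator


if __name__ == "__main__":
    trace: List[str] = []
    annotate = nvtx_annotate(lambda name: trace.append("push " + name),
                             lambda: trace.append("pop"))

    @annotate
    def layer(x):
        return x + 1

    @annotate
    def forward(x):
        return layer(x) * 2

    print(forward(3), trace)
    # prints: 8 ['push forward', 'push layer', 'pop', 'pop']
```

The try/finally ensures that an exception inside the function cannot leave a range open, which would otherwise distort the nesting shown in the timeline.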

To profile a marked region and export it in one run

-f forces overwriting a.db if it already exists, --export-profile writes the profile to an SQLite database, and --profile-from-start off starts profiling only when the application calls cudaProfilerStart().

nvprof --profile-from-start off -f --export-profile a.db ./a.out
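
The exported file is an SQLite database, so it can be post-processed with Python's built-in sqlite3 module. This is a minimal sketch that totals GPU time per kernel name. The schema is an assumption, not something these notes state. Kernel records are assumed to be in a table CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL with start/end timestamps (assumed to be nanoseconds). Its `name` column is assumed to refer to StringTable(_id_, value). Inspect your own file first, for example with `sqlite3 a.db .tables`.

```python
import sqlite3
import sys
from typing import List, Tuple

# Assumed nvprof schema; verify it against your own export before relying on it.
QUERY = """
SELECT s.value AS kernel,
       COUNT(*) AS calls,
       SUM(k."end" - k.start) AS total_ns
FROM CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL AS k
JOIN StringTable AS s ON s._id_ = k.name
GROUP BY s.value
ORDER BY total_ns DESC
"""


def kernel_summary(db_path: str) -> List[Tuple[str, int, int]]:
    """Return (kernel name, invocation count, total time in ns), longest first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py a.db
    for name, calls, total_ns in kernel_summary(sys.argv[1]):
        print(f"{total_ns / 1e6:10.3f} ms  {calls:6d}  {name}")
```

This mirrors nvprof's summary mode (one line per kernel) but runs on a saved file, so you can re-slice the data without profiling again.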

To prevent a library loaded through LD_PRELOAD from interfering with nvprof, turn off OpenMP profiling

LD_PRELOAD=foo.so nvprof --openmp-profiling off python a.py
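
If nvprof is launched from a Python driver script, you can set LD_PRELOAD for the child process only and leave the driver's own environment untouched. This is a minimal sketch. `nvprof_with_preload` is a hypothetical helper, and foo.so and a.py are the placeholders from the command above.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def nvprof_with_preload(preload: str,
                        app: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: return (argv, env) for an nvprof run with LD_PRELOAD."""
    env = dict(os.environ)       # copy, so the parent's environment is unchanged
    env["LD_PRELOAD"] = preload  # applies only to the nvprof child process
    cmd = ["nvprof", "--openmp-profiling", "off"] + app
    return cmd, env


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python launch.py foo.so python a.py
    cmd, env = nvprof_with_preload(sys.argv[1], sys.argv[2:])
    subprocess.run(cmd, env=env, check=True)
```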