Usually, located at /usr/local/cuda/bin
$ nvprof python train_mnist.py
I prefer to use --print-gpu-trace.
$ nvprof --print-gpu-trace python train_mnist.py
On GPU machine, run
$ nvprof -o prof.nvvp python train_mnist.py
Copy prof.nvvp
into your local machine
$ scp your_gpu_machine:/path/to/prof.nvvp .
Then, run nvvp (nvidia visual profiler) on your local machine:
$ nvvp prof.nvvp
It works more comfortably than X11 forwarding or something.
My recommendation to anyone who wants to install nvvp on an up-to-date Mac (catalina) is not to even try.
Firstly you have to disable gatekeeper:
sudo spctl --master-disable
but then after you download nvvp it won't run because it wants an outdated java runtime (JRE) that's hard to get or install.
I still haven't figured it out.
Currently I'm trying to figure out how to do this, but it looks like it's very very painful.