@anirudhacharya
Last active February 19, 2020 22:18

Can we leverage the framework-specific profilers, and maybe build a layer on top of them to augment their features and functionality?

This is what an MXNet profiler dump looks like:

Profile Statistics.
	Note that counter items are counter values and not time units.
Device Storage
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
Memory: gpu/0                         722     2642262.5000        8388.6084     2654455.2500     1323033.3750
Memory: cpu/0                         474           0.0000           0.0000       18984.9609        9492.4805

MXNET_C_API
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
MXAutogradMarkVariables               133           0.7330           0.0040           0.0080           0.0055
MXNDArrayFree                         411           0.8010           0.0000           0.1230           0.0019
...


operator
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
FullyConnected                          2           0.5520           0.2750           0.2770           0.2760
Flatten                                 2           0.3770           0.1880           0.1890           0.1885
...

The profiler dump takes some effort and familiarity to read, and it gives a ground-level (0 ft) view of the internals of the model. Profiling a model also often involves using other third-party tools and packages. This is where the Tornasole profiler can pitch in.
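As a rough idea of what a thin layer on top of the dump could look like, here is a minimal sketch that parses the text layout shown above into per-section records and ranks rows by the total column. The names `parse_profile_dump` and `slowest` are made up for illustration; they are not part of MXNet or Tornasole, and the parser assumes the dump keeps exactly this column layout.

```python
import re

# One data row: name, integer count, then total/min/max/avg as decimals.
# Header rows, dashed rules and "..." lines do not match and are skipped.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<count>\d+)"
    r"\s+(?P<total>\d+\.\d+)\s+(?P<min>\d+\.\d+)"
    r"\s+(?P<max>\d+\.\d+)\s+(?P<avg>\d+\.\d+)$"
)


def parse_profile_dump(text):
    """Return {section name: [row dicts]} for a dump in the layout above."""
    sections, current, previous = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if previous is not None and set(line) == {"="}:
            # A run of '=' underlines the section title on the previous line.
            current = previous
            sections[current] = []
        elif current is not None:
            match = ROW.match(line)
            if match:
                row = match.groupdict()
                sections[current].append({
                    "name": row["name"],
                    "count": int(row["count"]),
                    # Counter sections (e.g. Device Storage) hold counter
                    # values here, not milliseconds, so keys carry no unit.
                    "total": float(row["total"]),
                    "min": float(row["min"]),
                    "max": float(row["max"]),
                    "avg": float(row["avg"]),
                })
        previous = line
    return sections


def slowest(sections, section, n=5):
    """Top-n rows of one section, largest total first (hypothetical helper)."""
    rows = sections.get(section, [])
    return sorted(rows, key=lambda r: r["total"], reverse=True)[:n]


SAMPLE = """\
operator
=================
Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                          -----------        ---------    -------------    -------------    -------------
FullyConnected                          2           0.5520           0.2750           0.2770           0.2760
Flatten                                 2           0.3770           0.1880           0.1890           0.1885
"""

for row in slowest(parse_profile_dump(SAMPLE), "operator"):
    print(row["name"], row["count"], row["total"])
```

Once the rows are plain dicts, sorting, filtering by section, or diffing two runs becomes ordinary Python rather than eyeballing the table.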

Generally, though, it won't give correct answers due to CUDA sync issues, I believe.

However, I suspect that if you set the CUDA_LAUNCH_BLOCKING env var, it might work. That might be the trick to getting any Python profiler to provide accurate info.
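A minimal sketch of that idea, assuming the variable is set to `1` before the framework initialises CUDA (hence before importing it). `time_call` is a hypothetical helper, not part of any framework, and whether the resulting timings are actually accurate is still the untested suspicion above. Expect blocking launches to slow the run down.

```python
import os
import time

# Assumption: this has to happen before the framework creates its CUDA
# context, so set it before importing mxnet (or any other GPU library).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def time_call(fn, *args, repeats=10, **kwargs):
    """Average wall-clock seconds per call of fn (hypothetical helper)."""
    fn(*args, **kwargs)  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeats


# Stand-in workload; in practice this would be a forward/backward pass.
print(time_call(sum, range(100_000)))
```

The same effect can be had without code changes by exporting the variable in the shell that launches the training script.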

How will the profiler be used to optimize the model?

Monitor the metrics live, as the training is happening, using tools like nvidia-smi.
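One way live monitoring could be sketched is to poll `nvidia-smi` from a side process while training runs. This assumes `nvidia-smi` is on the PATH and that the queried fields come back as plain integers (some devices may report `[N/A]`, which this sketch does not handle). The function names are illustrative only.

```python
import subprocess
import time

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse 'util, used, total' lines (one per GPU) into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (field.strip() for field in line.split(","))
        stats.append({
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once and return one dict per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)


def monitor(interval_s=1.0, samples=10):
    """Print GPU stats every interval_s seconds, samples times."""
    for _ in range(samples):
        print(query_gpu_stats())
        time.sleep(interval_s)
```

Calling `monitor()` in a separate terminal or thread during training gives a coarse live view. It samples at intervals, so short spikes between polls will be missed.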
