Can we leverage the framework-specific profilers and maybe build a layer on top to augment their features and functionality?
This is what an MXNet profiler dump looks like:
Profile Statistics.
Note that counter items are counter values and not time units.
Device Storage
=================
Name             Total Count       Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----             -----------       ---------    -------------    -------------    -------------
Memory: gpu/0            722    2642262.5000        8388.6084     2654455.2500     1323033.3750
Memory: cpu/0            474          0.0000           0.0000       18984.9609        9492.4805

MXNET_C_API
=================
Name                       Total Count    Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----                       -----------    ---------    -------------    -------------    -------------
MXAutogradMarkVariables            133       0.7330           0.0040           0.0080          0.0055
MXNDArrayFree                      411       0.8010           0.0000           0.1230          0.0019
...

operator
=================
Name              Total Count    Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
----              -----------    ---------    -------------    -------------    -------------
FullyConnected              2       0.5520           0.2750           0.2770          0.2760
Flatten                     2       0.3770           0.1880           0.1890          0.1885
...
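As a minimal sketch of the kind of layer we could build on top, the text dump above can be parsed into structured records. The column layout follows the dump (Name, Total Count, Time, Min, Max, Avg); the sample rows are taken from the `operator` section, and the helper name is ours, not an MXNet API.

```python
def parse_section(lines):
    """Parse rows of one dump section into dicts keyed by column name."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # separator rows, '...', or blank lines
        # The name may itself contain spaces (e.g. 'Memory: gpu/0');
        # the last five whitespace-separated fields are always numeric.
        *name, count, total, min_t, max_t, avg_t = parts
        records.append({
            "name": " ".join(name),
            "count": int(count),
            "total_ms": float(total),
            "min_ms": float(min_t),
            "max_ms": float(max_t),
            "avg_ms": float(avg_t),
        })
    return records

# Sample rows copied from the 'operator' section of the dump above.
sample = [
    "FullyConnected 2 0.5520 0.2750 0.2770 0.2760",
    "Flatten 2 0.3770 0.1880 0.1890 0.1885",
]
ops = parse_section(sample)
print(ops[0]["name"], ops[0]["avg_ms"])  # → FullyConnected 0.276
```

Once the dump is structured like this, the augmentation layer can sort, diff across runs, or flag outlier operators instead of asking users to eyeball the raw text.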
The profiler dump takes some effort and familiarity to read, and it gives only a 0 ft view of the model's internals. Profiling a model also often involves pulling in other third-party tools and packages. This is where the Tornasole profiler can pitch in.
Generally, though, a Python-level profiler won't give correct timings here because CUDA kernel launches are asynchronous, I believe. However, I suspect that setting the CUDA_LAUNCH_BLOCKING environment variable would force synchronous launches; that might be the trick to getting any Python profiler to report accurate numbers.
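A rough sketch of that idea with the standard-library cProfile: the environment variable has to be set before the framework initializes CUDA, so that each kernel launch blocks until the kernel completes and the launching Python call is charged the real GPU time. The `train_step` here is a pure-Python placeholder standing in for a forward/backward pass, since no GPU is assumed.

```python
import cProfile
import os
import pstats

# Must be set before any CUDA context is created; with it set, kernel
# launches block, so wall-clock time lands on the Python call that
# launched the kernel instead of on some later sync point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

def train_step():
    # Placeholder workload; in real use this would be framework code
    # (forward pass, backward pass) launching CUDA kernels.
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
result = train_step()
profiler.disable()

stats = pstats.Stats(profiler).sort_stats("cumulative")
# stats.print_stats(5)  # uncomment to show the top entries by cumulative time
print(result)
```

The same pattern should work with any Python profiler (py-spy, line_profiler, etc.), with the usual caveat that blocking launches change the program's performance characteristics, so the numbers describe the serialized execution, not the normal overlapped one.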
We can also monitor GPU utilization with tools like nvidia-smi <>.
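For programmatic use, nvidia-smi has a machine-readable mode, e.g. `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits`. A small sketch of consuming that output; the sample string stands in for a live call (no GPU assumed here), where in practice it would come from `subprocess.run([...]).stdout`.

```python
import csv
import io

# Stand-in for the stdout of:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# One line per GPU: index, utilization %, memory used (MiB).
sample_output = "0, 87, 10240\n1, 12, 2048\n"

gpus = []
for row in csv.reader(io.StringIO(sample_output), skipinitialspace=True):
    index, util, mem = row
    gpus.append({
        "index": int(index),
        "util_pct": int(util),
        "mem_used_mib": int(mem),
    })

print(gpus[0])  # → {'index': 0, 'util_pct': 87, 'mem_used_mib': 10240}
```

Polling this in a background thread alongside the framework profiler would let the layer correlate per-operator timings with overall GPU utilization and memory pressure.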