It wasn't obvious from PyTorch's documentation (as of today, 8/12/2021) how to use PyTorch Profiler, so I spent some time figuring it out, and this gist contains a simple working example.
- Install the required packages:
torch>=1.9.0
torchvision>=0.10.0
numpy
matplotlib
tensorboard
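The requirements above can be installed with pip, for example (the version pins mirror the list above; adjust them for your environment):

```shell
# Install PyTorch, torchvision, and the supporting packages
pip install "torch>=1.9.0" "torchvision>=0.10.0" numpy matplotlib tensorboard
```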
- Start tensorboard server
tensorboard --logdir=./logs
- Run profiler.py:
python profiler.py
- After the program stops, open TensorBoard in your browser on the displayed port (usually http://localhost:6006). It might take some time to load the data.
Contains the actual source code used to train the model.
Contains the part of the original code that is profiled (the training stage).
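The profiled training stage follows the pattern below. This is a minimal, self-contained sketch (the linear model, random data, and `./logs` directory are placeholders, not the gist's actual code); the key pieces are the `schedule`, the `tensorboard_trace_handler`, and calling `prof.step()` once per iteration:

```python
import torch
import torch.nn as nn
from torch.profiler import (ProfilerActivity, profile, schedule,
                            tensorboard_trace_handler)

# Placeholder model and optimizer standing in for the real training setup
model = nn.Linear(32, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

with profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on GPU
    # Skip 1 step, warm up for 1, then record 3 "active" steps
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    # Write traces where the TensorBoard server is looking
    on_trace_ready=tensorboard_trace_handler("./logs"),
) as prof:
    for step in range(6):
        x = torch.randn(8, 32)
        y = torch.randint(0, 10, (8,))
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()  # tell the profiler one iteration has finished
```

Increasing `active` records more iterations but, as noted below, also increases the profiling overhead.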
- The profiler seems to add some overhead (roughly 10-20%) to the pipeline, and the overhead grows as you increase the number of active iterations being profiled.
- The profiler suggests some useful changes that impact performance (like increasing the batch size when GPU utilization is low, or increasing the number of workers when the DataLoader takes a long time to process data).
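Acting on those suggestions might look like the following (the dummy dataset and the specific `batch_size`/`num_workers` values are illustrative; tune them for your machine):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for CIFAR-10-sized images
dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, 10, (256,)))

# A larger batch_size raises GPU utilization; extra workers overlap
# data loading with computation; pin_memory speeds up host-to-GPU copies
loader = DataLoader(dataset, batch_size=64, num_workers=2, pin_memory=True)
```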
- Although PyTorch Profiler gave useful insights and suggestions about the overall resource usage of my model and training structure, it isn't obvious how to use it further to apply deeper optimizations. I wish there were a more direct mapping between the nn.Modules/components and what is being displayed. It would also be nice to have a graph showing the model structure and data flow, annotated with the amount of data transferred and the time taken.
- Example adapted from: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
- To understand how to use PyTorch Profiler: https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html
- PyTorch Profiler's docs: https://pytorch.org/docs/stable/profiler.html