Summary of the following guide: https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/szymon_migacz-pytorch-performance-tuning-guide.pdf
- In the DataLoader (it is a DataLoader argument, not a dataset class one), pin host memory for faster host-to-GPU copies
pin_memory=True
- Enable the cuDNN autotuner, which benchmarks convolution algorithms and picks the fastest one for the device
torch.backends.cudnn.benchmark = True
- Increase the batch size to max out GPU memory. SGD modification for large batches: LARS
- Disable the bias for convolutions directly followed by batch norm to reduce parameters (the batch norm shift makes the bias redundant)
- Instead of
model.zero_grad()
use
for param in model.parameters():
    param.grad = None
- Add the JIT decorator to pointwise functions to fuse CUDA kernels
@torch.jit.script
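
Rough sketch putting the tips above into one small training loop. The toy data, layer sizes, learning rate and batch size are made up for illustration and are not from the guide; fused_gelu is a typical pointwise function to script. The code falls back to CPU when no GPU is available, in which case pin_memory and cudnn.benchmark have no real effect.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Let cuDNN benchmark convolution algorithms (only matters on cuDNN-backed GPUs).
torch.backends.cudnn.benchmark = True

# Hypothetical toy data: 64 random 3x8x8 images with 10 classes.
dataset = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))
# pin_memory is passed to the DataLoader; request it only when a GPU is present.
loader = DataLoader(dataset, batch_size=32, pin_memory=use_cuda)

# bias=False on the conv: the BatchNorm2d right after it has its own shift term.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
).to(device)


@torch.jit.script
def fused_gelu(x: torch.Tensor) -> torch.Tensor:
    # Chain of pointwise ops that the JIT may fuse into one kernel on GPU.
    return x * 0.5 * (1.0 + torch.erf(x / 1.41421))


optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:
    # non_blocking copies can overlap with compute when the source is pinned.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)

    # Drop the gradients instead of writing zeros into them.
    for param in model.parameters():
        param.grad = None

    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```

PyTorch also exposes the same reset as optimizer.zero_grad(set_to_none=True), which avoids writing the loop by hand.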