DDP async allreduce:
- https://github.com/pytorch/pytorch/blob/2540f866ff1eff10dbed7ca47ea9c432e8583da2/torch/csrc/distributed/c10d/reducer.cpp#L841-L862
- https://github.com/pytorch/pytorch/blob/2540f866ff1eff10dbed7ca47ea9c432e8583da2/torch/csrc/distributed/c10d/default_comm_hooks.cpp
ProcessGroupNCCL: