@kwen2501
Created May 24, 2024 18:14
Comparison between NCCL 2.18.5 and 2.20.5 for MB-level all-reduce
#### All-reduce of 1054725 floats
# NCCL 2.18.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build-2.18.5/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 4218900 -e 4218900
# nThread 4 nGpus 1 minBytes 4218900 maxBytes 4218900 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 348055 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 348055 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 348055 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 348055 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.18.5+cuda12.0
#
#                                                                       out-of-place                        in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
     4218900       1054725     float     sum      -1   1011.4    4.17    6.26      0    993.6    4.25    6.37      0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 6.31306
#
# NCCL 2.20.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 4218900 -e 4218900
# nThread 4 nGpus 1 minBytes 4218900 maxBytes 4218900 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 4134189 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 4134189 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 4134189 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 4134189 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.20.5+cuda12.0
#
#                                                                       out-of-place                        in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
     4218900       1054725     float     sum      -1    916.0    4.61    6.91      0    907.3    4.65    6.97      0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 6.94162
#
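
The `algbw` and `busbw` columns above can be recomputed from `size` and `time`. The following is a minimal sketch, assuming the nccl-tests convention that algorithm bandwidth is bytes divided by time and that all-reduce bus bandwidth scales it by 2(n-1)/n for n ranks (n = 4 here). The function names are hypothetical and are not part of nccl-tests.

```python
def algbw_gbps(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s (1 GB = 1e9 bytes)."""
    # bytes per microsecond equals MB/s * 1e-0 ... i.e. 1e6 B/s; divide by 1e3 for GB/s
    return size_bytes / time_us / 1e3


def allreduce_busbw_gbps(size_bytes: int, time_us: float, n_ranks: int) -> float:
    """Bus bandwidth for all-reduce, assuming the 2*(n-1)/n correction factor."""
    return algbw_gbps(size_bytes, time_us) * 2 * (n_ranks - 1) / n_ranks


# Out-of-place times (us) taken from the two runs above.
SIZE_BYTES = 4218900
N_RANKS = 4
for version, time_us in (("2.18.5", 1011.4), ("2.20.5", 916.0)):
    alg = algbw_gbps(SIZE_BYTES, time_us)
    bus = allreduce_busbw_gbps(SIZE_BYTES, time_us, N_RANKS)
    print(f"NCCL {version}: algbw {alg:.2f} GB/s, busbw {bus:.2f} GB/s")
```

With four ranks the factor is 1.5, which is why each `busbw` value is one and a half times the `algbw` value next to it.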
#### All-reduce of 7347200 floats
# NCCL 2.18.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build-2.18.5/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 29388800 -e 29388800
# nThread 4 nGpus 1 minBytes 29388800 maxBytes 29388800 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 329030 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 329030 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 329030 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 329030 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.18.5+cuda12.0
#
#                                                                       out-of-place                        in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
    29388800       7347200     float     sum      -1   5014.4    5.86    8.79      0   5031.6    5.84    8.76      0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 8.77626
#
# NCCL 2.20.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 29388800 -e 29388800
# nThread 4 nGpus 1 minBytes 29388800 maxBytes 29388800 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 4145002 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 4145002 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 4145002 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 4145002 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.20.5+cuda12.0
#
#                                                                       out-of-place                        in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
    29388800       7347200     float     sum      -1   5114.4    5.75    8.62      0   5181.9    5.67    8.51      0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 8.56333
#
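
To put the four "Avg bus bandwidth" figures side by side, here is a small sketch that computes the relative change from 2.18.5 to 2.20.5 for each message size. The numbers are copied from the runs above; the dictionary layout and the `relative_change_pct` helper are hypothetical. Each figure comes from a single invocation of 20 iterations, so the result is only as reliable as those individual runs.

```python
# Avg bus bandwidth (GB/s) per element count, copied from the output above.
avg_busbw = {
    1054725: {"2.18.5": 6.31306, "2.20.5": 6.94162},
    7347200: {"2.18.5": 8.77626, "2.20.5": 8.56333},
}


def relative_change_pct(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


for count, bw in avg_busbw.items():
    change = relative_change_pct(bw["2.18.5"], bw["2.20.5"])
    print(f"{count} floats: {bw['2.18.5']:.2f} -> {bw['2.20.5']:.2f} GB/s ({change:+.1f}%)")
```

On these runs, 2.20.5 comes out roughly 10% ahead at the 4 MB size and roughly 2.4% behind at the 29 MB size.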