Comparison between NCCL 2.18.5 and 2.20.5 for MB-level all-reduce
#### All-reduce of 1054725 floats
# NCCL 2.18.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build-2.18.5/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 4218900 -e 4218900
# nThread 4 nGpus 1 minBytes 4218900 maxBytes 4218900 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 348055 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 348055 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 348055 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 348055 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.18.5+cuda12.0
#
#                                                      out-of-place                       in-place
#       size       count    type  redop  root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)  (elements)                          (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
     4218900     1054725   float    sum    -1   1011.4    4.17    6.26       0    993.6    4.25    6.37       0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 6.31306
#
# NCCL 2.20.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 4218900 -e 4218900
# nThread 4 nGpus 1 minBytes 4218900 maxBytes 4218900 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 4134189 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 4134189 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 4134189 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 4134189 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.20.5+cuda12.0
#
#                                                      out-of-place                       in-place
#       size       count    type  redop  root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)  (elements)                          (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
     4218900     1054725   float    sum    -1    916.0    4.61    6.91       0    907.3    4.65    6.97       0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 6.94162
#
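The `algbw` and `busbw` columns above can be reproduced from the reported size and time. The sketch below assumes the convention documented by nccl-tests for all-reduce: algorithm bandwidth is size divided by time, and bus bandwidth scales it by 2(n-1)/n, where n is the number of ranks (4 in these runs). The function name `allreduce_bandwidth` is hypothetical and used only for illustration.

```python
def allreduce_bandwidth(size_bytes, time_us, n_ranks):
    """Return (algbw, busbw) in GB/s for one all-reduce measurement.

    Assumes the nccl-tests convention: algbw = size / time and
    busbw = algbw * 2 * (n - 1) / n for all-reduce.
    """
    # bytes per microsecond equals MB/s (1e6 B/s); divide by 1e3 for GB/s.
    algbw = size_bytes / time_us / 1e3
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Out-of-place times for the 4218900-byte run, taken from the logs above.
for version, time_us in [("2.18.5", 1011.4), ("2.20.5", 916.0)]:
    algbw, busbw = allreduce_bandwidth(4218900, time_us, 4)
    print(f"NCCL {version}: algbw={algbw:.2f} GB/s busbw={busbw:.2f} GB/s")
```

With 4 ranks the scaling factor is 1.5, which matches the ratio between the `busbw` and `algbw` columns in the logs.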
#### All-reduce of 7347200 floats
# NCCL 2.18.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build-2.18.5/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 29388800 -e 29388800
# nThread 4 nGpus 1 minBytes 29388800 maxBytes 29388800 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 329030 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 329030 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 329030 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 329030 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.18.5+cuda12.0
#
#                                                      out-of-place                       in-place
#       size       count    type  redop  root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)  (elements)                          (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
    29388800     7347200   float    sum    -1   5014.4    5.86    8.79       0   5031.6    5.84    8.76       0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 8.77626
#
# NCCL 2.20.5
[kw2501@devgpu009.cln1 ~/local/nccl-tests (master)]$ LD_LIBRARY_PATH=~/local/nccl/build/lib:$LD_LIBRARY_PATH ./build/all_reduce_perf -t 4 -b 29388800 -e 29388800
# nThread 4 nGpus 1 minBytes 29388800 maxBytes 29388800 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 4145002 on devgpu009 device 0 [0x11] NVIDIA PG509-210
# Rank 1 Group 0 Pid 4145002 on devgpu009 device 1 [0x12] NVIDIA PG509-210
# Rank 2 Group 0 Pid 4145002 on devgpu009 device 2 [0x48] NVIDIA PG509-210
# Rank 3 Group 0 Pid 4145002 on devgpu009 device 3 [0x49] NVIDIA PG509-210
NCCL version 2.20.5+cuda12.0
#
#                                                      out-of-place                       in-place
#       size       count    type  redop  root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong
#        (B)  (elements)                          (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)
    29388800     7347200   float    sum    -1   5114.4    5.75    8.62       0   5181.9    5.67    8.51       0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 8.56333
#
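To put the two versions side by side, the relative change in reported time can be computed directly from the logs. The sketch below is only an illustration: the `percent_change` helper and the `times_us` table are hypothetical names, the values are copied from the four runs above, and each figure comes from a single run of 20 iterations, so small differences may fall within run-to-run noise.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


# (out-of-place, in-place) times in microseconds, copied from the logs above.
times_us = {
    4218900: {"2.18.5": (1011.4, 993.6), "2.20.5": (916.0, 907.3)},
    29388800: {"2.18.5": (5014.4, 5031.6), "2.20.5": (5114.4, 5181.9)},
}

for size, by_version in times_us.items():
    old_oop, old_ip = by_version["2.18.5"]
    new_oop, new_ip = by_version["2.20.5"]
    print(
        f"{size} B: out-of-place {percent_change(old_oop, new_oop):+.1f}%, "
        f"in-place {percent_change(old_ip, new_ip):+.1f}%"
    )
```

On these numbers, 2.20.5 takes roughly 9% less time than 2.18.5 for the 4218900-byte message and roughly 2 to 3% more time for the 29388800-byte message.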