@yueyericardo
Last active March 6, 2023 15:48

300k atoms, 8 GPUs

gpu/aware off

Loop time of 30.3401 on 8 procs for 2000 steps with 300003 atoms

Performance: 2.848 ns/day, 8.428 hours/ns, 65.919 timesteps/s
77.5% CPU use with 8 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 22.527     | 22.892     | 23.286     |   4.2 | 75.45
Neigh   | 0.039957   | 0.046826   | 0.053138   |   1.7 |  0.15
Comm    | 6.1284     | 6.5441     | 6.9268     |   8.1 | 21.57
Output  | 0.0070663  | 0.0087     | 0.010944   |   1.4 |  0.03
Modify  | 0.66162    | 0.70591    | 0.7618     |   3.9 |  2.33
Other   |            | 0.1425     |            |       |  0.47

Nlocal:        37500.4 ave       37842 max       37237 min
Histogram: 1 0 2 1 1 0 2 0 0 1
Nghost:        24164.5 ave       24721 max       23981 min
Histogram: 3 3 0 0 1 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 8 0 0 0 0 0 0 0 0 0
FullNghs:  5.61442e+06 ave  5.7134e+06 max 5.53929e+06 min
Histogram: 1 0 2 1 1 1 1 0 0 1

Total # of neighbors = 44915320
Ave neighs/atom = 149.71624
Neighbor list builds = 51
Dangerous builds = 0
Total wall time: 0:00:42

gpu/aware on

Loop time of 24.6107 on 8 procs for 2000 steps with 300003 atoms

Performance: 3.511 ns/day, 6.836 hours/ns, 81.266 timesteps/s
72.9% CPU use with 8 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 22.174     | 22.423     | 22.794     |   3.7 | 91.11
Neigh   | 0.037049   | 0.047096   | 0.05491    |   2.6 |  0.19
Comm    | 1.5475     | 1.9084     | 2.1571     |  12.3 |  7.75
Output  | 0.014146   | 0.018349   | 0.022585   |   1.9 |  0.07
Modify  | 0.14439    | 0.14937    | 0.15541    |   1.0 |  0.61
Other   |            | 0.06497    |            |       |  0.26

Nlocal:        37500.4 ave       37772 max       37214 min
Histogram: 1 0 0 3 0 0 1 2 0 1
Nghost:        24152.2 ave       24709 max       23955 min
Histogram: 4 1 1 0 0 1 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 8 0 0 0 0 0 0 0 0 0
FullNghs:  5.61451e+06 ave 5.70442e+06 max  5.5309e+06 min
Histogram: 1 0 1 2 0 1 2 0 0 1

Total # of neighbors = 44916086
Ave neighs/atom = 149.71879
Neighbor list builds = 52
Dangerous builds = 0
Total wall time: 0:00:37

300k atoms, 80 GPUs

gpu/aware off

Loop time of 9.1751 on 80 procs for 2000 steps with 300003 atoms

Performance: 9.417 ns/day, 2.549 hours/ns, 217.981 timesteps/s
94.1% CPU use with 80 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 5.6081     | 5.9197     | 6.2765     |   5.4 | 64.52
Neigh   | 0.025249   | 0.032095   | 0.042391   |   2.1 |  0.35
Comm    | 2.4157     | 2.7746     | 3.1194     |   8.3 | 30.24
Output  | 0.009548   | 0.017064   | 0.024032   |   2.7 |  0.19
Modify  | 0.26009    | 0.28328    | 0.32035    |   2.8 |  3.09
Other   |            | 0.1483     |            |       |  1.62

Nlocal:        3750.04 ave        4040 max        3326 min
Histogram: 2 1 2 9 13 13 14 11 8 7
Nghost:        6816.74 ave        7578 max        5787 min
Histogram: 4 4 2 9 12 6 18 12 3 10
Neighs:              0 ave           0 max           0 min
Histogram: 80 0 0 0 0 0 0 0 0 0
FullNghs:       561481 ave      641612 max      460156 min
Histogram: 2 2 7 6 16 10 14 10 9 4

Total # of neighbors = 44918464
Ave neighs/atom = 149.72672
Neighbor list builds = 51
Dangerous builds = 0
Total wall time: 0:00:18

gpu/aware on

Loop time of 10.8414 on 80 procs for 2000 steps with 300003 atoms

Performance: 7.969 ns/day, 3.011 hours/ns, 184.478 timesteps/s
93.2% CPU use with 80 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 5.5973     | 5.901      | 6.1283     |   4.5 | 54.43
Neigh   | 0.025175   | 0.031561   | 0.048121   |   2.7 |  0.29
Comm    | 4.3723     | 4.5986     | 4.9266     |   5.3 | 42.42
Output  | 0.029637   | 0.058832   | 0.094963   |   7.5 |  0.54
Modify  | 0.13985    | 0.15011    | 0.16165    |   1.4 |  1.38
Other   |            | 0.1013     |            |       |  0.93

Nlocal:        3750.04 ave        4052 max        3301 min
Histogram: 1 2 1 10 11 14 17 11 7 6
Nghost:        6816.74 ave        7586 max        5818 min
Histogram: 5 3 1 13 9 8 16 12 4 9
Neighs:              0 ave           0 max           0 min
Histogram: 80 0 0 0 0 0 0 0 0 0
FullNghs:       561460 ave      645634 max      455257 min
Histogram: 2 2 6 6 17 12 14 9 8 4

Total # of neighbors = 44916836
Ave neighs/atom = 149.72129
Neighbor list builds = 51
Dangerous builds = 0
Total wall time: 0:00:24

1M atoms, 80 GPUs

gpu/aware off

Loop time of 14.516 on 80 procs for 2000 steps with 1000002 atoms

Performance: 5.952 ns/day, 4.032 hours/ns, 137.779 timesteps/s
87.1% CPU use with 80 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 8.8705     | 9.4539     | 10.129     |   8.4 | 65.13
Neigh   | 0.030447   | 0.038145   | 0.055043   |   2.3 |  0.26
Comm    | 3.7486     | 4.4381     | 5.0218     |  12.4 | 30.57
Output  | 0.0082933  | 0.015203   | 0.022997   |   2.4 |  0.10
Modify  | 0.34225    | 0.40253    | 0.47904    |   5.4 |  2.77
Other   |            | 0.1681     |            |       |  1.16

Nlocal:          12500 ave       13235 max       11819 min
Histogram: 3 5 10 11 18 11 7 6 6 3
Nghost:        13365.2 ave       14832 max       12003 min
Histogram: 4 3 10 15 6 22 8 2 6 4
Neighs:              0 ave           0 max           0 min
Histogram: 80 0 0 0 0 0 0 0 0 0
FullNghs:  1.86392e+06 ave 2.06676e+06 max 1.68025e+06 min
Histogram: 2 7 8 12 19 11 9 5 4 3

Total # of neighbors = 1.491136e+08
Ave neighs/atom = 149.1133
Neighbor list builds = 55
Dangerous builds = 0
Total wall time: 0:00:33

gpu/aware on

Loop time of 15.6789 on 80 procs for 2000 steps with 1000002 atoms

Performance: 5.511 ns/day, 4.355 hours/ns, 127.560 timesteps/s
86.2% CPU use with 80 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 8.7195     | 9.2511     | 9.8959     |   8.0 | 59.00
Neigh   | 0.027662   | 0.034901   | 0.052389   |   2.7 |  0.22
Comm    | 5.4802     | 6.1307     | 6.6638     |   9.9 | 39.10
Output  | 0.021921   | 0.03406    | 0.043188   |   2.8 |  0.22
Modify  | 0.13924    | 0.14758    | 0.15666    |   1.2 |  0.94
Other   |            | 0.08047    |            |       |  0.51

Nlocal:          12500 ave       13234 max       11811 min
Histogram: 2 6 10 11 17 13 6 7 5 3
Nghost:          13363 ave       14771 max       12041 min
Histogram: 5 2 11 15 4 22 9 2 6 4
Neighs:              0 ave           0 max           0 min
Histogram: 80 0 0 0 0 0 0 0 0 0
FullNghs:  1.86386e+06 ave   2.065e+06 max 1.67809e+06 min
Histogram: 2 6 9 12 18 13 6 7 4 3

Total # of neighbors = 1.491085e+08
Ave neighs/atom = 149.1082
Neighbor list builds = 56
Dangerous builds = 0
Total wall time: 0:00:36
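
The six LAMMPS runs above are easier to compare as ratios. The following is a minimal sketch, not part of the original benchmark: it copies the "Loop time" values and the average Comm times from the logs and divides the off figure by the on figure. The names `loop_times`, `comm_times` and `speedup` are illustrative only.

```python
# Loop times (seconds) copied from the "Loop time of ..." lines above.
# Keys: (atoms, GPUs); values: (off, on) for the two settings in each section.
loop_times = {
    ("300k", 8): (30.3401, 24.6107),
    ("300k", 80): (9.1751, 10.8414),
    ("1M", 80): (14.516, 15.6789),
}

# Average Comm times (seconds) from the "MPI task timing breakdown" tables.
comm_times = {
    ("300k", 8): (6.5441, 1.9084),
    ("300k", 80): (2.7746, 4.5986),
    ("1M", 80): (4.4381, 6.1307),
}


def speedup(off: float, on: float) -> float:
    """Return off/on; a value above 1 means the 'on' run was faster."""
    return off / on


for key in loop_times:
    atoms, gpus = key
    s_loop = speedup(*loop_times[key])
    s_comm = speedup(*comm_times[key])
    print(f"{atoms:>4} atoms, {gpus:>2} GPUs: loop x{s_loop:.2f}, comm x{s_comm:.2f}")
```

By these figures the "on" setting shortens the loop on 8 GPUs (ratio of about 1.23) and lengthens it in both 80-GPU runs (ratios below 1). The Pair times are nearly identical within each pair of runs, so most of the difference shows up in the Comm row.
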
$ mpirun -np 8 ./build/all_reduce_perf -b 8 -e 1024M -f 2 -g 1
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid  15358 on c1100a-s23 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid  15359 on c1100a-s23 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid  15360 on c1100a-s23 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid  15361 on c1100a-s23 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid  15362 on c1100a-s23 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid  15363 on c1100a-s23 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid  15364 on c1100a-s23 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid  15365 on c1100a-s23 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    19.81    0.00    0.00      0    19.78    0.00    0.00      0
          16             4     float     sum      -1    19.47    0.00    0.00      0    19.49    0.00    0.00      0
          32             8     float     sum      -1    19.74    0.00    0.00      0    19.54    0.00    0.00      0
          64            16     float     sum      -1    19.81    0.00    0.01      0    19.77    0.00    0.01      0
         128            32     float     sum      -1    20.21    0.01    0.01      0    19.83    0.01    0.01      0
         256            64     float     sum      -1    20.98    0.01    0.02      0    20.26    0.01    0.02      0
         512           128     float     sum      -1    20.73    0.02    0.04      0    20.46    0.03    0.04      0
        1024           256     float     sum      -1    22.89    0.04    0.08      0    22.07    0.05    0.08      0
        2048           512     float     sum      -1    23.28    0.09    0.15      0    23.10    0.09    0.16      0
        4096          1024     float     sum      -1    24.23    0.17    0.30      0    23.95    0.17    0.30      0
        8192          2048     float     sum      -1    26.68    0.31    0.54      0    25.83    0.32    0.56      0
       16384          4096     float     sum      -1    28.97    0.57    0.99      0    25.89    0.63    1.11      0
       32768          8192     float     sum      -1    31.75    1.03    1.81      0    31.03    1.06    1.85      0
       65536         16384     float     sum      -1    32.67    2.01    3.51      0    31.84    2.06    3.60      0
      131072         32768     float     sum      -1    32.04    4.09    7.16      0    31.47    4.17    7.29      0
      262144         65536     float     sum      -1    32.32    8.11   14.20      0    30.85    8.50   14.87      0
      524288        131072     float     sum      -1    33.01   15.88   27.80      0    30.97   16.93   29.63      0
     1048576        262144     float     sum      -1    42.10   24.91   43.59      0    39.57   26.50   46.37      0
     2097152        524288     float     sum      -1    68.48   30.62   53.59      0    58.15   36.07   63.12      0
     4194304       1048576     float     sum      -1    81.17   51.67   90.43      0    80.93   51.82   90.69      0
     8388608       2097152     float     sum      -1    126.2   66.45  116.29      0    122.1   68.70  120.22      0
    16777216       4194304     float     sum      -1    215.2   77.95  136.41      0    214.5   78.20  136.85      0
    33554432       8388608     float     sum      -1    341.2   98.34  172.09      0    341.1   98.38  172.16      0
    67108864      16777216     float     sum      -1    549.8  122.06  213.60      0    548.8  122.28  213.98      0
   134217728      33554432     float     sum      -1   1135.8  118.17  206.80      0   1132.0  118.56  207.49      0
   268435456      67108864     float     sum      -1   2093.5  128.22  224.39      0   2094.3  128.18  224.31      0
   536870912     134217728     float     sum      -1   4133.2  129.89  227.31      0   4134.5  129.85  227.24      0
  1073741824     268435456     float     sum      -1   8102.5  132.52  231.91      0   8103.5  132.50  231.88      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 63.6936
#
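
For reading the `algbw` and `busbw` columns: as far as I know, nccl-tests reports algorithm bandwidth as message size divided by time, and for all-reduce scales it by 2(n-1)/n to get bus bandwidth, where n is the number of ranks. The sketch below checks that relation against two out-of-place rows of the 8-rank table above. The helper names are made up for illustration and are not part of nccl-tests.

```python
import math


def algbw_gbs(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s (1 GB = 1e9 bytes): size / time."""
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_busbw_gbs(algbw: float, n_ranks: int) -> float:
    """Bus bandwidth for all-reduce: algbw scaled by 2*(n-1)/n."""
    return algbw * 2 * (n_ranks - 1) / n_ranks


# (size in bytes, time in us, reported algbw, reported busbw), out-of-place,
# copied from the 8-rank table above.
rows = [
    (67108864, 549.8, 122.06, 213.60),
    (1073741824, 8102.5, 132.52, 231.91),
]

for size, time_us, rep_algbw, rep_busbw in rows:
    a = algbw_gbs(size, time_us)
    b = allreduce_busbw_gbs(a, n_ranks=8)
    # The table prints rounded values, so compare with a loose tolerance.
    ok = math.isclose(a, rep_algbw, rel_tol=1e-3) and math.isclose(b, rep_busbw, rel_tol=1e-3)
    print(f"{size:>11} B: algbw {a:.2f} GB/s, busbw {b:.2f} GB/s, matches table: {ok}")
```

With 8 ranks the factor is 2 * 7 / 8 = 1.75, which is why `busbw` is 1.75 times `algbw` in every row of this table.
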
$ mpirun -np 16 ./build/all_reduce_perf -b 8 -e 1024M -f 2 -g 1
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid  15790 on c1100a-s23 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid  15791 on c1100a-s23 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid  15792 on c1100a-s23 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid  15793 on c1100a-s23 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid  15794 on c1100a-s23 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid  15795 on c1100a-s23 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid  15796 on c1100a-s23 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid  15797 on c1100a-s23 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank  8 Group  0 Pid  96990 on c1100a-s29 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  9 Group  0 Pid  96991 on c1100a-s29 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 10 Group  0 Pid  96992 on c1100a-s29 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 11 Group  0 Pid  96993 on c1100a-s29 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 12 Group  0 Pid  96994 on c1100a-s29 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 13 Group  0 Pid  96995 on c1100a-s29 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 14 Group  0 Pid  96996 on c1100a-s29 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 15 Group  0 Pid  96997 on c1100a-s29 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    33.54    0.00    0.00      0    34.12    0.00    0.00      0
          16             4     float     sum      -1    33.70    0.00    0.00      0    34.33    0.00    0.00      0
          32             8     float     sum      -1    34.76    0.00    0.00      0    34.16    0.00    0.00      0
          64            16     float     sum      -1    34.05    0.00    0.00      0    34.74    0.00    0.00      0
         128            32     float     sum      -1    34.17    0.00    0.01      0    34.74    0.00    0.01      0
         256            64     float     sum      -1    36.71    0.01    0.01      0    36.54    0.01    0.01      0
         512           128     float     sum      -1    37.03    0.01    0.03      0    37.41    0.01    0.03      0
        1024           256     float     sum      -1    38.52    0.03    0.05      0    38.67    0.03    0.05      0
        2048           512     float     sum      -1    41.12    0.05    0.09      0    39.11    0.05    0.10      0
        4096          1024     float     sum      -1    40.94    0.10    0.19      0    40.84    0.10    0.19      0
        8192          2048     float     sum      -1    69.03    0.12    0.22      0    41.63    0.20    0.37      0
       16384          4096     float     sum      -1    43.43    0.38    0.71      0    42.17    0.39    0.73      0
       32768          8192     float     sum      -1    43.98    0.75    1.40      0    42.92    0.76    1.43      0
       65536         16384     float     sum      -1    57.76    1.13    2.13      0    43.18    1.52    2.85      0
      131072         32768     float     sum      -1    61.39    2.14    4.00      0    60.11    2.18    4.09      0
      262144         65536     float     sum      -1    74.12    3.54    6.63      0    63.82    4.11    7.70      0
      524288        131072     float     sum      -1    68.97    7.60   14.25      0    67.35    7.79   14.60      0
     1048576        262144     float     sum      -1    74.05   14.16   26.55      0    74.24   14.12   26.48      0
     2097152        524288     float     sum      -1    96.66   21.70   40.68      0    96.00   21.85   40.96      0
     4194304       1048576     float     sum      -1    117.4   35.74   67.01      0    116.3   36.06   67.62      0
     8388608       2097152     float     sum      -1    176.0   47.66   89.37      0    172.9   48.51   90.97      0
    16777216       4194304     float     sum      -1    288.2   58.21  109.15      0    286.0   58.65  109.97      0
    33554432       8388608     float     sum      -1    562.2   59.68  111.90      0    563.5   59.55  111.66      0
    67108864      16777216     float     sum      -1   1033.6   64.93  121.73      0   1032.9   64.97  121.82      0
   134217728      33554432     float     sum      -1   1924.3   69.75  130.78      0   1926.9   69.65  130.60      0
   268435456      67108864     float     sum      -1   3452.8   77.74  145.77      0   3452.5   77.75  145.78      0
   536870912     134217728     float     sum      -1   6937.1   77.39  145.11      0   6831.9   78.58  147.34      0
  1073741824     268435456     float     sum      -1    13648   78.67  147.51      0    13643   78.70  147.56      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 41.7537
#
$ mpirun -np 32 ./build/all_reduce_perf -b 8 -e 1024M -f 2 -g 1
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid  17400 on c1100a-s23 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid  17401 on c1100a-s23 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid  17402 on c1100a-s23 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid  17403 on c1100a-s23 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid  17404 on c1100a-s23 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid  17405 on c1100a-s23 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid  17406 on c1100a-s23 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid  17408 on c1100a-s23 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank  8 Group  0 Pid  97310 on c1100a-s29 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  9 Group  0 Pid  97311 on c1100a-s29 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 10 Group  0 Pid  97312 on c1100a-s29 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 11 Group  0 Pid  97313 on c1100a-s29 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 12 Group  0 Pid  97314 on c1100a-s29 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 13 Group  0 Pid  97315 on c1100a-s29 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 14 Group  0 Pid  97316 on c1100a-s29 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 15 Group  0 Pid  97317 on c1100a-s29 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 16 Group  0 Pid  22049 on c1101a-s11 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 17 Group  0 Pid  22050 on c1101a-s11 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 18 Group  0 Pid  22051 on c1101a-s11 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 19 Group  0 Pid  22052 on c1101a-s11 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 20 Group  0 Pid  22053 on c1101a-s11 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 21 Group  0 Pid  22054 on c1101a-s11 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 22 Group  0 Pid  22055 on c1101a-s11 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 23 Group  0 Pid  22056 on c1101a-s11 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 24 Group  0 Pid  46136 on c1101a-s35 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 25 Group  0 Pid  46137 on c1101a-s35 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 26 Group  0 Pid  46138 on c1101a-s35 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 27 Group  0 Pid  46139 on c1101a-s35 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 28 Group  0 Pid  46140 on c1101a-s35 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 29 Group  0 Pid  46141 on c1101a-s35 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 30 Group  0 Pid  46142 on c1101a-s35 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 31 Group  0 Pid  46143 on c1101a-s35 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    47.49    0.00    0.00      0    46.90    0.00    0.00      0
          16             4     float     sum      -1    56.83    0.00    0.00      0    47.08    0.00    0.00      0
          32             8     float     sum      -1    47.12    0.00    0.00      0    46.59    0.00    0.00      0
          64            16     float     sum      -1    46.72    0.00    0.00      0    46.48    0.00    0.00      0
         128            32     float     sum      -1    47.20    0.00    0.01      0    47.34    0.00    0.01      0
         256            64     float     sum      -1    47.96    0.01    0.01      0    47.82    0.01    0.01      0
         512           128     float     sum      -1    49.00    0.01    0.02      0    50.35    0.01    0.02      0
        1024           256     float     sum      -1    52.82    0.02    0.04      0    53.23    0.02    0.04      0
        2048           512     float     sum      -1    57.54    0.04    0.07      0    57.42    0.04    0.07      0
        4096          1024     float     sum      -1    58.41    0.07    0.14      0    55.96    0.07    0.14      0
        8192          2048     float     sum      -1    59.80    0.14    0.27      0    58.41    0.14    0.27      0
       16384          4096     float     sum      -1    60.88    0.27    0.52      0    58.60    0.28    0.54      0
       32768          8192     float     sum      -1    60.71    0.54    1.05      0    60.05    0.55    1.06      0
       65536         16384     float     sum      -1    65.30    1.00    1.94      0    62.97    1.04    2.02      0
      131072         32768     float     sum      -1    82.36    1.59    3.08      0    80.40    1.63    3.16      0
      262144         65536     float     sum      -1    87.85    2.98    5.78      0    87.45    3.00    5.81      0
      524288        131072     float     sum      -1    94.68    5.54   10.73      0    89.00    5.89   11.41      0
     1048576        262144     float     sum      -1    98.76   10.62   20.57      0    101.2   10.36   20.07      0
     2097152        524288     float     sum      -1    123.9   16.93   32.81      0    123.7   16.96   32.85      0
     4194304       1048576     float     sum      -1    180.1   23.28   45.11      0    178.6   23.49   45.51      0
     8388608       2097152     float     sum      -1    222.5   37.70   73.04      0    222.5   37.70   73.05      0
    16777216       4194304     float     sum      -1    395.0   42.47   82.29      0    358.2   46.84   90.76      0
    33554432       8388608     float     sum      -1    636.2   52.75  102.19      0    635.7   52.78  102.27      0
    67108864      16777216     float     sum      -1   1144.4   58.64  113.62      0   1147.7   58.47  113.29      0
   134217728      33554432     float     sum      -1   2140.7   62.70  121.48      0   2138.4   62.77  121.61      0
   268435456      67108864     float     sum      -1   3922.7   68.43  132.59      0   4027.5   66.65  129.14      0
   536870912     134217728     float     sum      -1   7329.9   73.24  141.91      0   7076.4   75.87  146.99      0
  1073741824     268435456     float     sum      -1    14169   75.78  146.83      0    14174   75.75  146.77      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 37.1955
#
$ mpirun -np 64 ./build/all_reduce_perf -b 8 -e 1024M -f 2 -g 1
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid  17999 on c1100a-s23 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid  18000 on c1100a-s23 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid  18001 on c1100a-s23 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid  18002 on c1100a-s23 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid  18003 on c1100a-s23 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid  18004 on c1100a-s23 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid  18005 on c1100a-s23 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid  18006 on c1100a-s23 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank  8 Group  0 Pid  98989 on c1100a-s29 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  9 Group  0 Pid  98990 on c1100a-s29 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 10 Group  0 Pid  98991 on c1100a-s29 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 11 Group  0 Pid  98993 on c1100a-s29 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 12 Group  0 Pid  98994 on c1100a-s29 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 13 Group  0 Pid  98995 on c1100a-s29 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 14 Group  0 Pid  98996 on c1100a-s29 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 15 Group  0 Pid  98997 on c1100a-s29 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 16 Group  0 Pid  22842 on c1101a-s11 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 17 Group  0 Pid  22847 on c1101a-s11 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 18 Group  0 Pid  22853 on c1101a-s11 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 19 Group  0 Pid  22857 on c1101a-s11 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 20 Group  0 Pid  22861 on c1101a-s11 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 21 Group  0 Pid  22867 on c1101a-s11 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 22 Group  0 Pid  22872 on c1101a-s11 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 23 Group  0 Pid  22877 on c1101a-s11 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 24 Group  0 Pid  46707 on c1101a-s35 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 25 Group  0 Pid  46708 on c1101a-s35 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 26 Group  0 Pid  46709 on c1101a-s35 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 27 Group  0 Pid  46710 on c1101a-s35 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 28 Group  0 Pid  46711 on c1101a-s35 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 29 Group  0 Pid  46712 on c1101a-s35 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 30 Group  0 Pid  46713 on c1101a-s35 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 31 Group  0 Pid  46714 on c1101a-s35 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 32 Group  0 Pid  23257 on c1103a-s11 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 33 Group  0 Pid  23258 on c1103a-s11 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 34 Group  0 Pid  23259 on c1103a-s11 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 35 Group  0 Pid  23260 on c1103a-s11 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 36 Group  0 Pid  23261 on c1103a-s11 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 37 Group  0 Pid  23262 on c1103a-s11 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 38 Group  0 Pid  23263 on c1103a-s11 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 39 Group  0 Pid  23264 on c1103a-s11 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 40 Group  0 Pid   5366 on c1103a-s17 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 41 Group  0 Pid   5367 on c1103a-s17 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 42 Group  0 Pid   5368 on c1103a-s17 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 43 Group  0 Pid   5369 on c1103a-s17 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 44 Group  0 Pid   5370 on c1103a-s17 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 45 Group  0 Pid   5371 on c1103a-s17 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 46 Group  0 Pid   5372 on c1103a-s17 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 47 Group  0 Pid   5373 on c1103a-s17 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 48 Group  0 Pid   6014 on c1104a-s17 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 49 Group  0 Pid   6015 on c1104a-s17 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 50 Group  0 Pid   6016 on c1104a-s17 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 51 Group  0 Pid   6017 on c1104a-s17 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 52 Group  0 Pid   6018 on c1104a-s17 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 53 Group  0 Pid   6019 on c1104a-s17 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 54 Group  0 Pid   6020 on c1104a-s17 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 55 Group  0 Pid   6021 on c1104a-s17 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#  Rank 56 Group  0 Pid   2260 on c1104a-s23 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank 57 Group  0 Pid   2261 on c1104a-s23 device  1 [0x0f] NVIDIA A100-SXM4-80GB
#  Rank 58 Group  0 Pid   2262 on c1104a-s23 device  2 [0x47] NVIDIA A100-SXM4-80GB
#  Rank 59 Group  0 Pid   2263 on c1104a-s23 device  3 [0x4e] NVIDIA A100-SXM4-80GB
#  Rank 60 Group  0 Pid   2264 on c1104a-s23 device  4 [0x87] NVIDIA A100-SXM4-80GB
#  Rank 61 Group  0 Pid   2265 on c1104a-s23 device  5 [0x90] NVIDIA A100-SXM4-80GB
#  Rank 62 Group  0 Pid   2266 on c1104a-s23 device  6 [0xb7] NVIDIA A100-SXM4-80GB
#  Rank 63 Group  0 Pid   2267 on c1104a-s23 device  7 [0xbd] NVIDIA A100-SXM4-80GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    60.69    0.00    0.00      0    59.28    0.00    0.00      0
          16             4     float     sum      -1    60.66    0.00    0.00      0    60.33    0.00    0.00      0
          32             8     float     sum      -1    60.14    0.00    0.00      0    60.37    0.00    0.00      0
          64            16     float     sum      -1    61.07    0.00    0.00      0    61.16    0.00    0.00      0
         128            32     float     sum      -1    61.13    0.00    0.00      0    60.64    0.00    0.00      0
         256            64     float     sum      -1    61.63    0.00    0.01      0    61.75    0.00    0.01      0
         512           128     float     sum      -1    76.56    0.01    0.01      0    65.11    0.01    0.02      0
        1024           256     float     sum      -1    69.31    0.01    0.03      0    68.28    0.01    0.03      0
        2048           512     float     sum      -1    74.50    0.03    0.05      0    97.75    0.02    0.04      0
        4096          1024     float     sum      -1    75.29    0.05    0.11      0    73.19    0.06    0.11      0
        8192          2048     float     sum      -1    76.80    0.11    0.21      0    74.70    0.11    0.22      0
       16384          4096     float     sum      -1    87.92    0.19    0.37      0    76.12    0.22    0.42      0
       32768          8192     float     sum      -1    83.30    0.39    0.77      0    77.78    0.42    0.83      0
       65536         16384     float     sum      -1    82.02    0.80    1.57      0    79.98    0.82    1.61      0
      131072         32768     float     sum      -1    104.1    1.26    2.48      0    102.5    1.28    2.52      0
      262144         65536     float     sum      -1    114.4    2.29    4.51      0    106.2    2.47    4.86      0
      524288        131072     float     sum      -1    112.5    4.66    9.17      0    159.9    3.28    6.46      0
     1048576        262144     float     sum      -1    124.2    8.44   16.62      0    125.7    8.34   16.43      0
     2097152        524288     float     sum      -1    149.6   14.02   27.60      0    151.4   13.86   27.28      0
     4194304       1048576     float     sum      -1    204.9   20.47   40.30      0    201.8   20.78   40.91      0
     8388608       2097152     float     sum      -1    252.8   33.19   65.34      0    250.8   33.44   65.84      0
    16777216       4194304     float     sum      -1    390.0   43.02   84.69      0    389.5   43.08   84.81      0
    33554432       8388608     float     sum      -1    672.0   49.93   98.31      0    670.2   50.07   98.57      0
    67108864      16777216     float     sum      -1   1311.5   51.17  100.74      0   1315.8   51.00  100.41      0
   134217728      33554432     float     sum      -1   2487.7   53.95  106.22      0   2494.6   53.80  105.93      0
   268435456      67108864     float     sum      -1   4426.9   60.64  119.38      0   4417.1   60.77  119.65      0
   536870912     134217728     float     sum      -1   8267.6   64.94  127.84      0   8061.6   66.60  131.11      0
  1073741824     268435456     float     sum      -1    14472   74.20  146.07      0    14461   74.25  146.18      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 34.0475
#
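
To see how the four `all_reduce_perf` runs compare, the sketch below tabulates three out-of-place numbers copied from each run: the 8-byte latency, the 1 GiB bus bandwidth, and the reported average bus bandwidth. Each is then normalized against the 8-rank run. The dictionary and variable names are illustrative only.

```python
# ranks -> (8 B latency in us, 1 GiB busbw in GB/s, reported avg busbw in GB/s)
runs = {
    8: (19.81, 231.91, 63.6936),
    16: (33.54, 147.51, 41.7537),
    32: (47.49, 146.83, 37.1955),
    64: (60.69, 146.07, 34.0475),
}

base_lat, base_bw, _ = runs[8]

# ranks -> (latency relative to 8 ranks, 1 GiB busbw relative to 8 ranks)
summary = {}
for ranks, (lat_us, busbw, avg_busbw) in runs.items():
    summary[ranks] = (lat_us / base_lat, busbw / base_bw)
    print(
        f"{ranks:>2} ranks: 8 B latency {lat_us:6.2f} us (x{summary[ranks][0]:.2f}), "
        f"1 GiB busbw {busbw:6.2f} GB/s (x{summary[ranks][1]:.2f}), "
        f"avg busbw {avg_busbw:.2f} GB/s"
    )
```

The 8-rank run stays on a single host (c1100a-s23), while the larger runs span several hosts. In these numbers the 1 GiB bus bandwidth drops to roughly 63% of the single-host value once a second host is involved and then stays nearly flat up to 64 ranks, whereas the 8-byte latency keeps growing with rank count, reaching about 3x at 64 ranks.
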