Last active
October 23, 2023 17:25
BabelStream, OpenMP vs. Rayon
$ perf stat -d ./omp-stream
BabelStream
Version: 3.4
Implementation: OpenMP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function MBytes/sec Min (sec) Max Average
Copy 108865.244 0.00493 0.05374 0.00834
Mul 97387.786 0.00551 0.03843 0.00959
Add 100390.735 0.00802 0.03490 0.01230
Triad 104748.201 0.00769 0.04499 0.01212
Dot 116643.998 0.00460 0.03035 0.00887
Performance counter stats for './omp-stream':
665,460.03 msec task-clock:u # 115.871 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
60,325 page-faults:u # 0.091 K/sec
1,973,539,186,499 cycles:u # 2.966 GHz (49.83%)
20,578,592,028 stalled-cycles-frontend:u # 1.04% frontend cycles idle (49.75%)
1,892,921,586,383 stalled-cycles-backend:u # 95.92% backend cycles idle (10.05%)
412,763,127,962 instructions:u # 0.21 insn per cycle
# 4.59 stalled cycles per insn (20.14%)
78,155,286,693 branches:u # 117.446 M/sec (30.21%)
84,235,682 branch-misses:u # 0.11% of all branches (10.04%)
92,640,194,539 L1-dcache-loads:u # 139.212 M/sec (20.04%)
33,602,081,514 L1-dcache-load-misses:u # 36.27% of all L1-dcache hits (30.02%)
637,221,226 LLC-loads:u # 0.958 M/sec (29.95%)
134,179,050 LLC-load-misses:u # 21.06% of all LL-cache hits (39.89%)
5.743135905 seconds time elapsed
637.155108000 seconds user
28.291715000 seconds sys
$ perf stat -d ./target/release/babel_stream
BabelStream
Version: 0.5
Implmentation: Rust
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function Mbytes/sec Min (sec) Max Average
Copy 54328.164 0.00988 0.02706 0.01375
Mul 54421.785 0.00987 0.02571 0.01333
Add 51490.177 0.01564 0.03606 0.02052
Triad 52049.274 0.01547 0.03241 0.02031
Dot 50989.734 0.01053 0.02623 0.01478
Performance counter stats for './target/release/babel_stream':
1,062,303.36 msec task-clock:u # 122.712 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
42,124 page-faults:u # 0.040 K/sec
3,260,121,432,385 cycles:u # 3.069 GHz (49.86%)
541,201,721,259 stalled-cycles-frontend:u # 16.60% frontend cycles idle (49.90%)
2,227,238,034,453 stalled-cycles-backend:u # 68.32% backend cycles idle (10.04%)
331,330,149,597 instructions:u # 0.10 insn per cycle
# 6.72 stalled cycles per insn (20.07%)
50,746,277,795 branches:u # 47.770 M/sec (30.08%)
3,184,008,620 branch-misses:u # 6.27% of all branches (10.01%)
89,919,348,934 L1-dcache-loads:u # 84.646 M/sec (20.01%)
9,660,898,319 L1-dcache-load-misses:u # 10.74% of all L1-dcache hits (29.99%)
878,764,317 LLC-loads:u # 0.827 M/sec (29.95%)
849,262,989 LLC-load-misses:u # 96.64% of all LL-cache hits (39.89%)
8.656898836 seconds time elapsed
1038.920388000 seconds user
23.939748000 seconds sys
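The MBytes/sec column in the tables above appears to follow from the Min (sec) column: bytes moved per kernel invocation divided by the fastest time. The sketch below reproduces that arithmetic. It assumes each array holds 2^25 doubles (268,435,456 bytes, which rounds to the printed 268.4 MB) and that Copy, Mul, and Dot touch two arrays while Add and Triad touch three. Neither assumption is stated in the output itself, and the helper names are illustrative, not taken from either benchmark's source.

```rust
/// Assumed size of one array in bytes: 2^25 doubles (printed as 268.4 MB).
const ARRAY_BYTES: f64 = (1u64 << 25) as f64 * 8.0;

/// Number of arrays each kernel is assumed to read or write per invocation.
fn arrays_touched(kernel: &str) -> Option<u32> {
    match kernel {
        "Copy" | "Mul" | "Dot" => Some(2),
        "Add" | "Triad" => Some(3),
        _ => None,
    }
}

/// Bandwidth in MBytes/sec (10^6 bytes) derived from the fastest run time.
fn bandwidth_mb_per_s(kernel: &str, min_sec: f64) -> Option<f64> {
    arrays_touched(kernel).map(|n| 1.0e-6 * f64::from(n) * ARRAY_BYTES / min_sec)
}

fn main() {
    // (kernel, Min (sec), reported MBytes/sec) from the unpinned OpenMP run.
    let rows = [
        ("Copy", 0.00493, 108865.244),
        ("Mul", 0.00551, 97387.786),
        ("Add", 0.00802, 100390.735),
        ("Triad", 0.00769, 104748.201),
        ("Dot", 0.00460, 116643.998),
    ];
    for (kernel, min_sec, reported) in rows {
        let derived = bandwidth_mb_per_s(kernel, min_sec).unwrap();
        println!("{kernel:<6} derived {derived:>11.1}  reported {reported:>11.3}");
    }
}
```

The derived figures differ slightly from the reported ones because Min (sec) is printed to only five decimal places.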
$ perf stat -d taskset -c 0-63 ./omp-stream
BabelStream
Version: 3.4
Implementation: OpenMP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function MBytes/sec Min (sec) Max Average
Copy 52348.508 0.01026 0.01185 0.01043
Mul 49850.428 0.01077 0.01266 0.01098
Add 52537.521 0.01533 0.01801 0.01565
Triad 54129.328 0.01488 0.01763 0.01513
Dot 65692.195 0.00817 0.00940 0.00843
Performance counter stats for 'taskset -c 0-63 ./omp-stream':
407,153.55 msec task-clock:u # 61.544 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
48,376 page-faults:u # 0.119 K/sec
1,258,117,724,107 cycles:u # 3.090 GHz (50.07%)
10,514,200,105 stalled-cycles-frontend:u # 0.84% frontend cycles idle (49.94%)
1,201,580,975,236 stalled-cycles-backend:u # 95.51% backend cycles idle (9.95%)
179,689,025,851 instructions:u # 0.14 insn per cycle
# 6.69 stalled cycles per insn (19.96%)
24,049,840,548 branches:u # 59.068 M/sec (29.99%)
11,896,941 branch-misses:u # 0.05% of all branches (10.03%)
35,726,797,726 L1-dcache-loads:u # 87.748 M/sec (20.07%)
6,936,045,729 L1-dcache-load-misses:u # 19.41% of all L1-dcache hits (30.10%)
657,412,066 LLC-loads:u # 1.615 M/sec (30.09%)
103,466,460 LLC-load-misses:u # 15.74% of all LL-cache hits (40.10%)
6.615633299 seconds time elapsed
397.221694000 seconds user
10.255615000 seconds sys
$ perf stat -d taskset -c 0-63 ./target/release/babel_stream
BabelStream
Version: 0.5
Implmentation: Rust
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function Mbytes/sec Min (sec) Max Average
Copy 58058.929 0.00925 0.01154 0.00974
Mul 51776.537 0.01037 0.01149 0.01057
Add 53694.250 0.01500 0.01545 0.01518
Triad 54317.170 0.01483 0.01600 0.01503
Dot 61588.954 0.00872 0.00974 0.00894
Performance counter stats for 'taskset -c 0-63 ./target/release/babel_stream':
383,067.96 msec task-clock:u # 62.107 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
29,813 page-faults:u # 0.078 K/sec
1,211,090,824,687 cycles:u # 3.162 GHz (49.85%)
145,032,175,754 stalled-cycles-frontend:u # 11.98% frontend cycles idle (49.92%)
928,771,095,319 stalled-cycles-backend:u # 76.69% backend cycles idle (10.06%)
95,197,325,012 instructions:u # 0.08 insn per cycle
# 9.76 stalled cycles per insn (20.10%)
9,535,026,301 branches:u # 24.891 M/sec (30.11%)
543,578,990 branch-misses:u # 5.70% of all branches (10.00%)
30,401,146,966 L1-dcache-loads:u # 79.362 M/sec (19.98%)
3,961,642,174 L1-dcache-load-misses:u # 13.03% of all L1-dcache hits (29.95%)
786,657,019 LLC-loads:u # 2.054 M/sec (29.90%)
272,522,007 LLC-load-misses:u # 34.64% of all LL-cache hits (39.86%)
6.167891515 seconds time elapsed
375.015537000 seconds user
8.316741000 seconds sys
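Two of the annotations in the `perf stat` output above are simple ratios of other printed values: "CPUs utilized" is task-clock divided by elapsed wall time, and "insn per cycle" is instructions divided by cycles. The sketch below recomputes both for the two pinned runs. The function names are illustrative. Because the percentages in parentheses show the counters were multiplexed, the counts are scaled estimates and the ratios are approximate.

```rust
/// CPUs utilized = task-clock (converted from msec to sec) / elapsed wall time (sec).
fn cpus_utilized(task_clock_msec: f64, elapsed_sec: f64) -> f64 {
    task_clock_msec / 1000.0 / elapsed_sec
}

/// Instructions retired per cycle, from the two raw counter values.
fn insn_per_cycle(instructions: u64, cycles: u64) -> f64 {
    instructions as f64 / cycles as f64
}

fn main() {
    // Values copied from the two `taskset -c 0-63` runs above:
    // (label, task-clock msec, elapsed sec, instructions, cycles)
    let runs = [
        ("OpenMP", 407_153.55, 6.615633299, 179_689_025_851u64, 1_258_117_724_107u64),
        ("Rust", 383_067.96, 6.167891515, 95_197_325_012u64, 1_211_090_824_687u64),
    ];
    for (label, task_clock_msec, elapsed_sec, instructions, cycles) in runs {
        println!(
            "{label:<6} CPUs utilized {:.3}  insn per cycle {:.2}",
            cpus_utilized(task_clock_msec, elapsed_sec),
            insn_per_cycle(instructions, cycles),
        );
    }
}
```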
Operating System: Fedora 29 (Twenty Nine)
Kernel: Linux 5.0.5-200.fc29.ppc64le
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2166.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
$ rpm -q rust clang llvm-libs
rust-1.35.0-1.fc29.ppc64le
clang-7.0.1-6.fc29.ppc64le
llvm-libs-7.0.1-4.fc29.ppc64le
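The `lscpu` figures above account for the CPU list used with `taskset -c 0-63`: 2 sockets × 16 cores × 4 threads gives 128 logical CPUs, and `lscpu` lists CPUs 0-63 under NUMA node0. The sketch below checks that arithmetic. The function names are illustrative, and the even split across nodes is an assumption that happens to match the listing.

```rust
/// Logical CPUs = sockets * cores per socket * hardware threads per core.
fn logical_cpus(sockets: u32, cores_per_socket: u32, threads_per_core: u32) -> u32 {
    sockets * cores_per_socket * threads_per_core
}

/// CPUs per NUMA node, assuming CPUs are split evenly across nodes.
fn cpus_per_node(total_cpus: u32, numa_nodes: u32) -> u32 {
    total_cpus / numa_nodes
}

fn main() {
    // Values from the lscpu output above.
    let total = logical_cpus(2, 16, 4);
    let per_node = cpus_per_node(total, 2);
    println!("logical CPUs: {total}, per NUMA node: {per_node}");
    // With node0 listed as CPUs 0-63, `taskset -c 0-63` keeps every thread on node0.
    println!("node0 CPU range: 0-{}", per_node - 1);
}
```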