@dgarvit
Last active February 7, 2020 01:28
Benchmark results for atomic int and AtomicObject (strong scaling)
Raw pin/unpin only
network_atomics=none
16 M objects allocated per locale
numLocales, time (s)
1 1.5583
2 1.5739
4 1.57621
8 1.60641
16 1.61897
32 1.66524
64 1.60011
network_atomics=ugni
16 M objects allocated per locale
numLocales, time (s)
1 0.020405
2 0.02442
4 0.027286
8 0.031058
16 0.034931
32 0.037625
64 0.034682
Scatter objects, reclaim only at the end
16 M objects
network_atomics=none
remote objects = 0%
numLocales, time (s)
1 3.01622
2 1.73897
4 0.949635
8 0.494047
16 0.240596
32 0.118549
64 0.071424
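As a rough aid for reading the strong-scaling tables, the sketch below computes speedup and parallel efficiency relative to the 1-locale run. The times are copied from the "remote objects = 0%", network_atomics=none table above; the function name `strong_scaling` is only illustrative and is not part of the benchmark.

```python
# Times (s) copied from the "remote objects = 0%" table (network_atomics=none).
times = {
    1: 3.01622,
    2: 1.73897,
    4: 0.949635,
    8: 0.494047,
    16: 0.240596,
    32: 0.118549,
    64: 0.071424,
}


def strong_scaling(times_by_locales):
    """Return {numLocales: (speedup, efficiency)} relative to the 1-locale time."""
    base = times_by_locales[1]
    result = {}
    for n in sorted(times_by_locales):
        speedup = base / times_by_locales[n]
        result[n] = (speedup, speedup / n)
    return result


if __name__ == "__main__":
    for n, (speedup, eff) in strong_scaling(times).items():
        print(f"{n:>3} locales: speedup {speedup:6.2f}, efficiency {eff:5.2f}")
```

With these numbers the 64-locale run comes out at roughly a 42x speedup, or about 66% parallel efficiency. The same function can be applied to any of the other tables that include a 1-locale row.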
remote objects = 10%
1 3.14115
2 1.90951
4 1.03355
8 0.530026
16 0.261948
32 0.15325
64 0.12308
remote objects = 20%
1 2.88815
2 2.26472
4 1.07301
8 0.504174
16 0.260668
32 0.155765
64 0.120818
remote objects = 30%
1 3.06094
2 1.90049
4 1.15767
8 0.516648
16 0.26089
32 0.170124
64 0.125183
remote objects = 40%
1 3.13103
2 2.92544
4 1.31451
8 0.539578
16 0.261921
32 0.169782
64 0.129289
remote objects = 50%
1 3.19511
2 2.35302
4 1.29514
8 0.592913
16 0.275421
32 0.163127
64 0.133604
remote objects = 60%
1 2.98317
2 2.66685
4 1.43263
8 0.603263
16 0.265338
32 0.162577
64 0.138754
remote objects = 70%
1 2.87271
2 2.59456
4 1.47964
8 0.703368
16 0.281222
32 0.168126
64 0.133291
remote objects = 80%
1 3.05279
2 2.89676
4 1.41663
8 0.626899
16 0.282097
32 0.198668
64 0.138998
remote objects = 90%
1 2.91319
2 3.38245
4 1.86299
8 0.634599
16 0.308711
32 0.17454
64 0.149333
remote objects = 100%
2 3.00645
4 1.314
8 0.73029
16 0.31973
32 0.172545
64 0.146659
network_atomics=ugni
remote objects = 0%
1 3.23368
2 1.79263
4 0.928954
8 0.504681
16 0.243599
32 0.113414
64 0.062002
remote objects = 10%
1 2.87885
2 1.86829
4 0.940694
8 0.493496
16 0.276033
32 0.153936
64 0.114444
remote objects = 20%
1 2.97238
2 2.1796
4 1.04667
8 0.506582
16 0.261973
32 0.156995
64 0.122928
remote objects = 30%
1 2.87696
2 2.40251
4 1.13783
8 0.557812
16 0.268959
32 0.161342
64 0.128638
remote objects = 40%
1 2.97277
2 2.18607
4 1.1993
8 0.531326
16 0.263974
32 0.157514
64 0.128906
remote objects = 50%
1 2.89224
2 3.14617
4 1.43341
8 0.574766
16 0.262863
32 0.165822
64 0.125364
remote objects = 60%
1 2.95948
2 3.30499
4 1.57277
8 0.611959
16 0.272373
32 0.162706
64 0.131681
remote objects = 70%
1 3.07588
2 2.54733
4 1.46197
8 0.631646
16 0.271251
32 0.166203
remote objects = 80%
1 2.87098
2 2.81716
4 1.60721
8 0.636061
16 0.290022
32 0.173686
64 0.13903
remote objects = 90%
1 2.92372
2 3.1766
4 1.81945
8 0.682791
16 0.304638
32 0.167349
64 0.134675
remote objects = 100%
2 2.83844
4 1.31252
8 0.715171
16 0.312898
32 0.16615
64 0.143915
# network_atomics=none
aprun -q -cc none -d24 -n1 -N1 -j0 ./distnone_real -nl 1 --INT --NABA --YABA
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.0185367s, 26.3305ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.018407s, 26.1463ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.092457s, 131.331ns/op
aprun -q -cc none -d24 -n2 -N1 -j0 ./distnone_real -nl 2 --INT --NABA --YABA
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 1.95719s, 1390.05ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 1.90417s, 1352.39ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 1.43005s, 1015.66ns/op
aprun -q -cc none -d24 -n4 -N1 -j0 ./distnone_real -nl 4 --INT --NABA --YABA
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 4.85293s, 1723.34ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 4.9263s, 1749.4ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 4.49971s, 1597.91ns/op
aprun -q -cc none -d24 -n8 -N1 -j0 ./distnone_real -nl 8 --INT --NABA --YABA
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 14.8705s, 2640.36ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 15.2733s, 2711.87ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 13.1667s, 2337.84ns/op
aprun -q -cc none -d24 -n16 -N1 -j0 ./distnone_real -nl 16 --INT
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 22.2947s, 1979.29ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 21.887s, 1943.09ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 28.15s, 2499.11ns/op
aprun -q -cc none -d24 -n32 -N1 -j0 ./distnone_real -nl 32
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 47.5409s, 2110.3ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 46.1025s, 2046.46ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 41.9057s, 1860.16ns/op
aprun -q -cc none -d24 -n64 -N1 -j0 ./distnone_real -nl 64
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 92.0024s, 2041.96ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 136.218s, 3023.29ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 86.7444s, 1925.26ns/op
# network_atomics=ugni
aprun -q -cc none -d24 -n1 -N1 -j0 ./dist-overhead_real -nl 1 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.018523s, 26.3111ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.0183873s, 26.1184ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.0870547s, 123.657ns/op
aprun -q -cc none -d24 -n2 -N1 -j0 ./dist-overhead_real -nl 2 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.192598s, 136.788ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.18238s, 129.531ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 1.50747s, 1070.64ns/op
aprun -q -cc none -d24 -n4 -N1 -j0 ./dist-overhead_real -nl 4 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.362432s, 128.705ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.35061s, 124.506ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 4.12241s, 1463.93ns/op
aprun -q -cc none -d24 -n8 -N1 -j0 ./dist-overhead_real -nl 8 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.670075s, 118.976ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 0.655051s, 116.309ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 10.189s, 1809.13ns/op
aprun -q -cc none -d24 -n16 -N1 -j0 ./dist-overhead_real -nl 16 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 1.35869s, 120.622ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 1.3413s, 119.078ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 20.7713s, 1844.04ns/op
aprun -q -cc none -d24 -n32 -N1 -j0 ./dist-overhead_real -nl 32 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 2.59045s, 114.988ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 2.5551s, 113.419ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 43.2426s, 1919.5ns/op
aprun -q -cc none -d24 -n64 -N1 -j0 ./dist-overhead_real -nl 64 --N=3 --OPS_PER_TASK=16000
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 5.36364s, 119.044ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 5.32405s, 118.165ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
44 tasks: 88.583s, 1966.06ns/op
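The ns/op column in the runs above appears to be the elapsed time divided by the total operation count, taken as numLocales x 44 tasks x 16000 operations per task. The 16000 comes from the `--OPS_PER_TASK=16000` flag in the command lines; that every locale runs 44 tasks with that many operations each is an assumption inferred from the numbers, not something the log states. A minimal sketch that checks this against a few reported values:

```python
TASKS_PER_LOCALE = 44   # from the "44 tasks" lines in the log
OPS_PER_TASK = 16000    # from the --OPS_PER_TASK=16000 flag


def ns_per_op(seconds, num_locales,
              tasks_per_locale=TASKS_PER_LOCALE, ops_per_task=OPS_PER_TASK):
    """Elapsed time divided by the assumed total operation count, in ns."""
    total_ops = num_locales * tasks_per_locale * ops_per_task
    return seconds / total_ops * 1e9


# (numLocales, time in s, reported ns/op) for atomic int, network_atomics=ugni
samples = [
    (1, 0.018523, 26.3111),
    (2, 0.192598, 136.788),
    (64, 5.36364, 119.044),
]

if __name__ == "__main__":
    for locales, seconds, reported in samples:
        computed = ns_per_op(seconds, locales)
        print(f"{locales:>2} locales: computed {computed:.3f} ns/op, "
              f"reported {reported} ns/op")
```

Because the total work grows with the number of locales under this reading, the elapsed time is expected to rise while ns/op stays comparable across rows.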
# shared memory
atomic int benchmarks:
25% read, 25% write, 25% cas, 25% exchange
1 tasks: 0.0330617s, 31.5301ns/op
2 tasks: 0.0861463s, 41.0778ns/op
4 tasks: 0.186779s, 44.5315ns/op
8 tasks: 0.355418s, 42.3691ns/op
16 tasks: 0.724626s, 43.1911ns/op
32 tasks: 1.44616s, 43.0988ns/op
44 tasks: 2.32923s, 50.4846ns/op
AtomicObject benchmarks without ABA:
25% read, 25% write, 25% cas, 25% exchange
1 tasks: 0.0472s, 45.0134ns/op
2 tasks: 0.088331s, 42.1195ns/op
4 tasks: 0.184917s, 44.0877ns/op
8 tasks: 0.388866s, 46.3564ns/op
16 tasks: 0.816834s, 48.6871ns/op
32 tasks: 1.69394s, 50.4834ns/op
44 tasks: 2.31256s, 50.1234ns/op
AtomicObject benchmarks without ABA with pointer compression:
25% read, 25% write, 25% cas, 25% exchange
1 tasks: 0.048418s, 46.175ns/op
2 tasks: 0.090405s, 43.1085ns/op
4 tasks: 0.185053s, 44.1202ns/op
8 tasks: 0.39092s, 46.6013ns/op
16 tasks: 0.818064s, 48.7604ns/op
32 tasks: 1.68991s, 50.3634ns/op
44 tasks: 2.31434s, 50.162ns/op
AtomicObject benchmarks with ABA:
25% read, 25% write, 25% cas, 25% exchange
1 tasks: 0.0494153s, 47.1261ns/op
2 tasks: 0.11394s, 54.3307ns/op
4 tasks: 0.256564s, 61.1696ns/op
8 tasks: 0.525748s, 62.674ns/op
16 tasks: 1.23689s, 73.7246ns/op
32 tasks: 2.55617s, 76.1798ns/op
44 tasks: 3.54687s, 76.8763ns/op
AtomicObject benchmarks with ABA and pointer compression:
25% read, 25% write, 25% cas, 25% exchange
1 tasks: 0.0454677s, 43.3613ns/op
2 tasks: 0.110333s, 52.611ns/op
4 tasks: 0.239202s, 57.0302ns/op
8 tasks: 0.534103s, 63.67ns/op
16 tasks: 1.25923s, 75.056ns/op
32 tasks: 2.62732s, 78.3003ns/op
44 tasks: 3.96569s, 85.954ns/op
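To put the shared-memory figures side by side, the sketch below computes the ratio of ns/op with ABA protection to ns/op without it at each task count. The values are copied from the "AtomicObject benchmarks without ABA" and "AtomicObject benchmarks with ABA" blocks above (no pointer compression); the helper name `aba_overhead` is hypothetical.

```python
# ns/op by task count, copied from the shared-memory results above.
without_aba = {1: 45.0134, 2: 42.1195, 4: 44.0877, 8: 46.3564,
               16: 48.6871, 32: 50.4834, 44: 50.1234}
with_aba = {1: 47.1261, 2: 54.3307, 4: 61.1696, 8: 62.674,
            16: 73.7246, 32: 76.1798, 44: 76.8763}


def aba_overhead(base, aba):
    """Return {tasks: aba_ns_per_op / base_ns_per_op} for shared task counts."""
    return {tasks: aba[tasks] / base[tasks]
            for tasks in sorted(base) if tasks in aba}


if __name__ == "__main__":
    for tasks, ratio in aba_overhead(without_aba, with_aba).items():
        print(f"{tasks:>2} tasks: ABA costs {ratio:.2f}x the non-ABA ns/op")
```

On this data the ratio goes from under 1.05x at 1 task to a little over 1.5x at 44 tasks.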
tryReclaim called on each iteration. All objects reclaimed at the end.
16 M objects. Percentages below are the share of remote objects; columns are numLocales, time (s).
network_atomics=none
0%
1 7.31958
2 13.9954
4 3.7995
8 21.9844
16 1.02741
32 1.16023
64 2.70584
10%
1 6.21947
2 58.226
4 24.8165
8 6.12848
16 1.35128
32 0.916375
64 1.64975
20%
1 7.23076
2 19.5757
4 3.26806
8 2.63286
16 2.22976
32 1.0012
64 1.44047
30%
1 7.22922
2 66.9549
4 32.798
8 6.08776
16 3.06663
32 1.0688
64 1.10993
40%
1 5.43495
2 67.1935
4 13.0394
8 4.21039
16 2.69073
32 0.887078
64 1.18566
50%
1 9.58338
2 75.0014
4 8.53853
8 5.69507
16 1.19352
32 0.822647
64 2.09617
60%
1 6.39817
2 61.9263
4 11.4444
8 2.36178
16 1.71003
32 0.941425
64 3.1179
70%
1 7.36892
2 73.4677
4 6.42488
8 6.45945
16 4.78456
32 0.868265
64 2.7148
80%
1 7.80598
2 76.674
4 22.8306
8 6.92265
16 1.37267
32 1.52423
64 8.32087
90%
1 6.39533
2 80.6809
4 45.4652
8 27.9114
16 2.02261
32 0.582035
64 1.82864
100%
2 77.1783
4 23.0952
8 23.7952
16 4.72845
32 1.11778
64 1.72535
network_atomics=ugni
0%
1 5.12878
2 15.9308
4 34.4596
8 3.52972
16 1.12044
32 2.9881
64 1.34755
10%
1 8.60203
2 28.6251
4 23.9659
8 1.86122
16 2.59241
32 1.14219
64 1.16779
20%
1 5.2781
2 28.5213
4 8.80753
8 2.73159
16 1.80005
32 0.865161
64 0.684403
30%
1 5.90445
2 26.1451
4 8.87377
8 4.40997
16 1.03532
32 1.96075
64 0.84714
40%
1 6.76695
2 70.8779
4 26.9068
8 8.04014
16 2.64119
32 0.495541
64 0.613618
50%
1 13.7217
2 79.9154
4 8.29462
8 2.70545
16 0.605948
32 1.83576
64 1.43469
60%
1 5.17518
2 78.9599
4 10.8413
8 4.09237
16 5.74227
32 1.49777
64 1.39989
70%
1 6.28498
2 78.8084
4 8.05115
8 11.1166
16 1.96246
32 3.18655
64 0.526966
80%
1 9.46197
2 80.6102
4 36.9373
8 1.96044
16 2.28738
32 2.20695
64 4.6088
90%
1 5.90047
2 78.8878
4 46.1369
8 24.6276
16 2.51928
32 1.05941
64 0.840319
100%
2 84.3879
4 23.0657
8 11.3583
16 5.07569
32 0.653742
64 4.5163
tryReclaim called once every 1024 iterations. All memory reclaimed at the end.
16 M objects. Percentages below are the share of remote objects; columns are numLocales, time (s).
network_atomics=none
0%
1 4.78635
2 3.05447
4 1.31221
8 0.934623
16 0.83715
32 0.166274
64 0.116134
10%
1 5.24908
2 3.71027
4 1.68243
8 0.768322
16 0.344868
32 0.24501
64 0.162142
20%
1 4.80181
2 4.08498
4 1.69515
8 0.801024
16 0.431436
32 0.258051
64 0.17292
30%
1 4.7601
2 6.07566
4 2.08996
8 0.874534
16 0.771779
32 0.253027
64 0.185258
40%
1 4.85275
2 5.13869
4 2.98875
8 0.993134
16 0.408281
32 0.251876
64 0.194185
50%
1 5.08817
2 7.01981
4 2.87718
8 1.04423
16 0.410427
32 0.251934
64 0.197843
60%
1 4.39632
2 7.16872
4 3.17901
8 1.19805
16 1.45826
32 0.30802
64 0.207621
70%
1 4.77594
2 8.76361
4 3.34293
8 1.32065
16 0.45027
32 0.380607
64 0.20055
80%
1 4.51968
2 8.61471
4 4.36356
8 1.37182
16 0.486875
32 0.272599
64 0.216578
90%
1 4.44661
2 12.2626
4 4.72922
8 1.55849
16 0.496966
32 0.30799
64 0.241363
100%
2 9.74631
4 4.65984
8 1.09008
16 0.502708
32 0.328833
64 0.233762
network_atomics=ugni
0%
1 4.81969
2 2.90258
4 1.44132
8 0.777721
16 0.378633
32 0.171634
64 0.094303
10%
1 4.82819
2 3.51159
4 1.79551
8 0.896802
16 1.40071
32 0.252947
64 0.142686
20%
1 4.34284
2 4.307
4 1.94599
8 0.899056
16 1.3755
32 0.232183
64 0.149896
30%
1 4.58648
2 6.00885
4 2.43325
8 0.910832
16 1.43938
32 0.250706
64 0.167415
40%
1 4.91079
2 7.73908
4 2.8552
8 0.725935
16 0.387605
32 0.277521
64 0.160427
50%
1 5.11413
2 9.78018
4 3.55263
8 0.842866
16 0.467695
32 0.280016
64 0.165609
60%
1 4.61258
2 9.13228
4 3.18107
8 1.30995
16 0.502487
32 0.699681
64 0.163558
70%
1 4.96721
2 9.97369
4 3.86268
8 1.21525
16 0.426342
32 0.754343
64 0.198751
80%
1 4.38972
2 9.32722
4 4.44421
8 1.29743
16 1.5567
32 0.332515
64 0.179638
90%
1 4.80473
2 8.93565
4 4.34647
8 1.52931
16 0.615313
32 0.326735
64 0.224305
100%
2 13.2958
4 4.54318
8 1.02502
16 1.52756
32 0.773113
64 0.236703