Code (and Windows binary) here: https://github.com/rygorous/atomic_ops_test
I'd appreciate it if some people could run this and post their results here in a
comment, with a short description of what CPU they're using.
UPDATE: I have recent Intel CPUs (released within the last 3 years) pretty well covered
by now, so if that's what you have in your machine, don't bother running the test. But
I'd love to get some more data points for older Intel CPUs and AMD parts!
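For anyone who wants to compare runs rather than eyeball them, here is a minimal sketch of a parser for the output format used in the results below. It is not part of the test program; the function names `parse_results` and `slowdown` are hypothetical, and it assumes every data line looks like `name: value cycles/op` under an `interference type:` line.

```python
import re


def parse_results(text):
    """Parse pasted benchmark output into {interference_type: {op: cycles_per_op}}.

    Hypothetical helper: lines that match neither pattern (CPU descriptions,
    separators) are ignored.
    """
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"interference type:\s*(\S+)", line)
        if m:
            current = results.setdefault(m.group(1), {})
            continue
        m = re.match(r"(\w+):\s*([\d.]+)\s*cycles/op", line)
        if m and current is not None:
            current[m.group(1)] = float(m.group(2))
    return results


def slowdown(results, baseline="none"):
    """Return each op's cost relative to the same op in the baseline section."""
    base = results[baseline]
    return {
        kind: {op: cycles / base[op] for op, cycles in ops.items() if op in base}
        for kind, ops in results.items()
        if kind != baseline
    }


# A few lines taken from the AMD Phenom II results.
sample = """\
AMD Phenom II 925 (AMD 10h) @ 2.8GHz
interference type: none
add: 1.88 cycles/op
lockadd: 18.01 cycles/op
interference type: hyperthread_read_line
add: 13.87 cycles/op
lockadd: 131.68 cycles/op
"""

parsed = parse_results(sample)
ratios = slowdown(parsed)
for kind, ops in ratios.items():
    for op, ratio in ops.items():
        print(f"{kind:28s} {op:10s} {ratio:6.2f}x")
```

Ratios against the `none` section make it easier to see which operations are sensitive to a given kind of interference on a given CPU, independent of clock speed.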
Results so far:
----
AMD Phenom II 925 (AMD 10h) @ 2.8GHz
interference type: none
add: 1.88 cycles/op
add_mfence: 36.80 cycles/op
lockadd: 18.01 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.02 cycles/op
lockadd_unalign: 137.48 cycles/op
interference type: hyperthread_read_line
add: 13.87 cycles/op
add_mfence: 122.74 cycles/op
lockadd: 131.68 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.02 cycles/op
lockadd_unalign: 345.44 cycles/op
interference type: hyperthread_write_line
add: 14.12 cycles/op
add_mfence: 132.87 cycles/op
lockadd: 114.37 cycles/op
xadd: 131.32 cycles/op
swap: 16.06 cycles/op
cmpxchg: 18.26 cycles/op
lockadd_unalign: 344.94 cycles/op
interference type: other_core_read_line
add: 1.88 cycles/op
add_mfence: 36.85 cycles/op
lockadd: 18.01 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.01 cycles/op
lockadd_unalign: 136.15 cycles/op
interference type: other_core_write_line
add: 1.88 cycles/op
add_mfence: 37.07 cycles/op
lockadd: 18.01 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.05 cycles/op
lockadd_unalign: 135.07 cycles/op
interference type: three_cores_read_line
add: 13.84 cycles/op
add_mfence: 119.52 cycles/op
lockadd: 130.07 cycles/op
xadd: 161.28 cycles/op
swap: 135.72 cycles/op
cmpxchg: 143.83 cycles/op
lockadd_unalign: 342.39 cycles/op
interference type: three_cores_write_line
add: 13.58 cycles/op
add_mfence: 130.97 cycles/op
lockadd: 18.05 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.26 cycles/op
lockadd_unalign: 135.08 cycles/op
----
AMD FX 8350 (Piledriver) @ 4.0GHz
interference type: none
add: 2.00 cycles/op
add_mfence: 94.95 cycles/op
lockadd: 43.18 cycles/op
xadd: 42.00 cycles/op
swap: 42.18 cycles/op
cmpxchg: 45.94 cycles/op
lockadd_unalign: 216.35 cycles/op
interference type: hyperthread_read_line
add: 2.00 cycles/op
add_mfence: 98.13 cycles/op
lockadd: 95.56 cycles/op
xadd: 93.84 cycles/op
swap: 93.82 cycles/op
cmpxchg: 94.22 cycles/op
lockadd_unalign: 274.09 cycles/op
interference type: hyperthread_write_line
add: 25.08 cycles/op
add_mfence: 177.02 cycles/op
lockadd: 142.13 cycles/op
xadd: 149.08 cycles/op
swap: 150.16 cycles/op
cmpxchg: 3697.73 cycles/op
lockadd_unalign: 259.23 cycles/op
interference type: other_core_read_line
add: 6.42 cycles/op
add_mfence: 393.58 cycles/op
lockadd: 216.35 cycles/op
xadd: 391.48 cycles/op
swap: 42.05 cycles/op
cmpxchg: 45.93 cycles/op
lockadd_unalign: 213.56 cycles/op
interference type: other_core_write_line
add: 2.00 cycles/op
add_mfence: 473.65 cycles/op
lockadd: 387.04 cycles/op
xadd: 378.94 cycles/op
swap: 396.20 cycles/op
cmpxchg: 942.90 cycles/op
lockadd_unalign: 655.72 cycles/op
interference type: three_cores_read_line
add: 13.53 cycles/op
add_mfence: 835.01 cycles/op
lockadd: 443.42 cycles/op
xadd: 580.89 cycles/op
swap: 834.23 cycles/op
cmpxchg: 1048.45 cycles/op
lockadd_unalign: 968.52 cycles/op
interference type: three_cores_write_line
add: 82.73 cycles/op
add_mfence: 905.74 cycles/op
lockadd: 825.68 cycles/op
xadd: 844.77 cycles/op
swap: 827.17 cycles/op
cmpxchg: 2491.23 cycles/op
lockadd_unalign: 1140.24 cycles/op
----
Intel Atom D510 (1.66GHz, 2 core, 4 thread) on Linux (port here: https://github.com/maxburke/atomic_ops_test):
interference type: none
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.44 cycles/op
interference type: hyperthread_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.41 cycles/op
interference type: hyperthread_write_line
add: 5.00 cycles/op
add_mfence: 4.63 cycles/op
lockadd: 5.00 cycles/op
xadd: 7.75 cycles/op
swap: 7.25 cycles/op
cmpxchg: 24.02 cycles/op
lockadd_unalign: 180.20 cycles/op
interference type: other_core_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.40 cycles/op
interference type: other_core_write_line
add: 36.12 cycles/op
add_mfence: 70.96 cycles/op
lockadd: 35.50 cycles/op
xadd: 96.41 cycles/op
swap: 68.43 cycles/op
cmpxchg: 348.55 cycles/op
lockadd_unalign: 209.81 cycles/op
interference type: three_cores_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 185.16 cycles/op
interference type: three_cores_write_line
add: 36.19 cycles/op
add_mfence: 71.41 cycles/op
lockadd: 36.28 cycles/op
xadd: 96.52 cycles/op
swap: 68.35 cycles/op
cmpxchg: 344.29 cycles/op
lockadd_unalign: 209.89 cycles/op
----
Intel Core i7-920 (Bloomfield [NHM derived])
interference type: none
add: 2.74 cycles/op
add_mfence: 43.10 cycles/op
lockadd: 20.63 cycles/op
xadd: 20.70 cycles/op
swap: 20.72 cycles/op
cmpxchg: 17.74 cycles/op
lockadd_unalign: 1218.97 cycles/op
interference type: hyperthread_read_line
add: 2.05 cycles/op
add_mfence: 43.85 cycles/op
lockadd: 21.61 cycles/op
xadd: 21.43 cycles/op
swap: 20.04 cycles/op
cmpxchg: 20.89 cycles/op
lockadd_unalign: 1216.35 cycles/op
interference type: hyperthread_write_line
add: 4.58 cycles/op
add_mfence: 36.71 cycles/op
lockadd: 22.11 cycles/op
xadd: 21.99 cycles/op
swap: 22.36 cycles/op
cmpxchg: 44.44 cycles/op
lockadd_unalign: 1236.09 cycles/op
interference type: other_core_read_line
add: 3.65 cycles/op
add_mfence: 156.83 cycles/op
lockadd: 85.56 cycles/op
xadd: 83.32 cycles/op
swap: 83.72 cycles/op
cmpxchg: 141.47 cycles/op
lockadd_unalign: 1216.30 cycles/op
interference type: other_core_write_line
add: 4.66 cycles/op
add_mfence: 117.53 cycles/op
lockadd: 63.93 cycles/op
xadd: 61.09 cycles/op
swap: 61.17 cycles/op
cmpxchg: 145.92 cycles/op
lockadd_unalign: 1224.23 cycles/op
interference type: three_cores_read_line
add: 3.71 cycles/op
add_mfence: 227.06 cycles/op
lockadd: 138.89 cycles/op
xadd: 130.12 cycles/op
swap: 133.46 cycles/op
cmpxchg: 214.04 cycles/op
lockadd_unalign: 1234.76 cycles/op
interference type: three_cores_write_line
add: 5.51 cycles/op
add_mfence: 211.07 cycles/op
lockadd: 121.84 cycles/op
xadd: 120.34 cycles/op
swap: 119.18 cycles/op
cmpxchg: 227.27 cycles/op
lockadd_unalign: 1265.16 cycles/op
----
Intel Core i3-2310M (Sandy Bridge) @ 2.1GHz
interference type: none
add: 2.08 cycles/op
add_mfence: 51.03 cycles/op
lockadd: 22.48 cycles/op
xadd: 22.58 cycles/op
swap: 22.84 cycles/op
cmpxchg: 23.22 cycles/op
lockadd_unalign: 571.91 cycles/op
interference type: hyperthread_read_line
add: 2.12 cycles/op
add_mfence: 51.49 cycles/op
lockadd: 24.57 cycles/op
xadd: 36.03 cycles/op
swap: 29.83 cycles/op
cmpxchg: 30.42 cycles/op
lockadd_unalign: 597.80 cycles/op
interference type: hyperthread_write_line
add: 6.83 cycles/op
add_mfence: 55.87 cycles/op
lockadd: 31.54 cycles/op
xadd: 34.54 cycles/op
swap: 35.70 cycles/op
cmpxchg: 114.43 cycles/op
lockadd_unalign: 585.72 cycles/op
interference type: other_core_read_line
add: 2.09 cycles/op
add_mfence: 56.78 cycles/op
lockadd: 110.71 cycles/op
xadd: 107.70 cycles/op
swap: 26.68 cycles/op
cmpxchg: 115.62 cycles/op
lockadd_unalign: 570.43 cycles/op
interference type: other_core_write_line
add: 4.52 cycles/op
add_mfence: 51.98 cycles/op
lockadd: 24.85 cycles/op
xadd: 22.84 cycles/op
swap: 22.94 cycles/op
cmpxchg: 23.27 cycles/op
lockadd_unalign: 597.92 cycles/op
interference type: three_cores_read_line
add: 2.65 cycles/op
add_mfence: 107.83 cycles/op
lockadd: 99.94 cycles/op
xadd: 108.54 cycles/op
swap: 98.83 cycles/op
cmpxchg: 114.97 cycles/op
lockadd_unalign: 589.32 cycles/op
interference type: three_cores_write_line
add: 4.52 cycles/op
add_mfence: 178.42 cycles/op
lockadd: 52.71 cycles/op
xadd: 151.44 cycles/op
swap: 133.44 cycles/op
cmpxchg: 23.26 cycles/op
lockadd_unalign: 577.85 cycles/op
----
Intel Core i5-2400 (Sandy Bridge) @ 3.10GHz
interference type: none
add: 1.69 cycles/op
add_mfence: 44.08 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 615.60 cycles/op
interference type: hyperthread_read_line
add: 2.23 cycles/op
add_mfence: 66.85 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.50 cycles/op
cmpxchg: 87.92 cycles/op
lockadd_unalign: 613.58 cycles/op
interference type: hyperthread_write_line
add: 1.69 cycles/op
add_mfence: 44.73 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 614.62 cycles/op
interference type: other_core_read_line
add: 2.22 cycles/op
add_mfence: 44.56 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.22 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.23 cycles/op
lockadd_unalign: 614.30 cycles/op
interference type: other_core_write_line
add: 1.69 cycles/op
add_mfence: 44.63 cycles/op
lockadd: 159.29 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 614.38 cycles/op
interference type: three_cores_read_line
add: 2.23 cycles/op
add_mfence: 113.23 cycles/op
lockadd: 108.38 cycles/op
xadd: 108.40 cycles/op
swap: 108.28 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 612.58 cycles/op
interference type: three_cores_write_line
add: 3.56 cycles/op
add_mfence: 44.53 cycles/op
lockadd: 25.90 cycles/op
xadd: 160.24 cycles/op
swap: 142.26 cycles/op
cmpxchg: 25.23 cycles/op
lockadd_unalign: 616.72 cycles/op
----
Intel Core i7-2677M (Sandy Bridge) @ 1.80 GHz
interference type: none
add: 1.51 cycles/op
add_mfence: 35.09 cycles/op
lockadd: 15.48 cycles/op
xadd: 15.32 cycles/op
swap: 15.78 cycles/op
cmpxchg: 15.99 cycles/op
lockadd_unalign: 410.44 cycles/op
interference type: hyperthread_read_line
add: 1.49 cycles/op
add_mfence: 34.68 cycles/op
lockadd: 15.48 cycles/op
xadd: 21.78 cycles/op
swap: 24.98 cycles/op
cmpxchg: 15.56 cycles/op
lockadd_unalign: 450.96 cycles/op
interference type: hyperthread_write_line
add: 1.51 cycles/op
add_mfence: 35.05 cycles/op
lockadd: 15.50 cycles/op
xadd: 15.53 cycles/op
swap: 15.81 cycles/op
cmpxchg: 15.94 cycles/op
lockadd_unalign: 408.41 cycles/op
interference type: other_core_read_line
add: 1.51 cycles/op
add_mfence: 35.07 cycles/op
lockadd: 15.48 cycles/op
xadd: 15.33 cycles/op
swap: 15.78 cycles/op
cmpxchg: 15.98 cycles/op
lockadd_unalign: 408.03 cycles/op
interference type: other_core_write_line
add: 3.35 cycles/op
add_mfence: 118.20 cycles/op
lockadd: 100.99 cycles/op
xadd: 103.57 cycles/op
swap: 106.19 cycles/op
cmpxchg: 251.31 cycles/op
lockadd_unalign: 405.65 cycles/op
interference type: three_cores_read_line
add: 1.85 cycles/op
add_mfence: 70.64 cycles/op
lockadd: 73.13 cycles/op
xadd: 73.48 cycles/op
swap: 68.97 cycles/op
cmpxchg: 71.28 cycles/op
lockadd_unalign: 439.66 cycles/op
interference type: three_cores_write_line
add: 3.36 cycles/op
add_mfence: 121.96 cycles/op
lockadd: 90.12 cycles/op
xadd: 91.66 cycles/op
swap: 90.41 cycles/op
cmpxchg: 230.79 cycles/op
lockadd_unalign: 405.44 cycles/op
----
Intel Core i7-2600K (Sandy Bridge) @ 3.4GHz
interference type: none
add: 2.11 cycles/op
add_mfence: 50.19 cycles/op
lockadd: 22.32 cycles/op
xadd: 22.22 cycles/op
swap: 22.53 cycles/op
cmpxchg: 22.86 cycles/op
lockadd_unalign: 648.10 cycles/op
interference type: hyperthread_read_line
add: 2.12 cycles/op
add_mfence: 50.24 cycles/op
lockadd: 32.74 cycles/op
xadd: 39.23 cycles/op
swap: 29.51 cycles/op
cmpxchg: 29.36 cycles/op
lockadd_unalign: 682.59 cycles/op
interference type: hyperthread_write_line
add: 6.97 cycles/op
add_mfence: 54.63 cycles/op
lockadd: 53.94 cycles/op
xadd: 36.98 cycles/op
swap: 35.85 cycles/op
cmpxchg: 131.69 cycles/op
lockadd_unalign: 652.76 cycles/op
interference type: other_core_read_line
add: 2.62 cycles/op
add_mfence: 103.76 cycles/op
lockadd: 108.31 cycles/op
xadd: 108.12 cycles/op
swap: 101.97 cycles/op
cmpxchg: 113.30 cycles/op
lockadd_unalign: 648.32 cycles/op
interference type: other_core_write_line
add: 4.50 cycles/op
add_mfence: 171.69 cycles/op
lockadd: 139.92 cycles/op
xadd: 140.46 cycles/op
swap: 146.66 cycles/op
cmpxchg: 360.81 cycles/op
lockadd_unalign: 647.92 cycles/op
interference type: three_cores_read_line
add: 2.72 cycles/op
add_mfence: 123.66 cycles/op
lockadd: 134.83 cycles/op
xadd: 134.35 cycles/op
swap: 132.47 cycles/op
cmpxchg: 136.49 cycles/op
lockadd_unalign: 646.96 cycles/op
interference type: three_cores_write_line
add: 11.21 cycles/op
add_mfence: 412.56 cycles/op
lockadd: 331.98 cycles/op
xadd: 337.59 cycles/op
swap: 383.45 cycles/op
cmpxchg: 5916.89 cycles/op
lockadd_unalign: 733.50 cycles/op
----
Intel Core i5-3427U (Ivy Bridge) @ 1.8GHz
interference type: none
add: 1.82 cycles/op
add_mfence: 45.40 cycles/op
lockadd: 19.11 cycles/op
xadd: 19.09 cycles/op
swap: 19.30 cycles/op
cmpxchg: 18.99 cycles/op
lockadd_unalign: 628.82 cycles/op
interference type: hyperthread_read_line
add: 1.85 cycles/op
add_mfence: 45.42 cycles/op
lockadd: 19.13 cycles/op
xadd: 19.07 cycles/op
swap: 19.41 cycles/op
cmpxchg: 19.01 cycles/op
lockadd_unalign: 621.71 cycles/op
interference type: hyperthread_write_line
add: 1.82 cycles/op
add_mfence: 45.63 cycles/op
lockadd: 32.25 cycles/op
xadd: 19.08 cycles/op
swap: 19.41 cycles/op
cmpxchg: 19.01 cycles/op
lockadd_unalign: 613.85 cycles/op
interference type: other_core_read_line
add: 2.28 cycles/op
add_mfence: 87.28 cycles/op
lockadd: 21.03 cycles/op
xadd: 19.07 cycles/op
swap: 19.34 cycles/op
cmpxchg: 19.02 cycles/op
lockadd_unalign: 621.71 cycles/op
interference type: other_core_write_line
add: 1.82 cycles/op
add_mfence: 45.41 cycles/op
lockadd: 19.10 cycles/op
xadd: 19.05 cycles/op
swap: 19.30 cycles/op
cmpxchg: 18.99 cycles/op
lockadd_unalign: 628.41 cycles/op
interference type: three_cores_read_line
add: 2.28 cycles/op
add_mfence: 97.28 cycles/op
lockadd: 96.07 cycles/op
xadd: 96.23 cycles/op
swap: 94.17 cycles/op
cmpxchg: 96.80 cycles/op
lockadd_unalign: 619.27 cycles/op
interference type: three_cores_write_line
add: 4.29 cycles/op
add_mfence: 154.34 cycles/op
lockadd: 19.27 cycles/op
xadd: 19.15 cycles/op
swap: 19.36 cycles/op
cmpxchg: 19.01 cycles/op
lockadd_unalign: 645.76 cycles/op
----
Intel Core i7-3770K (Ivy Bridge) @ 3.5GHz
interference type: none
add: 1.76 cycles/op
add_mfence: 44.08 cycles/op
lockadd: 18.46 cycles/op
xadd: 18.40 cycles/op
swap: 18.45 cycles/op
cmpxchg: 18.32 cycles/op
lockadd_unalign: 661.34 cycles/op
interference type: hyperthread_read_line
add: 1.78 cycles/op
add_mfence: 43.83 cycles/op
lockadd: 113.62 cycles/op
xadd: 121.86 cycles/op
swap: 47.23 cycles/op
cmpxchg: 48.57 cycles/op
lockadd_unalign: 612.54 cycles/op
interference type: hyperthread_write_line
add: 6.45 cycles/op
add_mfence: 48.10 cycles/op
lockadd: 32.62 cycles/op
xadd: 31.40 cycles/op
swap: 34.96 cycles/op
cmpxchg: 121.29 cycles/op
lockadd_unalign: 647.97 cycles/op
interference type: other_core_read_line
add: 2.19 cycles/op
add_mfence: 88.60 cycles/op
lockadd: 76.83 cycles/op
xadd: 77.17 cycles/op
swap: 74.09 cycles/op
cmpxchg: 79.80 cycles/op
lockadd_unalign: 659.39 cycles/op
interference type: other_core_write_line
add: 3.85 cycles/op
add_mfence: 146.39 cycles/op
lockadd: 128.53 cycles/op
xadd: 126.04 cycles/op
swap: 117.97 cycles/op
cmpxchg: 294.79 cycles/op
lockadd_unalign: 691.28 cycles/op
interference type: three_cores_read_line
add: 2.27 cycles/op
add_mfence: 106.34 cycles/op
lockadd: 109.30 cycles/op
xadd: 109.28 cycles/op
swap: 112.08 cycles/op
cmpxchg: 109.78 cycles/op
lockadd_unalign: 674.16 cycles/op
interference type: three_cores_write_line
add: 13.57 cycles/op
add_mfence: 329.85 cycles/op
lockadd: 248.26 cycles/op
xadd: 270.56 cycles/op
swap: 267.62 cycles/op
cmpxchg: 3827.60 cycles/op
lockadd_unalign: 780.82 cycles/op
----
Intel Core i5-4460 (Haswell) @ 3.20GHz
interference type: none
add: 1.41 cycles/op
add_mfence: 43.45 cycles/op
lockadd: 22.59 cycles/op
xadd: 21.18 cycles/op
swap: 21.88 cycles/op
cmpxchg: 21.18 cycles/op
lockadd_unalign: 586.52 cycles/op
interference type: hyperthread_read_line
add: 1.41 cycles/op
add_mfence: 43.82 cycles/op
lockadd: 22.63 cycles/op
xadd: 21.56 cycles/op
swap: 105.71 cycles/op
cmpxchg: 21.56 cycles/op
lockadd_unalign: 583.86 cycles/op
interference type: hyperthread_write_line
add: 2.62 cycles/op
add_mfence: 43.70 cycles/op
lockadd: 88.30 cycles/op
xadd: 21.18 cycles/op
swap: 21.88 cycles/op
cmpxchg: 21.18 cycles/op
lockadd_unalign: 585.65 cycles/op
interference type: other_core_read_line
add: 1.42 cycles/op
add_mfence: 44.93 cycles/op
lockadd: 94.54 cycles/op
xadd: 106.89 cycles/op
swap: 22.39 cycles/op
cmpxchg: 22.20 cycles/op
lockadd_unalign: 582.73 cycles/op
interference type: other_core_write_line
add: 2.40 cycles/op
add_mfence: 47.28 cycles/op
lockadd: 117.44 cycles/op
xadd: 21.23 cycles/op
swap: 22.07 cycles/op
cmpxchg: 21.18 cycles/op
lockadd_unalign: 585.65 cycles/op
interference type: three_cores_read_line
add: 1.90 cycles/op
add_mfence: 119.15 cycles/op
lockadd: 104.99 cycles/op
xadd: 105.29 cycles/op
swap: 115.61 cycles/op
cmpxchg: 107.39 cycles/op
lockadd_unalign: 582.31 cycles/op
interference type: three_cores_write_line
add: 2.59 cycles/op
add_mfence: 47.06 cycles/op
lockadd: 22.64 cycles/op
xadd: 118.95 cycles/op
swap: 118.90 cycles/op
cmpxchg: 21.57 cycles/op
lockadd_unalign: 584.93 cycles/op
----
Intel Core i5-4670K (Haswell), stock clocks [i.e. 3.4GHz], turbos to 4GHz on all 4 cores during run:
interference type: none
add: 1.28 cycles/op
add_mfence: 39.19 cycles/op
lockadd: 20.43 cycles/op
xadd: 19.13 cycles/op
swap: 19.76 cycles/op
cmpxchg: 19.13 cycles/op
lockadd_unalign: 574.70 cycles/op
interference type: hyperthread_read_line
add: 1.64 cycles/op
add_mfence: 102.07 cycles/op
lockadd: 69.59 cycles/op
xadd: 19.17 cycles/op
swap: 19.80 cycles/op
cmpxchg: 19.27 cycles/op
lockadd_unalign: 574.58 cycles/op
interference type: hyperthread_write_line
add: 1.28 cycles/op
add_mfence: 39.19 cycles/op
lockadd: 20.43 cycles/op
xadd: 19.13 cycles/op
swap: 19.76 cycles/op
cmpxchg: 19.13 cycles/op
lockadd_unalign: 575.42 cycles/op
interference type: other_core_read_line
add: 1.28 cycles/op
add_mfence: 39.29 cycles/op
lockadd: 20.44 cycles/op
xadd: 19.16 cycles/op
swap: 19.80 cycles/op
cmpxchg: 19.14 cycles/op
lockadd_unalign: 575.71 cycles/op
interference type: other_core_write_line
add: 1.28 cycles/op
add_mfence: 39.19 cycles/op
lockadd: 20.40 cycles/op
xadd: 19.16 cycles/op
swap: 19.76 cycles/op
cmpxchg: 19.13 cycles/op
lockadd_unalign: 578.42 cycles/op
interference type: three_cores_read_line
add: 1.28 cycles/op
add_mfence: 39.20 cycles/op
lockadd: 20.44 cycles/op
xadd: 19.16 cycles/op
swap: 19.80 cycles/op
cmpxchg: 19.16 cycles/op
lockadd_unalign: 573.75 cycles/op
interference type: three_cores_write_line
add: 2.66 cycles/op
add_mfence: 130.16 cycles/op
lockadd: 97.48 cycles/op
xadd: 118.87 cycles/op
swap: 66.55 cycles/op
cmpxchg: 270.15 cycles/op
lockadd_unalign: 575.15 cycles/op
----
Intel Core i7-4770 (Haswell) @ 3.40GHz
interference type: none
add: 1.77 cycles/op
add_mfence: 46.09 cycles/op
lockadd: 19.41 cycles/op
xadd: 20.31 cycles/op
swap: 22.08 cycles/op
cmpxchg: 19.67 cycles/op
lockadd_unalign: 704.17 cycles/op
interference type: hyperthread_read_line
add: 1.78 cycles/op
add_mfence: 46.32 cycles/op
lockadd: 18.06 cycles/op
xadd: 18.07 cycles/op
swap: 21.56 cycles/op
cmpxchg: 17.87 cycles/op
lockadd_unalign: 655.86 cycles/op
interference type: hyperthread_write_line
add: 10.06 cycles/op
add_mfence: 50.14 cycles/op
lockadd: 47.36 cycles/op
xadd: 39.90 cycles/op
swap: 30.60 cycles/op
cmpxchg: 111.52 cycles/op
lockadd_unalign: 817.70 cycles/op
interference type: other_core_read_line
add: 2.07 cycles/op
add_mfence: 103.08 cycles/op
lockadd: 113.44 cycles/op
xadd: 114.60 cycles/op
swap: 107.72 cycles/op
cmpxchg: 106.50 cycles/op
lockadd_unalign: 715.33 cycles/op
interference type: other_core_write_line
add: 3.12 cycles/op
add_mfence: 174.72 cycles/op
lockadd: 119.09 cycles/op
xadd: 119.09 cycles/op
swap: 116.53 cycles/op
cmpxchg: 376.28 cycles/op
lockadd_unalign: 742.58 cycles/op
interference type: three_cores_read_line
add: 2.08 cycles/op
add_mfence: 121.34 cycles/op
lockadd: 163.34 cycles/op
xadd: 162.96 cycles/op
swap: 161.23 cycles/op
cmpxchg: 151.60 cycles/op
lockadd_unalign: 741.54 cycles/op
interference type: three_cores_write_line
add: 7.59 cycles/op
add_mfence: 314.44 cycles/op
lockadd: 238.19 cycles/op
xadd: 238.33 cycles/op
swap: 237.93 cycles/op
cmpxchg: 2635.58 cycles/op
lockadd_unalign: 833.31 cycles/op
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
interference type: none
add: 1.77 cycles/op
add_mfence: 46.18 cycles/op
lockadd: 19.43 cycles/op
xadd: 20.21 cycles/op
swap: 22.23 cycles/op
cmpxchg: 19.70 cycles/op
lockadd_unalign: 718.28 cycles/op
interference type: hyperthread_read_line
add: 1.78 cycles/op
add_mfence: 46.32 cycles/op
lockadd: 18.03 cycles/op
xadd: 18.06 cycles/op
swap: 21.55 cycles/op
cmpxchg: 17.87 cycles/op
lockadd_unalign: 650.22 cycles/op
interference type: hyperthread_write_line
add: 10.04 cycles/op
add_mfence: 50.16 cycles/op
lockadd: 47.20 cycles/op
xadd: 39.77 cycles/op
swap: 30.49 cycles/op
cmpxchg: 112.13 cycles/op
lockadd_unalign: 813.49 cycles/op
interference type: other_core_read_line
add: 2.01 cycles/op
add_mfence: 102.74 cycles/op
lockadd: 105.58 cycles/op
xadd: 107.39 cycles/op
swap: 108.68 cycles/op
cmpxchg: 104.17 cycles/op
lockadd_unalign: 707.72 cycles/op
interference type: other_core_write_line
add: 3.11 cycles/op
add_mfence: 175.14 cycles/op
lockadd: 114.47 cycles/op
xadd: 119.80 cycles/op
swap: 117.18 cycles/op
cmpxchg: 28.35 cycles/op
lockadd_unalign: 721.09 cycles/op
interference type: three_cores_read_line
add: 2.10 cycles/op
add_mfence: 121.41 cycles/op
lockadd: 162.92 cycles/op
xadd: 162.67 cycles/op
swap: 156.57 cycles/op
cmpxchg: 151.76 cycles/op
lockadd_unalign: 722.32 cycles/op
interference type: three_cores_write_line
add: 7.46 cycles/op
add_mfence: 313.86 cycles/op
lockadd: 238.67 cycles/op
xadd: 237.84 cycles/op
swap: 238.59 cycles/op
cmpxchg: 2319.80 cycles/op
lockadd_unalign: 827.18 cycles/op
Intel Core i7-920 (Bloomfield) @ 2.8 GHz (during execution)
interference type: none
add: 2.74 cycles/op
add_mfence: 43.10 cycles/op
lockadd: 20.63 cycles/op
xadd: 20.70 cycles/op
swap: 20.72 cycles/op
cmpxchg: 17.74 cycles/op
lockadd_unalign: 1218.97 cycles/op
interference type: hyperthread_read_line
add: 2.05 cycles/op
add_mfence: 43.85 cycles/op
lockadd: 21.61 cycles/op
xadd: 21.43 cycles/op
swap: 20.04 cycles/op
cmpxchg: 20.89 cycles/op
lockadd_unalign: 1216.35 cycles/op
interference type: hyperthread_write_line
add: 4.58 cycles/op
add_mfence: 36.71 cycles/op
lockadd: 22.11 cycles/op
xadd: 21.99 cycles/op
swap: 22.36 cycles/op
cmpxchg: 44.44 cycles/op
lockadd_unalign: 1236.09 cycles/op
interference type: other_core_read_line
add: 3.65 cycles/op
add_mfence: 156.83 cycles/op
lockadd: 85.56 cycles/op
xadd: 83.32 cycles/op
swap: 83.72 cycles/op
cmpxchg: 141.47 cycles/op
lockadd_unalign: 1216.30 cycles/op
interference type: other_core_write_line
add: 4.66 cycles/op
add_mfence: 117.53 cycles/op
lockadd: 63.93 cycles/op
xadd: 61.09 cycles/op
swap: 61.17 cycles/op
cmpxchg: 145.92 cycles/op
lockadd_unalign: 1224.23 cycles/op
interference type: three_cores_read_line
add: 3.71 cycles/op
add_mfence: 227.06 cycles/op
lockadd: 138.89 cycles/op
xadd: 130.12 cycles/op
swap: 133.46 cycles/op
cmpxchg: 214.04 cycles/op
lockadd_unalign: 1234.76 cycles/op
interference type: three_cores_write_line
add: 5.51 cycles/op
add_mfence: 211.07 cycles/op
lockadd: 121.84 cycles/op
xadd: 120.34 cycles/op
swap: 119.18 cycles/op
cmpxchg: 227.27 cycles/op
lockadd_unalign: 1265.16 cycles/op
Intel Core i7-2677M @1.80 GHz 2C4T (Sandy Bridge, so rdtsc fixed freq despite turbo. Didn't set bios to disable turbo etc. for fixed freq so results variable with freq.)
interference type: none
add: 1.51 cycles/op
add_mfence: 35.09 cycles/op
lockadd: 15.48 cycles/op
xadd: 15.32 cycles/op
swap: 15.78 cycles/op
cmpxchg: 15.99 cycles/op
lockadd_unalign: 410.44 cycles/op
interference type: hyperthread_read_line
add: 1.49 cycles/op
add_mfence: 34.68 cycles/op
lockadd: 15.48 cycles/op
xadd: 21.78 cycles/op
swap: 24.98 cycles/op
cmpxchg: 15.56 cycles/op
lockadd_unalign: 450.96 cycles/op
interference type: hyperthread_write_line
add: 1.51 cycles/op
add_mfence: 35.05 cycles/op
lockadd: 15.50 cycles/op
xadd: 15.53 cycles/op
swap: 15.81 cycles/op
cmpxchg: 15.94 cycles/op
lockadd_unalign: 408.41 cycles/op
interference type: other_core_read_line
add: 1.51 cycles/op
add_mfence: 35.07 cycles/op
lockadd: 15.48 cycles/op
xadd: 15.33 cycles/op
swap: 15.78 cycles/op
cmpxchg: 15.98 cycles/op
lockadd_unalign: 408.03 cycles/op
interference type: other_core_write_line
add: 3.35 cycles/op
add_mfence: 118.20 cycles/op
lockadd: 100.99 cycles/op
xadd: 103.57 cycles/op
swap: 106.19 cycles/op
cmpxchg: 251.31 cycles/op
lockadd_unalign: 405.65 cycles/op
interference type: three_cores_read_line
add: 1.85 cycles/op
add_mfence: 70.64 cycles/op
lockadd: 73.13 cycles/op
xadd: 73.48 cycles/op
swap: 68.97 cycles/op
cmpxchg: 71.28 cycles/op
lockadd_unalign: 439.66 cycles/op
interference type: three_cores_write_line
add: 3.36 cycles/op
add_mfence: 121.96 cycles/op
lockadd: 90.12 cycles/op
xadd: 91.66 cycles/op
swap: 90.41 cycles/op
cmpxchg: 230.79 cycles/op
lockadd_unalign: 405.44 cycles/op
Second run:
Intel Core i7-2677M @ 1.80GHz, 2C4T (Sandy Bridge, so rdtsc ticks at a fixed frequency regardless of turbo. I didn't disable turbo in the BIOS to pin the core clock, so results vary with the actual core frequency.)
interference type: none
add: 1.51 cycles/op
add_mfence: 34.79 cycles/op
lockadd: 15.64 cycles/op
xadd: 15.53 cycles/op
swap: 15.85 cycles/op
cmpxchg: 15.95 cycles/op
lockadd_unalign: 418.83 cycles/op
interference type: hyperthread_read_line
add: 1.51 cycles/op
add_mfence: 34.72 cycles/op
lockadd: 19.57 cycles/op
xadd: 15.51 cycles/op
swap: 15.94 cycles/op
cmpxchg: 16.09 cycles/op
lockadd_unalign: 426.44 cycles/op
interference type: hyperthread_write_line
add: 3.44 cycles/op
add_mfence: 37.77 cycles/op
lockadd: 17.14 cycles/op
xadd: 19.51 cycles/op
swap: 20.60 cycles/op
cmpxchg: 18.61 cycles/op
lockadd_unalign: 411.84 cycles/op
interference type: other_core_read_line
add: 1.78 cycles/op
add_mfence: 71.25 cycles/op
lockadd: 15.63 cycles/op
xadd: 15.39 cycles/op
swap: 18.04 cycles/op
cmpxchg: 65.87 cycles/op
lockadd_unalign: 408.33 cycles/op
interference type: other_core_write_line
add: 3.90 cycles/op
add_mfence: 121.43 cycles/op
lockadd: 16.01 cycles/op
xadd: 16.13 cycles/op
swap: 15.80 cycles/op
cmpxchg: 15.56 cycles/op
lockadd_unalign: 407.19 cycles/op
interference type: three_cores_read_line
add: 1.85 cycles/op
add_mfence: 71.31 cycles/op
lockadd: 69.29 cycles/op
xadd: 69.65 cycles/op
swap: 69.88 cycles/op
cmpxchg: 66.89 cycles/op
lockadd_unalign: 404.38 cycles/op
interference type: three_cores_write_line
add: 3.46 cycles/op
add_mfence: 37.87 cycles/op
lockadd: 35.72 cycles/op
xadd: 23.67 cycles/op
swap: 15.59 cycles/op
cmpxchg: 15.61 cycles/op
lockadd_unalign: 430.37 cycles/op
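Since the note on the i7-2677M runs says rdtsc ticks at a fixed frequency regardless of turbo, the cycles/op figures there are best read as TSC ticks rather than core clock cycles. A minimal sketch of converting them to wall-clock time follows. It assumes the TSC rate equals the nominal 1.80 GHz, which the original post does not state explicitly, and `cycles_to_ns` is a hypothetical helper name, not part of the test program.

```python
def cycles_to_ns(cycles_per_op: float, tsc_ghz: float) -> float:
    """Convert a cycles/op figure counted in TSC ticks to nanoseconds per op."""
    if tsc_ghz <= 0:
        raise ValueError("tsc_ghz must be positive")
    # One tick lasts 1 / tsc_ghz nanoseconds.
    return cycles_per_op / tsc_ghz


# Assumption: the TSC runs at the nominal 1.80 GHz of the i7-2677M.
TSC_GHZ = 1.80

# Values taken from the first i7-2677M run, "interference type: none".
for name, cycles in [("add", 1.51), ("lockadd", 15.48), ("lockadd_unalign", 410.44)]:
    print(f"{name}: {cycles_to_ns(cycles, TSC_GHZ):.2f} ns/op")
```

Under that assumption, the uncontended lockadd at 15.48 ticks works out to roughly 8.6 ns per operation, and that figure stays meaningful even if turbo changed the core clock during the run.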
i5-2400 (3.10GHz Sandy Bridge)
interference type: none
add: 1.69 cycles/op
add_mfence: 44.08 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 615.60 cycles/op
interference type: hyperthread_read_line
add: 2.23 cycles/op
add_mfence: 66.85 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.50 cycles/op
cmpxchg: 87.92 cycles/op
lockadd_unalign: 613.58 cycles/op
interference type: hyperthread_write_line
add: 1.69 cycles/op
add_mfence: 44.73 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 614.62 cycles/op
interference type: other_core_read_line
add: 2.22 cycles/op
add_mfence: 44.56 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.22 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.23 cycles/op
lockadd_unalign: 614.30 cycles/op
interference type: other_core_write_line
add: 1.69 cycles/op
add_mfence: 44.63 cycles/op
lockadd: 159.29 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 614.38 cycles/op
interference type: three_cores_read_line
add: 2.23 cycles/op
add_mfence: 113.23 cycles/op
lockadd: 108.38 cycles/op
xadd: 108.40 cycles/op
swap: 108.28 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 612.58 cycles/op
interference type: three_cores_write_line
add: 3.56 cycles/op
add_mfence: 44.53 cycles/op
lockadd: 25.90 cycles/op
xadd: 160.24 cycles/op
swap: 142.26 cycles/op
cmpxchg: 25.23 cycles/op
lockadd_unalign: 616.72 cycles/op
AMD FX 8350 (8 cores)
interference type: none
add: 2.00 cycles/op
add_mfence: 94.95 cycles/op
lockadd: 43.18 cycles/op
xadd: 42.00 cycles/op
swap: 42.18 cycles/op
cmpxchg: 45.94 cycles/op
lockadd_unalign: 216.35 cycles/op
interference type: hyperthread_read_li
add: 2.00 cycles/op
add_mfence: 98.13 cycles/op
lockadd: 95.56 cycles/op
xadd: 93.84 cycles/op
swap: 93.82 cycles/op
cmpxchg: 94.22 cycles/op
lockadd_unalign: 274.09 cycles/op
interference type: hyperthread_write_l
add: 25.08 cycles/op
add_mfence: 177.02 cycles/op
lockadd: 142.13 cycles/op
xadd: 149.08 cycles/op
swap: 150.16 cycles/op
cmpxchg: 3697.73 cycles/op
lockadd_unalign: 259.23 cycles/op
interference type: other_core_read_lin
add: 6.42 cycles/op
add_mfence: 393.58 cycles/op
lockadd: 216.35 cycles/op
xadd: 391.48 cycles/op
swap: 42.05 cycles/op
cmpxchg: 45.93 cycles/op
lockadd_unalign: 213.56 cycles/op
interference type: other_core_write_li
add: 2.00 cycles/op
add_mfence: 473.65 cycles/op
lockadd: 387.04 cycles/op
xadd: 378.94 cycles/op
swap: 396.20 cycles/op
cmpxchg: 942.90 cycles/op
lockadd_unalign: 655.72 cycles/op
interference type: three_cores_read_li
add: 13.53 cycles/op
add_mfence: 835.01 cycles/op
lockadd: 443.42 cycles/op
xadd: 580.89 cycles/op
swap: 834.23 cycles/op
cmpxchg: 1048.45 cycles/op
lockadd_unalign: 968.52 cycles/op
interference type: three_cores_write_l
add: 82.73 cycles/op
add_mfence: 905.74 cycles/op
lockadd: 825.68 cycles/op
xadd: 844.77 cycles/op
swap: 827.17 cycles/op
cmpxchg: 2491.23 cycles/op
lockadd_unalign: 1140.24 cycles/op
Atom D510 (1.66GHz, 2 core, 4 thread) in Linux (port here: https://github.com/maxburke/atomic_ops_test):
interference type: none
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.44 cycles/op
interference type: hyperthread_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.41 cycles/op
interference type: hyperthread_write_line
add: 5.00 cycles/op
add_mfence: 4.63 cycles/op
lockadd: 5.00 cycles/op
xadd: 7.75 cycles/op
swap: 7.25 cycles/op
cmpxchg: 24.02 cycles/op
lockadd_unalign: 180.20 cycles/op
interference type: other_core_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.40 cycles/op
interference type: other_core_write_line
add: 36.12 cycles/op
add_mfence: 70.96 cycles/op
lockadd: 35.50 cycles/op
xadd: 96.41 cycles/op
swap: 68.43 cycles/op
cmpxchg: 348.55 cycles/op
lockadd_unalign: 209.81 cycles/op
interference type: three_cores_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 185.16 cycles/op
interference type: three_cores_write_line
add: 36.19 cycles/op
add_mfence: 71.41 cycles/op
lockadd: 36.28 cycles/op
xadd: 96.52 cycles/op
swap: 68.35 cycles/op
cmpxchg: 344.29 cycles/op
lockadd_unalign: 209.89 cycles/op
* The port uses rdtsc instead of rdtscp, and the D510 doesn't appear to have rdtscp anyway. I don't think this should have much of an effect, since the processor is in-order.
i7-3930K CPU @ 3.20GHz
interference type: none
add: 1.91 cycles/op
add_mfence: 45.98 cycles/op
lockadd: 20.51 cycles/op
xadd: 20.71 cycles/op
swap: 20.71 cycles/op
cmpxchg: 20.50 cycles/op
lockadd_unalign: 1469.86 cycles/op
interference type: hyperthread_read_line
add: 1.94 cycles/op
add_mfence: 45.98 cycles/op
lockadd: 29.69 cycles/op
xadd: 29.22 cycles/op
swap: 61.40 cycles/op
cmpxchg: 39.95 cycles/op
lockadd_unalign: 1420.95 cycles/op
interference type: hyperthread_write_line
add: 6.07 cycles/op
add_mfence: 50.24 cycles/op
lockadd: 21.62 cycles/op
xadd: 22.27 cycles/op
swap: 50.25 cycles/op
cmpxchg: 111.57 cycles/op
lockadd_unalign: 1489.86 cycles/op
interference type: other_core_read_line
add: 2.34 cycles/op
add_mfence: 146.13 cycles/op
lockadd: 140.78 cycles/op
xadd: 142.04 cycles/op
swap: 132.71 cycles/op
cmpxchg: 144.98 cycles/op
lockadd_unalign: 1501.67 cycles/op
interference type: other_core_write_line
add: 4.65 cycles/op
add_mfence: 206.75 cycles/op
lockadd: 160.12 cycles/op
xadd: 162.33 cycles/op
swap: 145.06 cycles/op
cmpxchg: 349.43 cycles/op
lockadd_unalign: 1506.68 cycles/op
interference type: three_cores_read_line
add: 2.44 cycles/op
add_mfence: 161.34 cycles/op
lockadd: 162.22 cycles/op
xadd: 162.29 cycles/op
swap: 151.51 cycles/op
cmpxchg: 163.30 cycles/op
lockadd_unalign: 1514.29 cycles/op
interference type: three_cores_write_line
add: 10.39 cycles/op
add_mfence: 423.03 cycles/op
lockadd: 382.45 cycles/op
xadd: 389.20 cycles/op
swap: 325.26 cycles/op
cmpxchg: 2575.06 cycles/op
lockadd_unalign: 1698.24 cycles/op
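For anyone who wants to compare these pasted results numerically, here is a rough sketch of a parser for the output format used in this thread. It is not part of the test program; `parse_results` and the sample string are illustrative only, and the regex tokenizes on the labels themselves, so it should not matter whether the line breaks survived the paste.

```python
import re

# Matches either an "interference type: <name>" header
# or a "<test>: <number> cycles/op" measurement.
TOKEN = re.compile(r"interference type:\s*(\w+)|(\w+):\s*([\d.]+)\s*cycles/op")


def parse_results(text):
    """Return {interference_type: {test_name: cycles_per_op}}."""
    results = {}
    current = None
    for match in TOKEN.finditer(text):
        if match.group(1):
            # New interference section starts here.
            current = results.setdefault(match.group(1), {})
        elif current is not None:
            current[match.group(2)] = float(match.group(3))
    return results


# A few values copied from the i7-3930K results above.
sample = (
    "interference type: none add: 1.91 cycles/op lockadd: 20.51 cycles/op "
    "interference type: other_core_write_line add: 4.65 cycles/op "
    "lockadd: 160.12 cycles/op"
)
parsed = parse_results(sample)
slowdown = parsed["other_core_write_line"]["lockadd"] / parsed["none"]["lockadd"]
print(f"lockadd slowdown under other_core_write_line: {slowdown:.1f}x")
```

Dividing a contended figure by the matching `none` figure gives a quick per-CPU slowdown factor, which is easier to compare across machines than raw cycle counts.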
Intel Core 2 Quad Extreme QX9650 @ 3GHz
Yorkfield
8GB DDR2 (4 sticks, dual channel, running at 333 MHz 5-5-5-15)
interference type: none
add: 1.50 cycles/op
add_mfence: 14.00 cycles/op
lockadd: 20.13 cycles/op
xadd: 20.01 cycles/op
swap: 18.38 cycles/op
cmpxchg: 35.54 cycles/op
lockadd_unalign: 280.44 cycles/op
interference type: hyperthread_read_line
add: 1.50 cycles/op
add_mfence: 14.01 cycles/op
lockadd: 20.17 cycles/op
xadd: 20.01 cycles/op
swap: 18.59 cycles/op
cmpxchg: 36.61 cycles/op
lockadd_unalign: 274.59 cycles/op
interference type: hyperthread_write_line
add: 9.67 cycles/op
add_mfence: 14.00 cycles/op
lockadd: 20.13 cycles/op
xadd: 20.02 cycles/op
swap: 18.38 cycles/op
cmpxchg: 42.39 cycles/op
lockadd_unalign: 316.51 cycles/op
interference type: other_core_read_line
add: 1.04 cycles/op
add_mfence: 14.00 cycles/op
lockadd: 20.13 cycles/op
xadd: 20.01 cycles/op
swap: 18.38 cycles/op
cmpxchg: 35.55 cycles/op
lockadd_unalign: 292.92 cycles/op
interference type: other_core_write_line
add: 1.62 cycles/op
add_mfence: 14.00 cycles/op
lockadd: 20.13 cycles/op
xadd: 20.01 cycles/op
swap: 18.38 cycles/op
cmpxchg: 35.55 cycles/op
lockadd_unalign: 342.45 cycles/op
interference type: three_cores_read_line
add: 4.48 cycles/op
add_mfence: 14.00 cycles/op
lockadd: 20.13 cycles/op
xadd: 20.05 cycles/op
swap: 18.38 cycles/op
cmpxchg: 35.55 cycles/op
lockadd_unalign: 519.35 cycles/op
interference type: three_cores_write_line
add: 1.62 cycles/op
add_mfence: 14.14 cycles/op
lockadd: 192.96 cycles/op
xadd: 165.69 cycles/op
swap: 211.02 cycles/op
cmpxchg: 691.73 cycles/op
lockadd_unalign: 435.48 cycles/op
Intel i5 760
http://pastie.org/9429871
AMD A10-4600M (2.30GHz) with 8GB RAM
interference type: none
add: 1.71 cycles/op
dependent_adds: 0.85 cycles/op
add_mfence: 81.16 cycles/op
lockadd: 36.80 cycles/op
xadd: 35.96 cycles/op
swap: 35.95 cycles/op
cmpxchg: 39.15 cycles/op
lockadd_unalign: 173.03 cycles/op
interference type: hyperthread_read_line
add: 1.73 cycles/op
dependent_adds: 0.85 cycles/op
add_mfence: 84.60 cycles/op
lockadd: 80.40 cycles/op
xadd: 61.32 cycles/op
swap: 80.49 cycles/op
cmpxchg: 81.13 cycles/op
lockadd_unalign: 212.81 cycles/op
interference type: hyperthread_write_line
add: 20.24 cycles/op
dependent_adds: 2.70 cycles/op
add_mfence: 150.56 cycles/op
lockadd: 120.42 cycles/op
xadd: 126.89 cycles/op
swap: 48.89 cycles/op
cmpxchg: 46.72 cycles/op
lockadd_unalign: 254.07 cycles/op
interference type: other_core_read_line
add: 5.24 cycles/op
dependent_adds: 0.85 cycles/op
add_mfence: 302.17 cycles/op
lockadd: 48.86 cycles/op
xadd: 48.45 cycles/op
swap: 45.28 cycles/op
cmpxchg: 52.77 cycles/op
lockadd_unalign: 385.89 cycles/op
interference type: other_core_write_line
add: 2.46 cycles/op
dependent_adds: 5.90 cycles/op
add_mfence: 321.18 cycles/op
lockadd: 37.93 cycles/op
xadd: 48.45 cycles/op
swap: 47.70 cycles/op
cmpxchg: 43.81 cycles/op
lockadd_unalign: 272.36 cycles/op
interference type: three_cores_read_line
add: 5.26 cycles/op
dependent_adds: 0.85 cycles/op
add_mfence: 323.13 cycles/op
lockadd: 270.49 cycles/op
xadd: 50.26 cycles/op
swap: 305.33 cycles/op
cmpxchg: 52.17 cycles/op
lockadd_unalign: 443.51 cycles/op
interference type: three_cores_write_line
add: 32.33 cycles/op
dependent_adds: 0.85 cycles/op
add_mfence: 251.92 cycles/op
lockadd: 133.34 cycles/op
xadd: 50.48 cycles/op
swap: 400.78 cycles/op
cmpxchg: 52.03 cycles/op
lockadd_unalign: 597.53 cycles/op
AMD Opteron 6272 (16 module @ 2.1GHz)
interference type: none
add: 2.19 cycles/op
add_mfence: 95.01 cycles/op
lockadd: 56.15 cycles/op
xadd: 53.15 cycles/op
swap: 53.14 cycles/op
cmpxchg: 54.68 cycles/op
lockadd_unalign: 247.77 cycles/op
interference type: hyperthread_read_line
add: 2.50 cycles/op
add_mfence: 100.42 cycles/op
lockadd: 138.72 cycles/op
xadd: 97.43 cycles/op
swap: 108.15 cycles/op
cmpxchg: 143.43 cycles/op
lockadd_unalign: 293.73 cycles/op
interference type: hyperthread_write_line
add: 36.76 cycles/op
add_mfence: 171.70 cycles/op
lockadd: 226.86 cycles/op
xadd: 222.19 cycles/op
swap: 226.44 cycles/op
cmpxchg: 4643.72 cycles/op
lockadd_unalign: 348.22 cycles/op
interference type: other_core_read_line
add: 86.59 cycles/op
add_mfence: 548.38 cycles/op
lockadd: 453.45 cycles/op
xadd: 435.06 cycles/op
swap: 453.95 cycles/op
cmpxchg: 502.23 cycles/op
lockadd_unalign: 758.78 cycles/op
interference type: other_core_write_line
add: 75.59 cycles/op
add_mfence: 421.52 cycles/op
lockadd: 445.01 cycles/op
xadd: 416.42 cycles/op
swap: 405.71 cycles/op
cmpxchg: 54.79 cycles/op
lockadd_unalign: 687.30 cycles/op
interference type: three_cores_read_line
add: 186.46 cycles/op
add_mfence: 921.73 cycles/op
lockadd: 767.26 cycles/op
xadd: 752.58 cycles/op
swap: 769.43 cycles/op
cmpxchg: 885.06 cycles/op
lockadd_unalign: 1240.31 cycles/op
interference type: three_cores_write_line
add: 214.71 cycles/op
add_mfence: 1095.98 cycles/op
lockadd: 875.85 cycles/op
xadd: 865.22 cycles/op
swap: 887.61 cycles/op
cmpxchg: 8397.39 cycles/op
lockadd_unalign: 1399.55 cycles/op
Intel Core i7-3770K (Ivy Bridge) @ 3.5GHz