@rygorous
Last active August 29, 2015 14:04
Memory operations test
Code (and Windows binary) here: https://github.com/rygorous/atomic_ops_test
I'd appreciate it if some people could run this and post their results here in a
comment, with a short description of what CPU they're using.
UPDATE: I have recent Intel CPUs (released within the last 3 years) pretty well covered
by now, so if that's what you have in your machine, don't bother running the test. But
I'd love to get some more data points for older Intel CPUs and AMD parts!
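The results below are reported in cycles/op, and most headings list a clock speed, so a rough wall-clock figure is cycles divided by GHz. The helper below is a minimal sketch of that arithmetic. `cycles_to_ns` is a hypothetical name, not part of the test program, and it assumes the reported cycles tick at the clock given in the heading, which may not hold when turbo or a fixed-rate timestamp counter is involved.

```python
def cycles_to_ns(cycles_per_op: float, clock_ghz: float) -> float:
    """Convert a cycles/op figure to nanoseconds/op at the given clock (GHz)."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    # 1 GHz is one cycle per nanosecond, so ns = cycles / GHz.
    return cycles_per_op / clock_ghz


# Uncontended ("interference type: none") lockadd figures from the results below.
phenom_ns = cycles_to_ns(18.01, 2.8)  # AMD Phenom II 925 @ 2.8GHz
fx8350_ns = cycles_to_ns(43.18, 4.0)  # AMD FX 8350 @ 4.0GHz
print(f"Phenom II lockadd: {phenom_ns:.2f} ns/op")
print(f"FX 8350 lockadd:   {fx8350_ns:.2f} ns/op")
```

Treat the output as an order-of-magnitude guide only; the cycles/op numbers remain the primary data.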
Results so far:
----
AMD Phenom II 925 (AMD 10h) @ 2.8GHz
interference type: none
add: 1.88 cycles/op
add_mfence: 36.80 cycles/op
lockadd: 18.01 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.02 cycles/op
lockadd_unalign: 137.48 cycles/op
interference type: hyperthread_read_line
add: 13.87 cycles/op
add_mfence: 122.74 cycles/op
lockadd: 131.68 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.02 cycles/op
lockadd_unalign: 345.44 cycles/op
interference type: hyperthread_write_line
add: 14.12 cycles/op
add_mfence: 132.87 cycles/op
lockadd: 114.37 cycles/op
xadd: 131.32 cycles/op
swap: 16.06 cycles/op
cmpxchg: 18.26 cycles/op
lockadd_unalign: 344.94 cycles/op
interference type: other_core_read_line
add: 1.88 cycles/op
add_mfence: 36.85 cycles/op
lockadd: 18.01 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.01 cycles/op
lockadd_unalign: 136.15 cycles/op
interference type: other_core_write_line
add: 1.88 cycles/op
add_mfence: 37.07 cycles/op
lockadd: 18.01 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.05 cycles/op
lockadd_unalign: 135.07 cycles/op
interference type: three_cores_read_line
add: 13.84 cycles/op
add_mfence: 119.52 cycles/op
lockadd: 130.07 cycles/op
xadd: 161.28 cycles/op
swap: 135.72 cycles/op
cmpxchg: 143.83 cycles/op
lockadd_unalign: 342.39 cycles/op
interference type: three_cores_write_line
add: 13.58 cycles/op
add_mfence: 130.97 cycles/op
lockadd: 18.05 cycles/op
xadd: 17.01 cycles/op
swap: 16.01 cycles/op
cmpxchg: 18.26 cycles/op
lockadd_unalign: 135.08 cycles/op
----
AMD FX 8350 (Piledriver) @ 4.0GHz
interference type: none
add: 2.00 cycles/op
add_mfence: 94.95 cycles/op
lockadd: 43.18 cycles/op
xadd: 42.00 cycles/op
swap: 42.18 cycles/op
cmpxchg: 45.94 cycles/op
lockadd_unalign: 216.35 cycles/op
interference type: hyperthread_read_line
add: 2.00 cycles/op
add_mfence: 98.13 cycles/op
lockadd: 95.56 cycles/op
xadd: 93.84 cycles/op
swap: 93.82 cycles/op
cmpxchg: 94.22 cycles/op
lockadd_unalign: 274.09 cycles/op
interference type: hyperthread_write_line
add: 25.08 cycles/op
add_mfence: 177.02 cycles/op
lockadd: 142.13 cycles/op
xadd: 149.08 cycles/op
swap: 150.16 cycles/op
cmpxchg: 3697.73 cycles/op
lockadd_unalign: 259.23 cycles/op
interference type: other_core_read_line
add: 6.42 cycles/op
add_mfence: 393.58 cycles/op
lockadd: 216.35 cycles/op
xadd: 391.48 cycles/op
swap: 42.05 cycles/op
cmpxchg: 45.93 cycles/op
lockadd_unalign: 213.56 cycles/op
interference type: other_core_write_line
add: 2.00 cycles/op
add_mfence: 473.65 cycles/op
lockadd: 387.04 cycles/op
xadd: 378.94 cycles/op
swap: 396.20 cycles/op
cmpxchg: 942.90 cycles/op
lockadd_unalign: 655.72 cycles/op
interference type: three_cores_read_line
add: 13.53 cycles/op
add_mfence: 835.01 cycles/op
lockadd: 443.42 cycles/op
xadd: 580.89 cycles/op
swap: 834.23 cycles/op
cmpxchg: 1048.45 cycles/op
lockadd_unalign: 968.52 cycles/op
interference type: three_cores_write_line
add: 82.73 cycles/op
add_mfence: 905.74 cycles/op
lockadd: 825.68 cycles/op
xadd: 844.77 cycles/op
swap: 827.17 cycles/op
cmpxchg: 2491.23 cycles/op
lockadd_unalign: 1140.24 cycles/op
----
Intel Atom D510 (1.66GHz, 2 core, 4 thread) on Linux (port here: https://github.com/maxburke/atomic_ops_test):
interference type: none
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.44 cycles/op
interference type: hyperthread_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.41 cycles/op
interference type: hyperthread_write_line
add: 5.00 cycles/op
add_mfence: 4.63 cycles/op
lockadd: 5.00 cycles/op
xadd: 7.75 cycles/op
swap: 7.25 cycles/op
cmpxchg: 24.02 cycles/op
lockadd_unalign: 180.20 cycles/op
interference type: other_core_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 181.40 cycles/op
interference type: other_core_write_line
add: 36.12 cycles/op
add_mfence: 70.96 cycles/op
lockadd: 35.50 cycles/op
xadd: 96.41 cycles/op
swap: 68.43 cycles/op
cmpxchg: 348.55 cycles/op
lockadd_unalign: 209.81 cycles/op
interference type: three_cores_read_line
add: 2.88 cycles/op
add_mfence: 5.75 cycles/op
lockadd: 2.88 cycles/op
xadd: 7.83 cycles/op
swap: 7.67 cycles/op
cmpxchg: 21.50 cycles/op
lockadd_unalign: 185.16 cycles/op
interference type: three_cores_write_line
add: 36.19 cycles/op
add_mfence: 71.41 cycles/op
lockadd: 36.28 cycles/op
xadd: 96.52 cycles/op
swap: 68.35 cycles/op
cmpxchg: 344.29 cycles/op
lockadd_unalign: 209.89 cycles/op
----
Intel Core i7-920 (Bloomfield [Nehalem-derived])
interference type: none
add: 2.74 cycles/op
add_mfence: 43.10 cycles/op
lockadd: 20.63 cycles/op
xadd: 20.70 cycles/op
swap: 20.72 cycles/op
cmpxchg: 17.74 cycles/op
lockadd_unalign: 1218.97 cycles/op
interference type: hyperthread_read_line
add: 2.05 cycles/op
add_mfence: 43.85 cycles/op
lockadd: 21.61 cycles/op
xadd: 21.43 cycles/op
swap: 20.04 cycles/op
cmpxchg: 20.89 cycles/op
lockadd_unalign: 1216.35 cycles/op
interference type: hyperthread_write_line
add: 4.58 cycles/op
add_mfence: 36.71 cycles/op
lockadd: 22.11 cycles/op
xadd: 21.99 cycles/op
swap: 22.36 cycles/op
cmpxchg: 44.44 cycles/op
lockadd_unalign: 1236.09 cycles/op
interference type: other_core_read_line
add: 3.65 cycles/op
add_mfence: 156.83 cycles/op
lockadd: 85.56 cycles/op
xadd: 83.32 cycles/op
swap: 83.72 cycles/op
cmpxchg: 141.47 cycles/op
lockadd_unalign: 1216.30 cycles/op
interference type: other_core_write_line
add: 4.66 cycles/op
add_mfence: 117.53 cycles/op
lockadd: 63.93 cycles/op
xadd: 61.09 cycles/op
swap: 61.17 cycles/op
cmpxchg: 145.92 cycles/op
lockadd_unalign: 1224.23 cycles/op
interference type: three_cores_read_line
add: 3.71 cycles/op
add_mfence: 227.06 cycles/op
lockadd: 138.89 cycles/op
xadd: 130.12 cycles/op
swap: 133.46 cycles/op
cmpxchg: 214.04 cycles/op
lockadd_unalign: 1234.76 cycles/op
interference type: three_cores_write_line
add: 5.51 cycles/op
add_mfence: 211.07 cycles/op
lockadd: 121.84 cycles/op
xadd: 120.34 cycles/op
swap: 119.18 cycles/op
cmpxchg: 227.27 cycles/op
lockadd_unalign: 1265.16 cycles/op
----
Intel Core i3-2310M (Sandy Bridge) @ 2.1GHz
interference type: none
add: 2.08 cycles/op
add_mfence: 51.03 cycles/op
lockadd: 22.48 cycles/op
xadd: 22.58 cycles/op
swap: 22.84 cycles/op
cmpxchg: 23.22 cycles/op
lockadd_unalign: 571.91 cycles/op
interference type: hyperthread_read_line
add: 2.12 cycles/op
add_mfence: 51.49 cycles/op
lockadd: 24.57 cycles/op
xadd: 36.03 cycles/op
swap: 29.83 cycles/op
cmpxchg: 30.42 cycles/op
lockadd_unalign: 597.80 cycles/op
interference type: hyperthread_write_line
add: 6.83 cycles/op
add_mfence: 55.87 cycles/op
lockadd: 31.54 cycles/op
xadd: 34.54 cycles/op
swap: 35.70 cycles/op
cmpxchg: 114.43 cycles/op
lockadd_unalign: 585.72 cycles/op
interference type: other_core_read_line
add: 2.09 cycles/op
add_mfence: 56.78 cycles/op
lockadd: 110.71 cycles/op
xadd: 107.70 cycles/op
swap: 26.68 cycles/op
cmpxchg: 115.62 cycles/op
lockadd_unalign: 570.43 cycles/op
interference type: other_core_write_line
add: 4.52 cycles/op
add_mfence: 51.98 cycles/op
lockadd: 24.85 cycles/op
xadd: 22.84 cycles/op
swap: 22.94 cycles/op
cmpxchg: 23.27 cycles/op
lockadd_unalign: 597.92 cycles/op
interference type: three_cores_read_line
add: 2.65 cycles/op
add_mfence: 107.83 cycles/op
lockadd: 99.94 cycles/op
xadd: 108.54 cycles/op
swap: 98.83 cycles/op
cmpxchg: 114.97 cycles/op
lockadd_unalign: 589.32 cycles/op
interference type: three_cores_write_line
add: 4.52 cycles/op
add_mfence: 178.42 cycles/op
lockadd: 52.71 cycles/op
xadd: 151.44 cycles/op
swap: 133.44 cycles/op
cmpxchg: 23.26 cycles/op
lockadd_unalign: 577.85 cycles/op
----
Intel Core i5-2400 (Sandy Bridge) @ 3.10GHz
interference type: none
add: 1.69 cycles/op
add_mfence: 44.08 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 615.60 cycles/op
interference type: hyperthread_read_line
add: 2.23 cycles/op
add_mfence: 66.85 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.50 cycles/op
cmpxchg: 87.92 cycles/op
lockadd_unalign: 613.58 cycles/op
interference type: hyperthread_write_line
add: 1.69 cycles/op
add_mfence: 44.73 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 614.62 cycles/op
interference type: other_core_read_line
add: 2.22 cycles/op
add_mfence: 44.56 cycles/op
lockadd: 25.19 cycles/op
xadd: 25.22 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.23 cycles/op
lockadd_unalign: 614.30 cycles/op
interference type: other_core_write_line
add: 1.69 cycles/op
add_mfence: 44.63 cycles/op
lockadd: 159.29 cycles/op
xadd: 25.19 cycles/op
swap: 24.46 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 614.38 cycles/op
interference type: three_cores_read_line
add: 2.23 cycles/op
add_mfence: 113.23 cycles/op
lockadd: 108.38 cycles/op
xadd: 108.40 cycles/op
swap: 108.28 cycles/op
cmpxchg: 25.19 cycles/op
lockadd_unalign: 612.58 cycles/op
interference type: three_cores_write_line
add: 3.56 cycles/op
add_mfence: 44.53 cycles/op
lockadd: 25.90 cycles/op
xadd: 160.24 cycles/op
swap: 142.26 cycles/op
cmpxchg: 25.23 cycles/op
lockadd_unalign: 616.72 cycles/op
----
Intel Core i7-2677M (Sandy Bridge) @ 1.80 GHz
interference type: none
add: 1.51 cycles/op
add_mfence: 35.09 cycles/op
lockadd: 15.48 cycles/op
xadd: 15.32 cycles/op
swap: 15.78 cycles/op
cmpxchg: 15.99 cycles/op
lockadd_unalign: 410.44 cycles/op
interference type: hyperthread_read_line
add: 1.49 cycles/op
add_mfence: 34.68 cycles/op
lockadd: 15.48 cycles/op
xadd: 21.78 cycles/op
swap: 24.98 cycles/op
cmpxchg: 15.56 cycles/op
lockadd_unalign: 450.96 cycles/op
interference type: hyperthread_write_line
add: 1.51 cycles/op
add_mfence: 35.05 cycles/op
lockadd: 15.50 cycles/op
xadd: 15.53 cycles/op
swap: 15.81 cycles/op
cmpxchg: 15.94 cycles/op
lockadd_unalign: 408.41 cycles/op
interference type: other_core_read_line
add: 1.51 cycles/op
add_mfence: 35.07 cycles/op
lockadd: 15.48 cycles/op
xadd: 15.33 cycles/op
swap: 15.78 cycles/op
cmpxchg: 15.98 cycles/op
lockadd_unalign: 408.03 cycles/op
interference type: other_core_write_line
add: 3.35 cycles/op
add_mfence: 118.20 cycles/op
lockadd: 100.99 cycles/op
xadd: 103.57 cycles/op
swap: 106.19 cycles/op
cmpxchg: 251.31 cycles/op
lockadd_unalign: 405.65 cycles/op
interference type: three_cores_read_line
add: 1.85 cycles/op
add_mfence: 70.64 cycles/op
lockadd: 73.13 cycles/op
xadd: 73.48 cycles/op
swap: 68.97 cycles/op
cmpxchg: 71.28 cycles/op
lockadd_unalign: 439.66 cycles/op
interference type: three_cores_write_line
add: 3.36 cycles/op
add_mfence: 121.96 cycles/op
lockadd: 90.12 cycles/op
xadd: 91.66 cycles/op
swap: 90.41 cycles/op
cmpxchg: 230.79 cycles/op
lockadd_unalign: 405.44 cycles/op
----
Intel Core i7-2600K (Sandy Bridge) @ 3.4GHz
interference type: none
add: 2.11 cycles/op
add_mfence: 50.19 cycles/op
lockadd: 22.32 cycles/op
xadd: 22.22 cycles/op
swap: 22.53 cycles/op
cmpxchg: 22.86 cycles/op
lockadd_unalign: 648.10 cycles/op
interference type: hyperthread_read_line
add: 2.12 cycles/op
add_mfence: 50.24 cycles/op
lockadd: 32.74 cycles/op
xadd: 39.23 cycles/op
swap: 29.51 cycles/op
cmpxchg: 29.36 cycles/op
lockadd_unalign: 682.59 cycles/op
interference type: hyperthread_write_line
add: 6.97 cycles/op
add_mfence: 54.63 cycles/op
lockadd: 53.94 cycles/op
xadd: 36.98 cycles/op
swap: 35.85 cycles/op
cmpxchg: 131.69 cycles/op
lockadd_unalign: 652.76 cycles/op
interference type: other_core_read_line
add: 2.62 cycles/op
add_mfence: 103.76 cycles/op
lockadd: 108.31 cycles/op
xadd: 108.12 cycles/op
swap: 101.97 cycles/op
cmpxchg: 113.30 cycles/op
lockadd_unalign: 648.32 cycles/op
interference type: other_core_write_line
add: 4.50 cycles/op
add_mfence: 171.69 cycles/op
lockadd: 139.92 cycles/op
xadd: 140.46 cycles/op
swap: 146.66 cycles/op
cmpxchg: 360.81 cycles/op
lockadd_unalign: 647.92 cycles/op
interference type: three_cores_read_line
add: 2.72 cycles/op
add_mfence: 123.66 cycles/op
lockadd: 134.83 cycles/op
xadd: 134.35 cycles/op
swap: 132.47 cycles/op
cmpxchg: 136.49 cycles/op
lockadd_unalign: 646.96 cycles/op
interference type: three_cores_write_line
add: 11.21 cycles/op
add_mfence: 412.56 cycles/op
lockadd: 331.98 cycles/op
xadd: 337.59 cycles/op
swap: 383.45 cycles/op
cmpxchg: 5916.89 cycles/op
lockadd_unalign: 733.50 cycles/op
----
Intel Core i5-3427U (Ivy Bridge) @ 1.8GHz
interference type: none
add: 1.82 cycles/op
add_mfence: 45.40 cycles/op
lockadd: 19.11 cycles/op
xadd: 19.09 cycles/op
swap: 19.30 cycles/op
cmpxchg: 18.99 cycles/op
lockadd_unalign: 628.82 cycles/op
interference type: hyperthread_read_line
add: 1.85 cycles/op
add_mfence: 45.42 cycles/op
lockadd: 19.13 cycles/op
xadd: 19.07 cycles/op
swap: 19.41 cycles/op
cmpxchg: 19.01 cycles/op
lockadd_unalign: 621.71 cycles/op
interference type: hyperthread_write_line
add: 1.82 cycles/op
add_mfence: 45.63 cycles/op
lockadd: 32.25 cycles/op
xadd: 19.08 cycles/op
swap: 19.41 cycles/op
cmpxchg: 19.01 cycles/op
lockadd_unalign: 613.85 cycles/op
interference type: other_core_read_line
add: 2.28 cycles/op
add_mfence: 87.28 cycles/op
lockadd: 21.03 cycles/op
xadd: 19.07 cycles/op
swap: 19.34 cycles/op
cmpxchg: 19.02 cycles/op
lockadd_unalign: 621.71 cycles/op
interference type: other_core_write_line
add: 1.82 cycles/op
add_mfence: 45.41 cycles/op
lockadd: 19.10 cycles/op
xadd: 19.05 cycles/op
swap: 19.30 cycles/op
cmpxchg: 18.99 cycles/op
lockadd_unalign: 628.41 cycles/op
interference type: three_cores_read_line
add: 2.28 cycles/op
add_mfence: 97.28 cycles/op
lockadd: 96.07 cycles/op
xadd: 96.23 cycles/op
swap: 94.17 cycles/op
cmpxchg: 96.80 cycles/op
lockadd_unalign: 619.27 cycles/op
interference type: three_cores_write_line
add: 4.29 cycles/op
add_mfence: 154.34 cycles/op
lockadd: 19.27 cycles/op
xadd: 19.15 cycles/op
swap: 19.36 cycles/op
cmpxchg: 19.01 cycles/op
lockadd_unalign: 645.76 cycles/op
----
Intel Core i7-3770K (Ivy Bridge) @ 3.5GHz
interference type: none
add: 1.76 cycles/op
add_mfence: 44.08 cycles/op
lockadd: 18.46 cycles/op
xadd: 18.40 cycles/op
swap: 18.45 cycles/op
cmpxchg: 18.32 cycles/op
lockadd_unalign: 661.34 cycles/op
interference type: hyperthread_read_line
add: 1.78 cycles/op
add_mfence: 43.83 cycles/op
lockadd: 113.62 cycles/op
xadd: 121.86 cycles/op
swap: 47.23 cycles/op
cmpxchg: 48.57 cycles/op
lockadd_unalign: 612.54 cycles/op
interference type: hyperthread_write_line
add: 6.45 cycles/op
add_mfence: 48.10 cycles/op
lockadd: 32.62 cycles/op
xadd: 31.40 cycles/op
swap: 34.96 cycles/op
cmpxchg: 121.29 cycles/op
lockadd_unalign: 647.97 cycles/op
interference type: other_core_read_line
add: 2.19 cycles/op
add_mfence: 88.60 cycles/op
lockadd: 76.83 cycles/op
xadd: 77.17 cycles/op
swap: 74.09 cycles/op
cmpxchg: 79.80 cycles/op
lockadd_unalign: 659.39 cycles/op
interference type: other_core_write_line
add: 3.85 cycles/op
add_mfence: 146.39 cycles/op
lockadd: 128.53 cycles/op
xadd: 126.04 cycles/op
swap: 117.97 cycles/op
cmpxchg: 294.79 cycles/op
lockadd_unalign: 691.28 cycles/op
interference type: three_cores_read_line
add: 2.27 cycles/op
add_mfence: 106.34 cycles/op
lockadd: 109.30 cycles/op
xadd: 109.28 cycles/op
swap: 112.08 cycles/op
cmpxchg: 109.78 cycles/op
lockadd_unalign: 674.16 cycles/op
interference type: three_cores_write_line
add: 13.57 cycles/op
add_mfence: 329.85 cycles/op
lockadd: 248.26 cycles/op
xadd: 270.56 cycles/op
swap: 267.62 cycles/op
cmpxchg: 3827.60 cycles/op
lockadd_unalign: 780.82 cycles/op
----
Intel Core i5-4460 (Haswell) @ 3.20GHz
interference type: none
add: 1.41 cycles/op
add_mfence: 43.45 cycles/op
lockadd: 22.59 cycles/op
xadd: 21.18 cycles/op
swap: 21.88 cycles/op
cmpxchg: 21.18 cycles/op
lockadd_unalign: 586.52 cycles/op
interference type: hyperthread_read_line
add: 1.41 cycles/op
add_mfence: 43.82 cycles/op
lockadd: 22.63 cycles/op
xadd: 21.56 cycles/op
swap: 105.71 cycles/op
cmpxchg: 21.56 cycles/op
lockadd_unalign: 583.86 cycles/op
interference type: hyperthread_write_line
add: 2.62 cycles/op
add_mfence: 43.70 cycles/op
lockadd: 88.30 cycles/op
xadd: 21.18 cycles/op
swap: 21.88 cycles/op
cmpxchg: 21.18 cycles/op
lockadd_unalign: 585.65 cycles/op
interference type: other_core_read_line
add: 1.42 cycles/op
add_mfence: 44.93 cycles/op
lockadd: 94.54 cycles/op
xadd: 106.89 cycles/op
swap: 22.39 cycles/op
cmpxchg: 22.20 cycles/op
lockadd_unalign: 582.73 cycles/op
interference type: other_core_write_line
add: 2.40 cycles/op
add_mfence: 47.28 cycles/op
lockadd: 117.44 cycles/op
xadd: 21.23 cycles/op
swap: 22.07 cycles/op
cmpxchg: 21.18 cycles/op
lockadd_unalign: 585.65 cycles/op
interference type: three_cores_read_line
add: 1.90 cycles/op
add_mfence: 119.15 cycles/op
lockadd: 104.99 cycles/op
xadd: 105.29 cycles/op
swap: 115.61 cycles/op
cmpxchg: 107.39 cycles/op
lockadd_unalign: 582.31 cycles/op
interference type: three_cores_write_line
add: 2.59 cycles/op
add_mfence: 47.06 cycles/op
lockadd: 22.64 cycles/op
xadd: 118.95 cycles/op
swap: 118.90 cycles/op
cmpxchg: 21.57 cycles/op
lockadd_unalign: 584.93 cycles/op
----
Intel Core i5-4670K (Haswell), stock clocks [i.e. 3.4GHz], turbos to 4GHz on all 4 cores during run:
interference type: none
add: 1.28 cycles/op
add_mfence: 39.19 cycles/op
lockadd: 20.43 cycles/op
xadd: 19.13 cycles/op
swap: 19.76 cycles/op
cmpxchg: 19.13 cycles/op
lockadd_unalign: 574.70 cycles/op
interference type: hyperthread_read_line
add: 1.64 cycles/op
add_mfence: 102.07 cycles/op
lockadd: 69.59 cycles/op
xadd: 19.17 cycles/op
swap: 19.80 cycles/op
cmpxchg: 19.27 cycles/op
lockadd_unalign: 574.58 cycles/op
interference type: hyperthread_write_line
add: 1.28 cycles/op
add_mfence: 39.19 cycles/op
lockadd: 20.43 cycles/op
xadd: 19.13 cycles/op
swap: 19.76 cycles/op
cmpxchg: 19.13 cycles/op
lockadd_unalign: 575.42 cycles/op
interference type: other_core_read_line
add: 1.28 cycles/op
add_mfence: 39.29 cycles/op
lockadd: 20.44 cycles/op
xadd: 19.16 cycles/op
swap: 19.80 cycles/op
cmpxchg: 19.14 cycles/op
lockadd_unalign: 575.71 cycles/op
interference type: other_core_write_line
add: 1.28 cycles/op
add_mfence: 39.19 cycles/op
lockadd: 20.40 cycles/op
xadd: 19.16 cycles/op
swap: 19.76 cycles/op
cmpxchg: 19.13 cycles/op
lockadd_unalign: 578.42 cycles/op
interference type: three_cores_read_line
add: 1.28 cycles/op
add_mfence: 39.20 cycles/op
lockadd: 20.44 cycles/op
xadd: 19.16 cycles/op
swap: 19.80 cycles/op
cmpxchg: 19.16 cycles/op
lockadd_unalign: 573.75 cycles/op
interference type: three_cores_write_line
add: 2.66 cycles/op
add_mfence: 130.16 cycles/op
lockadd: 97.48 cycles/op
xadd: 118.87 cycles/op
swap: 66.55 cycles/op
cmpxchg: 270.15 cycles/op
lockadd_unalign: 575.15 cycles/op
----
Intel Core i7-4770 (Haswell) CPU @ 3.40GHz
interference type: none
add: 1.77 cycles/op
add_mfence: 46.09 cycles/op
lockadd: 19.41 cycles/op
xadd: 20.31 cycles/op
swap: 22.08 cycles/op
cmpxchg: 19.67 cycles/op
lockadd_unalign: 704.17 cycles/op
interference type: hyperthread_read_line
add: 1.78 cycles/op
add_mfence: 46.32 cycles/op
lockadd: 18.06 cycles/op
xadd: 18.07 cycles/op
swap: 21.56 cycles/op
cmpxchg: 17.87 cycles/op
lockadd_unalign: 655.86 cycles/op
interference type: hyperthread_write_line
add: 10.06 cycles/op
add_mfence: 50.14 cycles/op
lockadd: 47.36 cycles/op
xadd: 39.90 cycles/op
swap: 30.60 cycles/op
cmpxchg: 111.52 cycles/op
lockadd_unalign: 817.70 cycles/op
interference type: other_core_read_line
add: 2.07 cycles/op
add_mfence: 103.08 cycles/op
lockadd: 113.44 cycles/op
xadd: 114.60 cycles/op
swap: 107.72 cycles/op
cmpxchg: 106.50 cycles/op
lockadd_unalign: 715.33 cycles/op
interference type: other_core_write_line
add: 3.12 cycles/op
add_mfence: 174.72 cycles/op
lockadd: 119.09 cycles/op
xadd: 119.09 cycles/op
swap: 116.53 cycles/op
cmpxchg: 376.28 cycles/op
lockadd_unalign: 742.58 cycles/op
interference type: three_cores_read_line
add: 2.08 cycles/op
add_mfence: 121.34 cycles/op
lockadd: 163.34 cycles/op
xadd: 162.96 cycles/op
swap: 161.23 cycles/op
cmpxchg: 151.60 cycles/op
lockadd_unalign: 741.54 cycles/op
interference type: three_cores_write_line
add: 7.59 cycles/op
add_mfence: 314.44 cycles/op
lockadd: 238.19 cycles/op
xadd: 238.33 cycles/op
swap: 237.93 cycles/op
cmpxchg: 2635.58 cycles/op
lockadd_unalign: 833.31 cycles/op
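Every block above uses the same layout: an `interference type:` line followed by `name: value cycles/op` lines. The sketch below shows one way to load a single CPU's block into a dictionary and compute how much an operation slows down relative to the uncontended case. The names `parse_results` and `slowdown` are hypothetical helpers for illustration, not part of atomic_ops_test; the sample values are copied from the i7-3770K block.

```python
import re

HEADER = re.compile(r"^interference type:\s*(\S+)\s*$")
RESULT = re.compile(r"^\s*(\w+):\s*([0-9]+(?:\.[0-9]+)?)\s*cycles/op\s*$")


def parse_results(text):
    """Return {interference_type: {op_name: cycles_per_op}} for one CPU block."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = results.setdefault(header.group(1), {})
            continue
        result = RESULT.match(line)
        # Result lines seen before any header are ignored.
        if result and current is not None:
            current[result.group(1)] = float(result.group(2))
    return results


def slowdown(results, op, interference, baseline="none"):
    """Ratio of an op's cost under interference to its baseline cost."""
    return results[interference][op] / results[baseline][op]


# Excerpt of the i7-3770K results.
SAMPLE = """\
interference type: none
add: 1.76 cycles/op
cmpxchg: 18.32 cycles/op
interference type: three_cores_write_line
add: 13.57 cycles/op
cmpxchg: 3827.60 cycles/op
"""

parsed = parse_results(SAMPLE)
ratio = slowdown(parsed, "cmpxchg", "three_cores_write_line")
print(f"cmpxchg slowdown: {ratio:.0f}x")  # prints "cmpxchg slowdown: 209x"
```

The regular expressions tolerate leading whitespace, so the right-aligned output pasted in the comments parses the same way as the flattened blocks above.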
@BrainCruser

http://pastebin.com/rZAbyUaj

i3-2310M @ 2.1GHz

@baldurk

baldurk commented Jul 29, 2014

Intel Core i7-3770K (Ivy Bridge) @ 3.5GHz

interference type: none
             add:     1.76 cycles/op
      add_mfence:    44.08 cycles/op
         lockadd:    18.46 cycles/op
            xadd:    18.40 cycles/op
            swap:    18.45 cycles/op
         cmpxchg:    18.32 cycles/op
 lockadd_unalign:   661.34 cycles/op
interference type: hyperthread_running
             add:     1.75 cycles/op
      add_mfence:    44.00 cycles/op
         lockadd:    18.51 cycles/op
            xadd:    18.39 cycles/op
            swap:    18.48 cycles/op
         cmpxchg:    18.32 cycles/op
 lockadd_unalign:   657.71 cycles/op
interference type: hyperthread_read_line
             add:     1.78 cycles/op
      add_mfence:    43.83 cycles/op
         lockadd:   113.62 cycles/op
            xadd:   121.86 cycles/op
            swap:    47.23 cycles/op
         cmpxchg:    48.57 cycles/op
 lockadd_unalign:   612.54 cycles/op
interference type: hyperthread_write_line
             add:     6.45 cycles/op
      add_mfence:    48.10 cycles/op
         lockadd:    32.62 cycles/op
            xadd:    31.40 cycles/op
            swap:    34.96 cycles/op
         cmpxchg:   121.29 cycles/op
 lockadd_unalign:   647.97 cycles/op
interference type: other_core_read_line
             add:     2.19 cycles/op
      add_mfence:    88.60 cycles/op
         lockadd:    76.83 cycles/op
            xadd:    77.17 cycles/op
            swap:    74.09 cycles/op
         cmpxchg:    79.80 cycles/op
 lockadd_unalign:   659.39 cycles/op
interference type: other_core_write_line
             add:     3.85 cycles/op
      add_mfence:   146.39 cycles/op
         lockadd:   128.53 cycles/op
            xadd:   126.04 cycles/op
            swap:   117.97 cycles/op
         cmpxchg:   294.79 cycles/op
 lockadd_unalign:   691.28 cycles/op
interference type: three_cores_read_line
             add:     2.27 cycles/op
      add_mfence:   106.34 cycles/op
         lockadd:   109.30 cycles/op
            xadd:   109.28 cycles/op
            swap:   112.08 cycles/op
         cmpxchg:   109.78 cycles/op
 lockadd_unalign:   674.16 cycles/op
interference type: three_cores_write_line
             add:    13.57 cycles/op
      add_mfence:   329.85 cycles/op
         lockadd:   248.26 cycles/op
            xadd:   270.56 cycles/op
            swap:   267.62 cycles/op
         cmpxchg:  3827.60 cycles/op
 lockadd_unalign:   780.82 cycles/op

@fenbf

fenbf commented Jul 29, 2014

Intel Core i5-3427U (Ivy Bridge) @ 1.8GHz

interference type: none
             add:     1.82 cycles/op
      add_mfence:    45.40 cycles/op
         lockadd:    19.11 cycles/op
            xadd:    19.09 cycles/op
            swap:    19.30 cycles/op
         cmpxchg:    18.99 cycles/op
 lockadd_unalign:   628.82 cycles/op
interference type: hyperthread_read_line
             add:     1.85 cycles/op
      add_mfence:    45.42 cycles/op
         lockadd:    19.13 cycles/op
            xadd:    19.07 cycles/op
            swap:    19.41 cycles/op
         cmpxchg:    19.01 cycles/op
 lockadd_unalign:   621.71 cycles/op
interference type: hyperthread_write_line
             add:     1.82 cycles/op
      add_mfence:    45.63 cycles/op
         lockadd:    32.25 cycles/op
            xadd:    19.08 cycles/op
            swap:    19.41 cycles/op
         cmpxchg:    19.01 cycles/op
 lockadd_unalign:   613.85 cycles/op
interference type: other_core_read_line
             add:     2.28 cycles/op
      add_mfence:    87.28 cycles/op
         lockadd:    21.03 cycles/op
            xadd:    19.07 cycles/op
            swap:    19.34 cycles/op
         cmpxchg:    19.02 cycles/op
 lockadd_unalign:   621.71 cycles/op
interference type: other_core_write_line
             add:     1.82 cycles/op
      add_mfence:    45.41 cycles/op
         lockadd:    19.10 cycles/op
            xadd:    19.05 cycles/op
            swap:    19.30 cycles/op
         cmpxchg:    18.99 cycles/op
 lockadd_unalign:   628.41 cycles/op
interference type: three_cores_read_line
             add:     2.28 cycles/op
      add_mfence:    97.28 cycles/op
         lockadd:    96.07 cycles/op
            xadd:    96.23 cycles/op
            swap:    94.17 cycles/op
         cmpxchg:    96.80 cycles/op
 lockadd_unalign:   619.27 cycles/op
interference type: three_cores_write_line
             add:     4.29 cycles/op
      add_mfence:   154.34 cycles/op
         lockadd:    19.27 cycles/op
            xadd:    19.15 cycles/op
            swap:    19.36 cycles/op
         cmpxchg:    19.01 cycles/op
 lockadd_unalign:   645.76 cycles/op

@djg

djg commented Jul 29, 2014

Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

interference type: none
             add:     1.77 cycles/op
      add_mfence:    46.18 cycles/op
         lockadd:    19.43 cycles/op
            xadd:    20.21 cycles/op
            swap:    22.23 cycles/op
         cmpxchg:    19.70 cycles/op
 lockadd_unalign:   718.28 cycles/op
interference type: hyperthread_read_line
             add:     1.78 cycles/op
      add_mfence:    46.32 cycles/op
         lockadd:    18.03 cycles/op
            xadd:    18.06 cycles/op
            swap:    21.55 cycles/op
         cmpxchg:    17.87 cycles/op
 lockadd_unalign:   650.22 cycles/op
interference type: hyperthread_write_line
             add:    10.04 cycles/op
      add_mfence:    50.16 cycles/op
         lockadd:    47.20 cycles/op
            xadd:    39.77 cycles/op
            swap:    30.49 cycles/op
         cmpxchg:   112.13 cycles/op
 lockadd_unalign:   813.49 cycles/op
interference type: other_core_read_line
             add:     2.01 cycles/op
      add_mfence:   102.74 cycles/op
         lockadd:   105.58 cycles/op
            xadd:   107.39 cycles/op
            swap:   108.68 cycles/op
         cmpxchg:   104.17 cycles/op
 lockadd_unalign:   707.72 cycles/op
interference type: other_core_write_line
             add:     3.11 cycles/op
      add_mfence:   175.14 cycles/op
         lockadd:   114.47 cycles/op
            xadd:   119.80 cycles/op
            swap:   117.18 cycles/op
         cmpxchg:    28.35 cycles/op
 lockadd_unalign:   721.09 cycles/op
interference type: three_cores_read_line
             add:     2.10 cycles/op
      add_mfence:   121.41 cycles/op
         lockadd:   162.92 cycles/op
            xadd:   162.67 cycles/op
            swap:   156.57 cycles/op
         cmpxchg:   151.76 cycles/op
 lockadd_unalign:   722.32 cycles/op
interference type: three_cores_write_line
             add:     7.46 cycles/op
      add_mfence:   313.86 cycles/op
         lockadd:   238.67 cycles/op
            xadd:   237.84 cycles/op
            swap:   238.59 cycles/op
         cmpxchg:  2319.80 cycles/op
 lockadd_unalign:   827.18 cycles/op

@zzzoom

zzzoom commented Jul 29, 2014

i5 4670K, stock clocks, turbos to 4GHz on all 4 cores during run:

interference type: none
             add:     1.28 cycles/op
      add_mfence:    39.19 cycles/op
         lockadd:    20.43 cycles/op
            xadd:    19.13 cycles/op
            swap:    19.76 cycles/op
         cmpxchg:    19.13 cycles/op
 lockadd_unalign:   574.70 cycles/op
interference type: hyperthread_read_line
             add:     1.64 cycles/op
      add_mfence:   102.07 cycles/op
         lockadd:    69.59 cycles/op
            xadd:    19.17 cycles/op
            swap:    19.80 cycles/op
         cmpxchg:    19.27 cycles/op
 lockadd_unalign:   574.58 cycles/op
interference type: hyperthread_write_line
             add:     1.28 cycles/op
      add_mfence:    39.19 cycles/op
         lockadd:    20.43 cycles/op
            xadd:    19.13 cycles/op
            swap:    19.76 cycles/op
         cmpxchg:    19.13 cycles/op
 lockadd_unalign:   575.42 cycles/op
interference type: other_core_read_line
             add:     1.28 cycles/op
      add_mfence:    39.29 cycles/op
         lockadd:    20.44 cycles/op
            xadd:    19.16 cycles/op
            swap:    19.80 cycles/op
         cmpxchg:    19.14 cycles/op
 lockadd_unalign:   575.71 cycles/op
interference type: other_core_write_line
             add:     1.28 cycles/op
      add_mfence:    39.19 cycles/op
         lockadd:    20.40 cycles/op
            xadd:    19.16 cycles/op
            swap:    19.76 cycles/op
         cmpxchg:    19.13 cycles/op
 lockadd_unalign:   578.42 cycles/op
interference type: three_cores_read_line
             add:     1.28 cycles/op
      add_mfence:    39.20 cycles/op
         lockadd:    20.44 cycles/op
            xadd:    19.16 cycles/op
            swap:    19.80 cycles/op
         cmpxchg:    19.16 cycles/op
 lockadd_unalign:   573.75 cycles/op
interference type: three_cores_write_line
             add:     2.66 cycles/op
      add_mfence:   130.16 cycles/op
         lockadd:    97.48 cycles/op
            xadd:   118.87 cycles/op
            swap:    66.55 cycles/op
         cmpxchg:   270.15 cycles/op
 lockadd_unalign:   575.15 cycles/op

@kriolyth

Intel Core i7-920 (Bloomfield) @ 2.8 GHz (during execution)

interference type: none
             add:     2.74 cycles/op
      add_mfence:    43.10 cycles/op
         lockadd:    20.63 cycles/op
            xadd:    20.70 cycles/op
            swap:    20.72 cycles/op
         cmpxchg:    17.74 cycles/op
 lockadd_unalign:  1218.97 cycles/op
interference type: hyperthread_read_line
             add:     2.05 cycles/op
      add_mfence:    43.85 cycles/op
         lockadd:    21.61 cycles/op
            xadd:    21.43 cycles/op
            swap:    20.04 cycles/op
         cmpxchg:    20.89 cycles/op
 lockadd_unalign:  1216.35 cycles/op
interference type: hyperthread_write_line
             add:     4.58 cycles/op
      add_mfence:    36.71 cycles/op
         lockadd:    22.11 cycles/op
            xadd:    21.99 cycles/op
            swap:    22.36 cycles/op
         cmpxchg:    44.44 cycles/op
 lockadd_unalign:  1236.09 cycles/op
interference type: other_core_read_line
             add:     3.65 cycles/op
      add_mfence:   156.83 cycles/op
         lockadd:    85.56 cycles/op
            xadd:    83.32 cycles/op
            swap:    83.72 cycles/op
         cmpxchg:   141.47 cycles/op
 lockadd_unalign:  1216.30 cycles/op
interference type: other_core_write_line
             add:     4.66 cycles/op
      add_mfence:   117.53 cycles/op
         lockadd:    63.93 cycles/op
            xadd:    61.09 cycles/op
            swap:    61.17 cycles/op
         cmpxchg:   145.92 cycles/op
 lockadd_unalign:  1224.23 cycles/op
interference type: three_cores_read_line
             add:     3.71 cycles/op
      add_mfence:   227.06 cycles/op
         lockadd:   138.89 cycles/op
            xadd:   130.12 cycles/op
            swap:   133.46 cycles/op
         cmpxchg:   214.04 cycles/op
 lockadd_unalign:  1234.76 cycles/op
interference type: three_cores_write_line
             add:     5.51 cycles/op
      add_mfence:   211.07 cycles/op
         lockadd:   121.84 cycles/op
            xadd:   120.34 cycles/op
            swap:   119.18 cycles/op
         cmpxchg:   227.27 cycles/op
 lockadd_unalign:  1265.16 cycles/op

@dougbinks

Intel Core i7-2677M @ 1.80 GHz, 2C/4T (Sandy Bridge, so rdtsc ticks at a fixed frequency regardless of turbo. I didn't disable turbo in the BIOS to pin the clock, so the results vary with the actual core frequency.)

interference type: none
             add:     1.51 cycles/op
      add_mfence:    35.09 cycles/op
         lockadd:    15.48 cycles/op
            xadd:    15.32 cycles/op
            swap:    15.78 cycles/op
         cmpxchg:    15.99 cycles/op
 lockadd_unalign:   410.44 cycles/op
interference type: hyperthread_read_line
             add:     1.49 cycles/op
      add_mfence:    34.68 cycles/op
         lockadd:    15.48 cycles/op
            xadd:    21.78 cycles/op
            swap:    24.98 cycles/op
         cmpxchg:    15.56 cycles/op
 lockadd_unalign:   450.96 cycles/op
interference type: hyperthread_write_line
             add:     1.51 cycles/op
      add_mfence:    35.05 cycles/op
         lockadd:    15.50 cycles/op
            xadd:    15.53 cycles/op
            swap:    15.81 cycles/op
         cmpxchg:    15.94 cycles/op
 lockadd_unalign:   408.41 cycles/op
interference type: other_core_read_line
             add:     1.51 cycles/op
      add_mfence:    35.07 cycles/op
         lockadd:    15.48 cycles/op
            xadd:    15.33 cycles/op
            swap:    15.78 cycles/op
         cmpxchg:    15.98 cycles/op
 lockadd_unalign:   408.03 cycles/op
interference type: other_core_write_line
             add:     3.35 cycles/op
      add_mfence:   118.20 cycles/op
         lockadd:   100.99 cycles/op
            xadd:   103.57 cycles/op
            swap:   106.19 cycles/op
         cmpxchg:   251.31 cycles/op
 lockadd_unalign:   405.65 cycles/op
interference type: three_cores_read_line
             add:     1.85 cycles/op
      add_mfence:    70.64 cycles/op
         lockadd:    73.13 cycles/op
            xadd:    73.48 cycles/op
            swap:    68.97 cycles/op
         cmpxchg:    71.28 cycles/op
 lockadd_unalign:   439.66 cycles/op
interference type: three_cores_write_line
             add:     3.36 cycles/op
      add_mfence:   121.96 cycles/op
         lockadd:    90.12 cycles/op
            xadd:    91.66 cycles/op
            swap:    90.41 cycles/op
         cmpxchg:   230.79 cycles/op
 lockadd_unalign:   405.44 cycles/op
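
Since rdtsc on this CPU ticks at a fixed rate, the "cycles/op" figures above are really TSC ticks rather than core clock cycles. A minimal sketch of the conversion, assuming the TSC runs at the nominal 1.80 GHz; the turbo clock used below is a hypothetical value for illustration, not something that was measured, and the function names are made up for this example:

```python
def tsc_ticks_to_ns(ticks: float, tsc_ghz: float) -> float:
    """Convert TSC ticks to nanoseconds, assuming a fixed TSC rate in GHz."""
    return ticks / tsc_ghz


def tsc_ticks_to_core_cycles(ticks: float, tsc_ghz: float, core_ghz: float) -> float:
    """Estimate core clock cycles from TSC ticks for a given core clock in GHz."""
    return ticks * core_ghz / tsc_ghz


TSC_GHZ = 1.8   # assumption: TSC ticks at the nominal 1.80 GHz
CORE_GHZ = 2.9  # assumption: hypothetical sustained turbo clock, not measured

lockadd_ticks = 15.48  # "lockadd" with no interference, from the results above

print(f"{tsc_ticks_to_ns(lockadd_ticks, TSC_GHZ):.2f} ns/op")
print(f"{tsc_ticks_to_core_cycles(lockadd_ticks, TSC_GHZ, CORE_GHZ):.2f} core cycles/op")
```

The time per op is the more robust number here: the core-cycle estimate is only as good as the guess for the clock the core actually ran at during the test.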

@dougbinks

Second run:

Intel Core i7-2677M @ 1.80 GHz, 2C/4T (Sandy Bridge, so rdtsc ticks at a fixed frequency regardless of turbo. I didn't disable turbo in the BIOS to pin the clock, so the results vary with the actual core frequency.)

interference type: none
             add:     1.51 cycles/op
      add_mfence:    34.79 cycles/op
         lockadd:    15.64 cycles/op
            xadd:    15.53 cycles/op
            swap:    15.85 cycles/op
         cmpxchg:    15.95 cycles/op
 lockadd_unalign:   418.83 cycles/op
interference type: hyperthread_read_line
             add:     1.51 cycles/op
      add_mfence:    34.72 cycles/op
         lockadd:    19.57 cycles/op
            xadd:    15.51 cycles/op
            swap:    15.94 cycles/op
         cmpxchg:    16.09 cycles/op
 lockadd_unalign:   426.44 cycles/op
interference type: hyperthread_write_line
             add:     3.44 cycles/op
      add_mfence:    37.77 cycles/op
         lockadd:    17.14 cycles/op
            xadd:    19.51 cycles/op
            swap:    20.60 cycles/op
         cmpxchg:    18.61 cycles/op
 lockadd_unalign:   411.84 cycles/op
interference type: other_core_read_line
             add:     1.78 cycles/op
      add_mfence:    71.25 cycles/op
         lockadd:    15.63 cycles/op
            xadd:    15.39 cycles/op
            swap:    18.04 cycles/op
         cmpxchg:    65.87 cycles/op
 lockadd_unalign:   408.33 cycles/op
interference type: other_core_write_line
             add:     3.90 cycles/op
      add_mfence:   121.43 cycles/op
         lockadd:    16.01 cycles/op
            xadd:    16.13 cycles/op
            swap:    15.80 cycles/op
         cmpxchg:    15.56 cycles/op
 lockadd_unalign:   407.19 cycles/op
interference type: three_cores_read_line
             add:     1.85 cycles/op
      add_mfence:    71.31 cycles/op
         lockadd:    69.29 cycles/op
            xadd:    69.65 cycles/op
            swap:    69.88 cycles/op
         cmpxchg:    66.89 cycles/op
 lockadd_unalign:   404.38 cycles/op
interference type: three_cores_write_line
             add:     3.46 cycles/op
      add_mfence:    37.87 cycles/op
         lockadd:    35.72 cycles/op
            xadd:    23.67 cycles/op
            swap:    15.59 cycles/op
         cmpxchg:    15.61 cycles/op
 lockadd_unalign:   430.37 cycles/op

@Spasi

Spasi commented Jul 29, 2014

i5-2400 (3.10GHz Sandy Bridge)

interference type: none
             add:     1.69 cycles/op
      add_mfence:    44.08 cycles/op
         lockadd:    25.19 cycles/op
            xadd:    25.19 cycles/op
            swap:    24.46 cycles/op
         cmpxchg:    25.19 cycles/op
 lockadd_unalign:   615.60 cycles/op
interference type: hyperthread_read_line
             add:     2.23 cycles/op
      add_mfence:    66.85 cycles/op
         lockadd:    25.19 cycles/op
            xadd:    25.19 cycles/op
            swap:    24.50 cycles/op
         cmpxchg:    87.92 cycles/op
 lockadd_unalign:   613.58 cycles/op
interference type: hyperthread_write_line
             add:     1.69 cycles/op
      add_mfence:    44.73 cycles/op
         lockadd:    25.19 cycles/op
            xadd:    25.19 cycles/op
            swap:    24.46 cycles/op
         cmpxchg:    25.19 cycles/op
 lockadd_unalign:   614.62 cycles/op
interference type: other_core_read_line
             add:     2.22 cycles/op
      add_mfence:    44.56 cycles/op
         lockadd:    25.19 cycles/op
            xadd:    25.22 cycles/op
            swap:    24.46 cycles/op
         cmpxchg:    25.23 cycles/op
 lockadd_unalign:   614.30 cycles/op
interference type: other_core_write_line
             add:     1.69 cycles/op
      add_mfence:    44.63 cycles/op
         lockadd:   159.29 cycles/op
            xadd:    25.19 cycles/op
            swap:    24.46 cycles/op
         cmpxchg:    25.19 cycles/op
 lockadd_unalign:   614.38 cycles/op
interference type: three_cores_read_line
             add:     2.23 cycles/op
      add_mfence:   113.23 cycles/op
         lockadd:   108.38 cycles/op
            xadd:   108.40 cycles/op
            swap:   108.28 cycles/op
         cmpxchg:    25.19 cycles/op
 lockadd_unalign:   612.58 cycles/op
interference type: three_cores_write_line
             add:     3.56 cycles/op
      add_mfence:    44.53 cycles/op
         lockadd:    25.90 cycles/op
            xadd:   160.24 cycles/op
            swap:   142.26 cycles/op
         cmpxchg:    25.23 cycles/op
 lockadd_unalign:   616.72 cycles/op

@Kuranes

Kuranes commented Jul 29, 2014

AMD FX-8350 (8 cores)

interference type: none               
             add:     2.00 cycles/op  
      add_mfence:    94.95 cycles/op  
         lockadd:    43.18 cycles/op  
            xadd:    42.00 cycles/op  
            swap:    42.18 cycles/op  
         cmpxchg:    45.94 cycles/op  
 lockadd_unalign:   216.35 cycles/op  
interference type: hyperthread_read_line
             add:     2.00 cycles/op  
      add_mfence:    98.13 cycles/op  
         lockadd:    95.56 cycles/op  
            xadd:    93.84 cycles/op  
            swap:    93.82 cycles/op  
         cmpxchg:    94.22 cycles/op  
 lockadd_unalign:   274.09 cycles/op  
interference type: hyperthread_write_line
             add:    25.08 cycles/op  
      add_mfence:   177.02 cycles/op  
         lockadd:   142.13 cycles/op  
            xadd:   149.08 cycles/op  
            swap:   150.16 cycles/op  
         cmpxchg:  3697.73 cycles/op  
 lockadd_unalign:   259.23 cycles/op  
interference type: other_core_read_line
             add:     6.42 cycles/op  
      add_mfence:   393.58 cycles/op  
         lockadd:   216.35 cycles/op  
            xadd:   391.48 cycles/op  
            swap:    42.05 cycles/op  
         cmpxchg:    45.93 cycles/op  
 lockadd_unalign:   213.56 cycles/op  
interference type: other_core_write_line
             add:     2.00 cycles/op  
      add_mfence:   473.65 cycles/op  
         lockadd:   387.04 cycles/op  
            xadd:   378.94 cycles/op  
            swap:   396.20 cycles/op  
         cmpxchg:   942.90 cycles/op  
 lockadd_unalign:   655.72 cycles/op  
interference type: three_cores_read_line
             add:    13.53 cycles/op  
      add_mfence:   835.01 cycles/op  
         lockadd:   443.42 cycles/op  
            xadd:   580.89 cycles/op  
            swap:   834.23 cycles/op  
         cmpxchg:  1048.45 cycles/op  
 lockadd_unalign:   968.52 cycles/op  
interference type: three_cores_write_line
             add:    82.73 cycles/op  
      add_mfence:   905.74 cycles/op  
         lockadd:   825.68 cycles/op  
            xadd:   844.77 cycles/op  
            swap:   827.17 cycles/op  
         cmpxchg:  2491.23 cycles/op  
 lockadd_unalign:  1140.24 cycles/op  

@maxburke

Atom D510 (1.66 GHz, 2 cores, 4 threads) on Linux (port here: https://github.com/maxburke/atomic_ops_test):

interference type: none
             add:     2.88 cycles/op
      add_mfence:     5.75 cycles/op
         lockadd:     2.88 cycles/op
            xadd:     7.83 cycles/op
            swap:     7.67 cycles/op
         cmpxchg:    21.50 cycles/op
 lockadd_unalign:   181.44 cycles/op
interference type: hyperthread_read_line
             add:     2.88 cycles/op
      add_mfence:     5.75 cycles/op
         lockadd:     2.88 cycles/op
            xadd:     7.83 cycles/op
            swap:     7.67 cycles/op
         cmpxchg:    21.50 cycles/op
 lockadd_unalign:   181.41 cycles/op
interference type: hyperthread_write_line
             add:     5.00 cycles/op
      add_mfence:     4.63 cycles/op
         lockadd:     5.00 cycles/op
            xadd:     7.75 cycles/op
            swap:     7.25 cycles/op
         cmpxchg:    24.02 cycles/op
 lockadd_unalign:   180.20 cycles/op
interference type: other_core_read_line
             add:     2.88 cycles/op
      add_mfence:     5.75 cycles/op
         lockadd:     2.88 cycles/op
            xadd:     7.83 cycles/op
            swap:     7.67 cycles/op
         cmpxchg:    21.50 cycles/op
 lockadd_unalign:   181.40 cycles/op
interference type: other_core_write_line
             add:    36.12 cycles/op
      add_mfence:    70.96 cycles/op
         lockadd:    35.50 cycles/op
            xadd:    96.41 cycles/op
            swap:    68.43 cycles/op
         cmpxchg:   348.55 cycles/op
 lockadd_unalign:   209.81 cycles/op
interference type: three_cores_read_line
             add:     2.88 cycles/op
      add_mfence:     5.75 cycles/op
         lockadd:     2.88 cycles/op
            xadd:     7.83 cycles/op
            swap:     7.67 cycles/op
         cmpxchg:    21.50 cycles/op
 lockadd_unalign:   185.16 cycles/op
interference type: three_cores_write_line
             add:    36.19 cycles/op
      add_mfence:    71.41 cycles/op
         lockadd:    36.28 cycles/op
            xadd:    96.52 cycles/op
            swap:    68.35 cycles/op
         cmpxchg:   344.29 cycles/op
 lockadd_unalign:   209.89 cycles/op

*The port uses rdtsc instead of rdtscp, since the D510 doesn't appear to support rdtscp. I don't think this should have much of an effect, as the processor is in-order.

@spayne

spayne commented Jul 29, 2014

i7-3930K CPU @ 3.20 GHz

interference type: none
             add:     1.91 cycles/op
      add_mfence:    45.98 cycles/op
         lockadd:    20.51 cycles/op
            xadd:    20.71 cycles/op
            swap:    20.71 cycles/op
         cmpxchg:    20.50 cycles/op
 lockadd_unalign:  1469.86 cycles/op
interference type: hyperthread_read_line
             add:     1.94 cycles/op
      add_mfence:    45.98 cycles/op
         lockadd:    29.69 cycles/op
            xadd:    29.22 cycles/op
            swap:    61.40 cycles/op
         cmpxchg:    39.95 cycles/op
 lockadd_unalign:  1420.95 cycles/op
interference type: hyperthread_write_line
             add:     6.07 cycles/op
      add_mfence:    50.24 cycles/op
         lockadd:    21.62 cycles/op
            xadd:    22.27 cycles/op
            swap:    50.25 cycles/op
         cmpxchg:   111.57 cycles/op
 lockadd_unalign:  1489.86 cycles/op
interference type: other_core_read_line
             add:     2.34 cycles/op
      add_mfence:   146.13 cycles/op
         lockadd:   140.78 cycles/op
            xadd:   142.04 cycles/op
            swap:   132.71 cycles/op
         cmpxchg:   144.98 cycles/op
 lockadd_unalign:  1501.67 cycles/op
interference type: other_core_write_line
             add:     4.65 cycles/op
      add_mfence:   206.75 cycles/op
         lockadd:   160.12 cycles/op
            xadd:   162.33 cycles/op
            swap:   145.06 cycles/op
         cmpxchg:   349.43 cycles/op
 lockadd_unalign:  1506.68 cycles/op
interference type: three_cores_read_line
             add:     2.44 cycles/op
      add_mfence:   161.34 cycles/op
         lockadd:   162.22 cycles/op
            xadd:   162.29 cycles/op
            swap:   151.51 cycles/op
         cmpxchg:   163.30 cycles/op
 lockadd_unalign:  1514.29 cycles/op
interference type: three_cores_write_line
             add:    10.39 cycles/op
      add_mfence:   423.03 cycles/op
         lockadd:   382.45 cycles/op
            xadd:   389.20 cycles/op
            swap:   325.26 cycles/op
         cmpxchg:  2575.06 cycles/op
 lockadd_unalign:  1698.24 cycles/op

@darksylinc

Intel Core 2 Quad Extreme QX9650 (Yorkfield) @ 3 GHz
8 GB DDR2 (4 sticks, dual channel, running at 333 MHz, 5-5-5-15)

interference type: none
             add:     1.50 cycles/op
      add_mfence:    14.00 cycles/op
         lockadd:    20.13 cycles/op
            xadd:    20.01 cycles/op
            swap:    18.38 cycles/op
         cmpxchg:    35.54 cycles/op
 lockadd_unalign:   280.44 cycles/op
interference type: hyperthread_read_line
             add:     1.50 cycles/op
      add_mfence:    14.01 cycles/op
         lockadd:    20.17 cycles/op
            xadd:    20.01 cycles/op
            swap:    18.59 cycles/op
         cmpxchg:    36.61 cycles/op
 lockadd_unalign:   274.59 cycles/op
interference type: hyperthread_write_line
             add:     9.67 cycles/op
      add_mfence:    14.00 cycles/op
         lockadd:    20.13 cycles/op
            xadd:    20.02 cycles/op
            swap:    18.38 cycles/op
         cmpxchg:    42.39 cycles/op
 lockadd_unalign:   316.51 cycles/op
interference type: other_core_read_line
             add:     1.04 cycles/op
      add_mfence:    14.00 cycles/op
         lockadd:    20.13 cycles/op
            xadd:    20.01 cycles/op
            swap:    18.38 cycles/op
         cmpxchg:    35.55 cycles/op
 lockadd_unalign:   292.92 cycles/op
interference type: other_core_write_line
             add:     1.62 cycles/op
      add_mfence:    14.00 cycles/op
         lockadd:    20.13 cycles/op
            xadd:    20.01 cycles/op
            swap:    18.38 cycles/op
         cmpxchg:    35.55 cycles/op
 lockadd_unalign:   342.45 cycles/op
interference type: three_cores_read_line
             add:     4.48 cycles/op
      add_mfence:    14.00 cycles/op
         lockadd:    20.13 cycles/op
            xadd:    20.05 cycles/op
            swap:    18.38 cycles/op
         cmpxchg:    35.55 cycles/op
 lockadd_unalign:   519.35 cycles/op
interference type: three_cores_write_line
             add:     1.62 cycles/op
      add_mfence:    14.14 cycles/op
         lockadd:   192.96 cycles/op
            xadd:   165.69 cycles/op
            swap:   211.02 cycles/op
         cmpxchg:   691.73 cycles/op
 lockadd_unalign:   435.48 cycles/op

@razor85

razor85 commented Jul 29, 2014

@JHoule

JHoule commented Jul 30, 2014

AMD A10-4600M (2.30GHz) with 8GB RAM

interference type: none
             add:     1.71 cycles/op
  dependent_adds:     0.85 cycles/op
      add_mfence:    81.16 cycles/op
         lockadd:    36.80 cycles/op
            xadd:    35.96 cycles/op
            swap:    35.95 cycles/op
         cmpxchg:    39.15 cycles/op
 lockadd_unalign:   173.03 cycles/op
interference type: hyperthread_read_line
             add:     1.73 cycles/op
  dependent_adds:     0.85 cycles/op
      add_mfence:    84.60 cycles/op
         lockadd:    80.40 cycles/op
            xadd:    61.32 cycles/op
            swap:    80.49 cycles/op
         cmpxchg:    81.13 cycles/op
 lockadd_unalign:   212.81 cycles/op
interference type: hyperthread_write_line
             add:    20.24 cycles/op
  dependent_adds:     2.70 cycles/op
      add_mfence:   150.56 cycles/op
         lockadd:   120.42 cycles/op
            xadd:   126.89 cycles/op
            swap:    48.89 cycles/op
         cmpxchg:    46.72 cycles/op
 lockadd_unalign:   254.07 cycles/op
interference type: other_core_read_line
             add:     5.24 cycles/op
  dependent_adds:     0.85 cycles/op
      add_mfence:   302.17 cycles/op
         lockadd:    48.86 cycles/op
            xadd:    48.45 cycles/op
            swap:    45.28 cycles/op
         cmpxchg:    52.77 cycles/op
 lockadd_unalign:   385.89 cycles/op
interference type: other_core_write_line
             add:     2.46 cycles/op
  dependent_adds:     5.90 cycles/op
      add_mfence:   321.18 cycles/op
         lockadd:    37.93 cycles/op
            xadd:    48.45 cycles/op
            swap:    47.70 cycles/op
         cmpxchg:    43.81 cycles/op
 lockadd_unalign:   272.36 cycles/op
interference type: three_cores_read_line
             add:     5.26 cycles/op
  dependent_adds:     0.85 cycles/op
      add_mfence:   323.13 cycles/op
         lockadd:   270.49 cycles/op
            xadd:    50.26 cycles/op
            swap:   305.33 cycles/op
         cmpxchg:    52.17 cycles/op
 lockadd_unalign:   443.51 cycles/op
interference type: three_cores_write_line
             add:    32.33 cycles/op
  dependent_adds:     0.85 cycles/op
      add_mfence:   251.92 cycles/op
         lockadd:   133.34 cycles/op
            xadd:    50.48 cycles/op
            swap:   400.78 cycles/op
         cmpxchg:    52.03 cycles/op
 lockadd_unalign:   597.53 cycles/op

@maxburke

AMD Opteron 6272 (16 module @ 2.1GHz)

interference type: none
             add:     2.19 cycles/op
      add_mfence:    95.01 cycles/op
         lockadd:    56.15 cycles/op
            xadd:    53.15 cycles/op
            swap:    53.14 cycles/op
         cmpxchg:    54.68 cycles/op
 lockadd_unalign:   247.77 cycles/op
interference type: hyperthread_read_line
             add:     2.50 cycles/op
      add_mfence:   100.42 cycles/op
         lockadd:   138.72 cycles/op
            xadd:    97.43 cycles/op
            swap:   108.15 cycles/op
         cmpxchg:   143.43 cycles/op
 lockadd_unalign:   293.73 cycles/op
interference type: hyperthread_write_line
             add:    36.76 cycles/op
      add_mfence:   171.70 cycles/op
         lockadd:   226.86 cycles/op
            xadd:   222.19 cycles/op
            swap:   226.44 cycles/op
         cmpxchg:  4643.72 cycles/op
 lockadd_unalign:   348.22 cycles/op
interference type: other_core_read_line
             add:    86.59 cycles/op
      add_mfence:   548.38 cycles/op
         lockadd:   453.45 cycles/op
            xadd:   435.06 cycles/op
            swap:   453.95 cycles/op
         cmpxchg:   502.23 cycles/op
 lockadd_unalign:   758.78 cycles/op
interference type: other_core_write_line
             add:    75.59 cycles/op
      add_mfence:   421.52 cycles/op
         lockadd:   445.01 cycles/op
            xadd:   416.42 cycles/op
            swap:   405.71 cycles/op
         cmpxchg:    54.79 cycles/op
 lockadd_unalign:   687.30 cycles/op
interference type: three_cores_read_line
             add:   186.46 cycles/op
      add_mfence:   921.73 cycles/op
         lockadd:   767.26 cycles/op
            xadd:   752.58 cycles/op
            swap:   769.43 cycles/op
         cmpxchg:   885.06 cycles/op
 lockadd_unalign:  1240.31 cycles/op
interference type: three_cores_write_line
             add:   214.71 cycles/op
      add_mfence:  1095.98 cycles/op
         lockadd:   875.85 cycles/op
            xadd:   865.22 cycles/op
            swap:   887.61 cycles/op
         cmpxchg:  8397.39 cycles/op
 lockadd_unalign:  1399.55 cycles/op
