@nitrix
Last active January 5, 2021 16:58
memory order atomics explanation
Every atomic operation is guaranteed to be atomic in itself (the combination of
two atomic operations is not atomic as a whole!), and all modifications of a
given atomic variable occur in one total order that every thread agrees on.
That means operations on the same atomic variable can never be observed out of
order. Atomic operations on different variables, and ordinary memory
operations, might very well be reordered unless a stronger memory order forbids
it. Compilers (and CPUs) routinely do such reordering as an optimization.
It also means the compiler must use whatever instructions are necessary to
guarantee that an atomic operation on a variable never sees a value older than
one the same thread has already observed for that variable, even when the newer
value was written on another processor core. This guarantee does not extend to
other variables or to ordinary memory operations.
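As a minimal sketch of the "atomic in itself, but not in combination" point, the
following compares a single read-modify-write with a load followed by a store.
The names counter, increment_atomically, increment_in_two_steps and
run_on_two_threads are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared counter, used only for this illustration.
std::atomic<int> counter{0};

// One atomic read-modify-write per step: no increment can be lost.
void increment_atomically(int times) {
    for (int i = 0; i < times; ++i) {
        counter.fetch_add(1);
    }
}

// Two atomic operations per step. Each one is atomic on its own, but another
// thread may change the counter between the load and the store, so
// increments can be lost.
void increment_in_two_steps(int times) {
    for (int i = 0; i < times; ++i) {
        int seen = counter.load();
        counter.store(seen + 1);
    }
}

// Resets the counter, runs `work` on two threads, returns the final value.
int run_on_two_threads(void (*work)(int), int times) {
    counter.store(0);
    std::thread a(work, times);
    std::thread b(work, times);
    a.join();
    b.join();
    return counter.load();
}
```

With 100000 steps per thread, run_on_two_threads(increment_atomically, 100000)
always yields 200000, whereas the two-step variant may yield any smaller
positive value depending on how the threads interleave.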
Now, a relaxed operation is just that: the bare minimum. It does nothing in
addition and provides no other guarantees. It is the cheapest possible
operation. For plain loads and stores (that is, non-read-modify-write
operations) on strongly ordered processor architectures (e.g. x86/amd64) this
boils down to an ordinary move instruction.
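A typical case where relaxed ordering is enough is a counter whose value is
only read after all writers have finished. The sketch below assumes exactly
that situation; events, record_events and count_events are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical event counter: only the total matters, not the order in which
// the increments become visible relative to other memory operations.
std::atomic<long> events{0};

void record_events(int n) {
    for (int i = 0; i < n; ++i) {
        // Still an atomic read-modify-write, so no increment is lost.
        events.fetch_add(1, std::memory_order_relaxed);
    }
}

long count_events(int threads, int per_thread) {
    events.store(0, std::memory_order_relaxed);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(record_events, per_thread);
    }
    for (auto& th : pool) {
        th.join();
    }
    // join() already synchronizes with the finished threads, so a relaxed
    // load is sufficient to see every increment here.
    return events.load(std::memory_order_relaxed);
}
```

Calling count_events(4, 50000) returns 200000. Note that the synchronization
needed to read the final total comes from join(), not from the atomic itself.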
The sequentially consistent operation is the exact opposite. It is the default
memory order, and it gives loads acquire semantics and stores release semantics
(both described below), so it also constrains the ordinary memory operations
that happen before or after it. On top of that, all sequentially consistent
operations form one single total order that every thread observes identically.
Practically, this means lost optimization opportunities, and fence instructions
may have to be inserted. This is the most expensive model.
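The single total order is what makes the classic "store buffering" test behave
intuitively. In the sketch below (x, y and both_read_zero are hypothetical
names), each thread stores to one variable and then loads the other. With
sequential consistency at least one thread must see the other's store, so both
loads returning 0 is impossible. With weaker orderings that outcome would be
allowed.

```cpp
#include <atomic>
#include <thread>

// Hypothetical flags, used only for this illustration.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one trial and reports whether both threads read 0.
bool both_read_zero() {
    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();
    // Whichever store comes first in the total order is visible to the other
    // thread's load, so r1 and r2 cannot both be 0.
    return r1 == 0 && r2 == 0;
}
```

However often both_read_zero() is called, it returns false. Replacing the
orderings with release for the stores and acquire for the loads would permit it
to return true.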
A release operation prevents the ordinary loads and stores that precede it in
program order from being reordered after the atomic operation, whereas an
acquire operation prevents the ordinary loads and stores that follow it from
being reordered before the atomic operation. Everything else can still be moved
around.
The combination of preventing earlier stores from being moved after the
release, and later loads from being moved before the acquire, makes sure that
once the acquiring thread reads the value written by the release, everything
the releasing thread wrote beforehand is visible to it, with only a small
amount of optimization opportunity lost.
One may think of that as something like a non-existent lock that is being
released (by the writer) and acquired (by the reader). Except... there is no
lock.
In practice, release/acquire usually means the compiler does not need to use
any particularly expensive special instructions (on x86/amd64, plain loads and
stores already suffice), but it cannot freely reorder loads and stores to its
liking, which may cost some (small) optimization opportunities.
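The writer/reader pairing described above can be sketched as a simple message
handoff. The names payload, ready, producer, consumer and transfer are
hypothetical; the point is that payload is an ordinary, non-atomic variable
that is nevertheless read safely.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // the "lock that is not a lock"

void producer() {
    payload = 42;  // ordinary store, cannot be moved after the release
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Spin until the flag written by the release store is observed.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // ordinary load, cannot be moved before the acquire
}

// Runs one handoff between two threads and returns what the reader saw.
int transfer() {
    payload = 0;
    ready.store(false);
    int received = 0;
    std::thread reader([&] { received = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return received;
}
```

transfer() returns 42. If both orderings were relaxed instead, reading payload
would be a data race and the result would no longer be guaranteed.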
Finally, consume is the same operation as acquire, except that the ordering
guarantees apply only to dependent data. Dependent data would be, for example,
data that is pointed to by an atomically modified pointer.
Arguably, that may provide a couple of optimization opportunities that are
not present with acquire operations (since less data is subject to
restrictions); however, this comes at the expense of more complex and more
error-prone code, and the non-trivial task of getting dependency chains
correct.
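The pointer case can be sketched as follows. Config, published, publish,
read_published and publish_and_read are hypothetical names. Only the read
through the loaded pointer is covered by the consume ordering, because only
that read depends on the loaded value.

```cpp
#include <atomic>
#include <thread>

// Hypothetical data structure published through an atomic pointer.
struct Config {
    int value;
};

std::atomic<Config*> published{nullptr};

void publish(Config* c) {
    c->value = 7;  // initialize the pointed-to data first
    published.store(c, std::memory_order_release);
}

int read_published() {
    Config* c = nullptr;
    while ((c = published.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    // c->value depends on the loaded pointer, so it is ordered after the
    // load. Unrelated variables would not be covered by this guarantee.
    return c->value;
}

// Publishes a local Config from one thread and reads it from another.
int publish_and_read() {
    Config storage{0};
    published.store(nullptr);
    int seen = 0;
    std::thread reader([&] { seen = read_published(); });
    std::thread writer(publish, &storage);
    writer.join();
    reader.join();
    return seen;
}
```

publish_and_read() returns 7. Swapping std::memory_order_consume for
std::memory_order_acquire gives the same result with a stronger guarantee.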
Using consume ordering is currently discouraged while the specification is
being revised. In the meantime, mainstream compilers simply treat consume as
acquire.