Showcases some interesting and non-obvious optimizations that compilers can make on and around atomics. In particular, I liked this example: the following code
#include <atomic>

int x = 0;
std::atomic<int> y{0};

int dso() {
    x = 0;
    int z = y.load(std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    x = 1;
    return z;
}
can be optimized to the following:
#include <atomic>

int x = 0;
std::atomic<int> y{0};

int dso() {
    // Dead store eliminated.
    int z = y.load(std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    x = 1;
    return z;
}
The first store to x can be dead-store eliminated because the only way another thread could observe it is by racing: x is a plain non-atomic variable, so any concurrent read of it is a data race, and the compiler may assume no such read exists. Even a thread that synchronizes through y cannot tell the difference. If it reads y and sees 0, it has no way of knowing whether the subsequent store of 1 to x has happened yet, and therefore no way of knowing whether the initial store of 0 to x has already been overwritten. Since no data-race-free execution can distinguish the two versions, and the 0 is unconditionally overwritten by 1, the first store is dead and can be eliminated.