https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf recommends a "read, don't modify, write" for maximal correctness. But does GCC/etc. optimize it out to a "release-consistent" read? This code is a test case, and compiles in godbolt --std=c++17 (-O2)
.
DocumentStore::get_document()
contains a busy loop, but is wait-free, since "# of times we fail to acquire mutex" is bounded by "# of times the GUI flips the buffers", which only occurs once per user input.
The audio thread's attempt to lock the front document can fail, if the GUI thread is swapping and locking documents at the same time.
- Audio get_document() reads "front_index == 0" (acquire)
- GUI "front_index = 1" (acq_rel)
- GUI "lock document 0" (acquire)
- Audio "try lock doc 0" fails.
If the GUI swaps documents once, the audio thread's second attempt to read the front document will succeed. We prove by contradiction.
The audio thread's second attempt to read the document may fail if:
- Audio get_document() reads "front_index == 0" (acq_rel).
This is impossible by contradiction:
- Audio get_document() "try lock doc 0", is sequenced before:
- Audio "read front_index == 0" (acq_rel) the second time, is sequenced before:
- GUI swap_docs() "front_index = 1", is sequenced before:
- GUI "lock doc 0".
So audio "try lock doc 0" is sequenced before GUI "lock doc 0" and cannot have failed. We have a contradiction.
Therefore if "try lock doc 0" fails, the audio thread must read "front_index == 1" the second time.
Assume:
- Audio get_document() reads "front_index == 0" (acquire)
- GUI "front_index = 1" (acq_rel)
- GUI "lock document 0" (acquire)
- Audio "try lock doc 0" fails.
- Audio get_document() read-don't-modify-write "front_index" (acq_rel). What will it see?
The audio thread will read "front_index == 1". Aside from proof by contradiction, I attempted a direct proof:
- GUI "front_index = 1" (acq_rel) is sequenced before:
- GUI "lock document 0" is sequenced before???
- Audio "try lock doc 0" is sequenced before:
- Audio "read front_index" (acq_rel) the second time.
What I need is a read with release semantics.
https://pitdicker.github.io/Writing-a-seqlock-in-Rust/ https://www.reddit.com/r/rust/comments/eez9s2/writing_a_seqlock_in_rust/ https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf
there may be a use for "read-dont-modify-write" operations I don't know if my code is correct in the absence of such an operation :(
The PDF claims that fetch_add is the most theoretically elegant implementation. But it seems slow.
On Godbolt, fetch_add is left as a lock xadd
by gcc -O2 if I remove the try-lock mutex. If I leave in the try-lock mutex, the resulting asm is gibberish with piles and piles of call std::__throw_system_error(int)
. If I fetch_add by a nonzero constant, the resulting asm doesn't even contain the constant!
I will instead use atomic_thread_fence(m_o_acquire);
. This compiles to mfence + mov, even on -O0. See https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence#Atomic-fence_synchronization for details on "acquire fence"s.