Skip to content

Instantly share code, notes, and snippets.

@nyanpasu64
Last active January 19, 2020 22:11
Show Gist options
  • Save nyanpasu64/77af76dd491d6047a0a661e2aaa1e573 to your computer and use it in GitHub Desktop.
Save nyanpasu64/77af76dd491d6047a0a661e2aaa1e573 to your computer and use it in GitHub Desktop.
Do compilers optimize a "read, don't modify, write" atomic operation to a regular read?

https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf recommends a "read, don't modify, write" for maximal correctness. But does GCC/etc. optimize it out to a "release-consistent" read? This code is a test case, and compiles in godbolt --std=c++17 (-O2).

Acquiring one of two mutexes wait-free

DocumentStore::get_document() contains a busy loop, but is wait-free, since "# of times we fail to acquire mutex" is bounded by "# of times the GUI flips the buffers", which only occurs once per user input.

Proof by contradiction

The audio thread's attempt to lock the front document can fail, if the GUI thread is swapping and locking documents at the same time.

  • Audio get_document() reads "front_index == 0" (acquire)
  • GUI "front_index = 1" (acq_rel)
  • GUI "lock document 0" (acquire)
  • Audio "try lock doc 0" fails.

If the GUI swaps documents once, the audio thread's second attempt to read the front document will succeed. We prove by contradiction.

The audio thread's second attempt to read the document may fail if:

  • Audio get_document() reads "front_index == 0" (acq_rel).

This is impossible by contradiction:

  • Audio get_document() "try lock doc 0", is sequenced before:
  • Audio "read front_index == 0" (acq_rel) the second time, is sequenced before:
  • GUI swap_docs() "front_index = 1", is sequenced before:
  • GUI "lock doc 0".

So audio "try lock doc 0" is sequenced before GUI "lock doc 0" and cannot have failed. We have a contradiction.

Therefore if "try lock doc 0" fails, the audio thread must read "front_index == 1" the second time.

Forward proof

Assume:

  • Audio get_document() reads "front_index == 0" (acquire)
  • GUI "front_index = 1" (acq_rel)
  • GUI "lock document 0" (acquire)
  • Audio "try lock doc 0" fails.
  • Audio get_document() read-don't-modify-write "front_index" (acq_rel). What will it see?

The audio thread will read "front_index == 1". Aside from proof by contradiction, I attempted a direct proof:

  • GUI "front_index = 1" (acq_rel) is sequenced before:
  • GUI "lock document 0" is sequenced before???
  • Audio "try lock doc 0" is sequenced before:
  • Audio "read front_index" (acq_rel) the second time.

Prior art and performance

What I need is a read with release semantics.

https://pitdicker.github.io/Writing-a-seqlock-in-Rust/ https://www.reddit.com/r/rust/comments/eez9s2/writing_a_seqlock_in_rust/ https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf

there may be a use for "read-dont-modify-write" operations I don't know if my code is correct in the absence of such an operation :(

Speed

The PDF claims that fetch_add is the most theoretically elegant implementation. But it seems slow.

On Godbolt, fetch_add is left as a lock xadd by gcc -O2 if I remove the try-lock mutex. If I leave in the try-lock mutex, the resulting asm is gibberish with piles and piles of call std::__throw_system_error(int). If I fetch_add by a nonzero constant, the resulting asm doesn't even contain the constant!

I will instead use atomic_thread_fence(m_o_acquire);. This compiles to mfence + mov, even on -O0. See https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence#Atomic-fence_synchronization for details on "acquire fence"s.

namespace doc {
struct Document {
int filler[20];
};
}
namespace command {
struct Command {};
}
#include <memory>
#include <mutex>
#include <optional>
namespace sync {
template<typename Ptr>
class [[nodiscard]] Guard {
std::unique_lock<std::mutex> _guard;
Ptr _value;
private:
Guard(std::mutex & mutex, Ptr value) : _guard{mutex}, _value{value} {
_guard.lock();
}
public:
static Guard make(std::mutex & mutex, Ptr value) {
Guard guard{mutex, value};
guard._guard.lock();
return guard;
}
static std::optional<Guard> try_make(std::mutex & mutex, Ptr value) {
Guard guard{mutex, value};
guard._guard.try_lock();
if (guard._guard) {
return guard;
} else {
return {};
}
}
explicit operator bool() const noexcept {
return _guard.operator bool();
}
typename std::pointer_traits<Ptr>::element_type & operator*() {
return *_value;
}
Ptr operator->() {
return _value;
}
};
/// A fake rwlock for two threads, where only one edits the contents.
/// Implemented as a single mutex only held by one thread.
///
/// Only the GUI thread can legally obtain exclusive references (when writing),
/// by acquiring the mutex first.
/// The GUI thread can obtain shared references (when reading) without acquiring the mutex!
/// The audio thread can obtain a *shared* reference by acquiring the mutex.
///
/// Should be faster than a true rwlock,
/// but doesn't generalize to >2 threads or >1 writer thread.
template<typename T>
class FakeRwLock {
mutable std::mutex _mutex;
T _value;
public:
explicit FakeRwLock(T && value) : _value(std::move(value)) {}
using ReadPtr = T const *;
using WritePtr = T *;
using ReadGuard = Guard<ReadPtr>;
using WriteGuard = Guard<WritePtr>;
/// Only call this in the same thread which calls gui_write().
ReadPtr gui_read() const {
return &_value;
}
/// Only call this in the GUI thread.
WriteGuard gui_write() {
return WriteGuard::make(_mutex, &_value);
}
/// Can be safely called in the audio thread.
std::optional<ReadGuard> try_read() const {
return ReadGuard::try_make(_mutex, &_value);
}
};
}
#include <array>
#include <atomic>
#include <memory>
#include <vector>
namespace gui {
namespace history {
struct Success {
bool succeeded;
};
/// A class storing two Document instances as a double buffer.
/// This allows the audio thread to fetch the current Document wait-free,
/// and GUI thread to edit document (may block),
/// but not at the same time.
class DocumentStore {
using LockedDoc = sync::FakeRwLock<doc::Document>;
// Only ever 0 or 1.
std::atomic_uint8_t _front_index = 0;
// Indexed by _front_index or 1 - _front_index.
std::array<LockedDoc, 2> _documents;
public:
// document? rvalue reference? i have no fucking clue
// Copy? Move?
// https://www.codesynthesis.com/~boris/blog/2012/06/19/efficient-argument-passing-cxx11-part1/
// List initialization proceeds in left-to-right order.
// https://en.cppreference.com/w/cpp/language/list_initialization#Notes
explicit DocumentStore(doc::Document document);
// Public API:
/// Only call this in the GUI thread.
/// No locks are acquired.
LockedDoc::ReadPtr gui_get_document();
/// Can be safely called in the audio thread.
/// Locks are acquired, but this is hopefully wait-free (or close to it).
LockedDoc::ReadGuard get_document();
/// Called by History when the user edits the document.
/// Should not be called elsewhere.
///
/// Only called from the GUI thread.
void history_apply_change(command::Command command);
};
// namespaces
}
}
namespace gui {
namespace history {
DocumentStore::DocumentStore(doc::Document document) : _documents{{
LockedDoc{doc::Document{document}}, LockedDoc{std::move(document)}
}} {}
DocumentStore::LockedDoc::ReadPtr DocumentStore::gui_get_document() {
return _documents[_front_index.load(std::memory_order_relaxed)].gui_read();
}
DocumentStore::LockedDoc::ReadGuard DocumentStore::get_document() {
// Attempted proof at https://docs.google.com/document/d/1Ipi7vfFXDV2TQIqgDqfuz9YHQvSMsCLa1zrJ_NQs6Rk/edit
// Based on a paper: https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf
// Godbolt compilation testing at https://gist.github.com/jimbo1qaz/77af76dd491d6047a0a661e2aaa1e573.
// Note that I don't fully understand C++11 atomics, nor the x86 it compiles to.
// The below comments may not be fully correct?
while (true) {
auto which_doc = _front_index.load(std::memory_order_acquire);
auto maybe_doc = _documents[which_doc].try_read();
if (maybe_doc) {
return std::move(*maybe_doc);
}
// Prevent the next _front_index.load() from being reordered
// before the mutex try_read().
atomic_thread_fence(std::memory_order_acquire);
}
}
void DocumentStore::history_apply_change(command::Command command) {
auto gui_write_document = [&]() -> LockedDoc::WriteGuard {
return _documents[_front_index.load(std::memory_order_relaxed) ^ 1].gui_write();
};
auto swap_docs = [&]() -> void {
_front_index.fetch_xor(1, std::memory_order_acq_rel);
};
{
// Should not block.
LockedDoc::WriteGuard back_lock = gui_write_document();
// TODO apply command.redo(*back) or something
}
swap_docs();
{
// Blocks if audio thread is using the current back buffer
// (previous front buffer).
LockedDoc::WriteGuard back_lock = gui_write_document();
// TODO apply command.redo(back) or something
}
}
// namespaces
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment