Skip to content

Instantly share code, notes, and snippets.

@edcote
Last active July 31, 2018 20:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save edcote/8c44f6bb943b7c4a9d2feeebee65110a to your computer and use it in GitHub Desktop.
Save edcote/8c44f6bb943b7c4a9d2feeebee65110a to your computer and use it in GitHub Desktop.
Primer on Memory Consistency and Cache Coherence

Chapter 1

Consistency models define correct shared memory behavior in terms of loads and stores without references to caches or coherence.

Chapter 2

  1. Single-Writer, Multiple-Read (Invariant): For any memory location A, at any given time, there exists only a single core that may write to A (and can also read it) ot some number of cores that may only read A.
  2. Data-Value Invariant: The value of the memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch.

Chapter 3 - Memory Consistency Motivation and Sequential Consistency

  • How a core may reorder memory access
  • Store-store reordering: If core has a non-FIFO write buffer that lets stores depart in a different order in which they entered. This might occur if the first store misses, but the second store hits. Or, if the second store can coalesce with an earlier store (?). These reorderings are possible even if the core executes all instructions in program order. Reordering stores to different memory addresses has no effect on a single-threaded execution. Multi threaded example differs.

  • Load-load reordering: Modern cores may execute instructions out of order. C2 (above) could execute L1 and L2 out of order.

Core C1
S1: Store data = NEW
S2: Store flag = SET

Core C2
L1: Load r1 = flag
B2: if (r1 != SET) goto L1
L2: Load r2 = data

; initially data = 0, flag  != SET
; L1, B1 may repeat many times
; show R2 always be set to NEW?
  • Load-store and store-load reordering: OoO cores may reorder loads and stores (different addresses) from the same thread. Reordering an earlier load with a later store (load-store reordering) can cause issues related to locks, etc. In store-load reordering below, r1 and r2 could equal store with store-load reordering.
Core C1
S1: x = NEW
L1: r1 = y

Core C2
S2: y = NEW
L2: r2 = x

3.2 What is a memory consistency model

  • defines what behaviors a programmer can expect
  • what optimizations system implementors can use

3.4 Basic Idea of Sequential Consistency

"The result of an execution is the same as if the opertions had been executed in the order specified by the program".

Memory order respects each core's program order.

3.10 Putting It All Together: MIPS R10000

R10000 core speculatively issues loads and stores in program order into an address queue. Loads and stores commit in program order and then remove their address queue entries. To commit a store, the L1 cache must hold the block in M state and the store's value must be written atomically with the commit.

The eviction of a cache block that contains a load's address in the AQ squashes te load of all subsequent instructions which then re-execute.

Chapter 4 TSO

Relax store-load ordering.

Processor cores use store buffers to hold committed (retired) stores until the rest of the memory system can process the store. A store can enter the store buffer before the cache has obtained read-write coherence permissions.

A write buffer can be made architecturally invisible by ensuring that a load to address A returns the value of the most reset store to A, even if one or more stores are in the write buffer. This is accomplished using by bypassing the value of the most recent store....

4.3 A Little TSO Formalism

A TSO execution requires:

  1. All cores inser their loads and stores into the memory order respecting their program order, regardless whether they are to the same or different memory addresses.

  2. Every load gets its value from teh last store before it to the same address.

  3. Consider FENCE instructions.

Implementing TSO

The implementation of TSO is similar to SC with the addition of per-core FIFO write buffers

  • Loads and stores leave each core in that core's program order.
  • A load either bypasses a value from the write buffer or await the switch as before
  • A store enters the tail of the FIFO write buffer or stalls if full
  • When the "switch" selects, is performs either the next load or the store at the head of the write buffer.

Chapter 5 Relaxed Memory Consistency

Relax all orderings.

5.1.2 Opportunities For Further Reordering

  • Non-FIFO, coalescing write buffer: coalescing (two non consequentive stores can write to the same entry in the buffer). Non-FIFO coalescing violates TSO before TSO requires stores to appear in program order.

  • Simpler support for speculation: Recall R10000 that has mechanism to replay falsely speculated mem ops. This is simplified.

  • Coupling consistency and coherence: May be possible for a core to read the old value of a store while order read the new --- temporary violation of SWMR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment