Consistency models define correct shared memory behavior in terms of loads and stores without references to caches or coherence.
- Single-Writer, Multiple-Read (Invariant): For any memory location A, at any given time, there exists only a single core that may write to A (and can also read it) ot some number of cores that may only read A.
- Data-Value Invariant: The value of the memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch.
- How a core may reorder memory access
-
Store-store reordering: If core has a non-FIFO write buffer that lets stores depart in a different order in which they entered. This might occur if the first store misses, but the second store hits. Or, if the second store can coalesce with an earlier store (?). These reorderings are possible even if the core executes all instructions in program order. Reordering stores to different memory addresses has no effect on a single-threaded execution. Multi threaded example differs.
-
Load-load reordering: Modern cores may execute instructions out of order. C2 (above) could execute L1 and L2 out of order.
Core C1
S1: Store data = NEW
S2: Store flag = SET
Core C2
L1: Load r1 = flag
B2: if (r1 != SET) goto L1
L2: Load r2 = data
; initially data = 0, flag != SET
; L1, B1 may repeat many times
; show R2 always be set to NEW?
- Load-store and store-load reordering: OoO cores may reorder loads and stores (different addresses) from the same thread. Reordering an earlier load with a later store (load-store reordering) can cause issues related to locks, etc. In store-load reordering below, r1 and r2 could equal store with store-load reordering.
Core C1
S1: x = NEW
L1: r1 = y
Core C2
S2: y = NEW
L2: r2 = x
- defines what behaviors a programmer can expect
- what optimizations system implementors can use
"The result of an execution is the same as if the opertions had been executed in the order specified by the program".
Memory order respects each core's program order.
R10000 core speculatively issues loads and stores in program order into an address queue. Loads and stores commit in program order and then remove their address queue entries. To commit a store, the L1 cache must hold the block in M state and the store's value must be written atomically with the commit.
The eviction of a cache block that contains a load's address in the AQ squashes te load of all subsequent instructions which then re-execute.
Relax store-load ordering.
Processor cores use store buffers to hold committed (retired) stores until the rest of the memory system can process the store. A store can enter the store buffer before the cache has obtained read-write coherence permissions.
A write buffer can be made architecturally invisible by ensuring that a load to address A returns the value of the most reset store to A, even if one or more stores are in the write buffer. This is accomplished using by bypassing the value of the most recent store....
A TSO execution requires:
-
All cores inser their loads and stores into the memory order respecting their program order, regardless whether they are to the same or different memory addresses.
-
Every load gets its value from teh last store before it to the same address.
-
Consider FENCE instructions.
The implementation of TSO is similar to SC with the addition of per-core FIFO write buffers
- Loads and stores leave each core in that core's program order.
- A load either bypasses a value from the write buffer or await the switch as before
- A store enters the tail of the FIFO write buffer or stalls if full
- When the "switch" selects, is performs either the next load or the store at the head of the write buffer.
Relax all orderings.
-
Non-FIFO, coalescing write buffer: coalescing (two non consequentive stores can write to the same entry in the buffer). Non-FIFO coalescing violates TSO before TSO requires stores to appear in program order.
-
Simpler support for speculation: Recall R10000 that has mechanism to replay falsely speculated mem ops. This is simplified.
-
Coupling consistency and coherence: May be possible for a core to read the old value of a store while order read the new --- temporary violation of SWMR!