Bike/threads.md Secret

## threads.md

      
    Raw
  

              threads.md
            
          
    NN. Concurrency
NN.1 Introduction to Concurrency
NN.2 Threads
NN.3 Locks
NN.4 Condition Variables
NN.5 Atomic Operations
NN.6 The Concurrency Dictionary

NN.1 Introduction to Concurrency
A non-concurrent program's behavior consists of a sequence of evaluations with an order unambiguously defined by the programmer. In a concurrent program, however, the program leaves some ordering constraints undefined. This allows implementations to reorder evaluations or even perform them simultaneously.
Concurrency in Lisp is expressed via "threads". Each thread evaluates a function, and ends when that function returns. Each thread proceeds as though its evaluations take place in the unambiguous order defined by the non-concurrent evaluation rules. Evaluations in two different threads are however not ordered, except as described in this chapter.
Some operators impose constraints on evaluation between threads. An evaluation A synchronizes-with an evaluation B in another thread if A happens before B. Evaluations of operators that impose these constraints are called synchronization operations. Synchronization operations include starting and joining threads, acquiring and releasing locks, and some atomic accesses. Without synchronization, side effects performed by an evaluation in a thread A may not be visible in other threads, or may appear only partially or as though they occurred in an unusual order.
Because evaluation in any particular thread only has to proceed as though it followed the non-concurrent evaluation rules, implementations have considerable leeway to rearrange evaluations in the actual execution of the thread. For example, an implementation is free to reorder side-effecting operations and operations depending on those side effects, so long as a dependent operation behaves as if any side effects it depends on occurred first. In a non-concurrent program these rearrangements are by definition invisible to the programmer; however, if other threads are executing simultaneously, they can observe side effects occurring in ways violating the non-concurrent evaluation semantics.
Writing to a place is a very common side effect, and so accesses to places in different threads that are not carefully ordered may have undefined consequences. Programmers should ensure that access to objects shared across threads is done carefully so that program behavior is well-defined.
NN.1.1 Definition of Concurrency
NN.1.2 Conflicts and Data Races
NN.1.3 Note on Parallelism

NN.1.1 Definition of Concurrency
An evaluation in one thread is sequenced-before another evaluation in the same thread if it must occur first according to the usual evaluation rules (see for a start 3.1 Evaluation). For example, evaluation of a form in progn is sequenced-before evaluation of subsequent forms in that progn. sequenced-before is an antisymmetric, transitive relation between evaluations: if A is sequenced-before B, B is not sequenced-before A, and if A is sequenced-before B which is in turn sequenced-before C, A is also sequenced-before C.
An evaluation in one thread may sometimes synchronize-with an evaluation in another thread. synchronize-with is also an antisymmetric, transitive relation. Unlike sequenced-before, the order of evaluations that synchronize-with one another is usually only partially defined by the programmer.
Synchronization operations are evaluations that can synchronize-with others; the operators that accomplish this individually document that fact. The most important synchronization operations are as follows:

Release operations (such as release-lock or atomic writes with strong enough order) may synchronize-with acquire operations (such as acquire-lock or atomic reads with strong enough order) referring to the same object or place.
make-thread synchronizes-with the first evaluation in the new thread.
The final evaluation in a thread synchronizes-with all join-thread calls on that thread, as well as any thread-alive-p calls on it that return false.

An evaluation A happens-before another evaluation B provided that either A is sequenced-before B, A synchronizes-with B, or there is some other operation C such that A happens-before C and C happens-before B. As such, it is also an antisymmetric and transitive relation.
The happens-before relation defines which evaluations occur before others in the intuitive sense. In a program that does not use concurrency, all evaluations must occur in one well defined order, as for any two evaluations, one is sequenced-before (and therefore happens-before) the other. In a program that does use concurrency, two evaluations may not have any necessary happens-before relation between them. A particular execution of the program may involve a particular happens-before order - e.g. locks may be acquired in some order or another - but this order is not fully defined by the program and may vary between executions.
If an evaluation A does not necessarily happen before another evaluation B, the evaluations are unordered. The side effects (e.g. writes to places) performed by A may not be visible to B, or may appear partially or in an unusual form. It is not necessarily the case that side effects that are sequenced in some order in some thread appear to have taken place in that order in another thread, or appear to have taken place at all. If this is not desired, synchronization operators should be inserted by the programmer to ensure a happens-before order is in place.
If two evaluations are not ordered by the happens-before relation, they are unordered. If the two evaluations operate on the same object or place, there may be a conflict: See NN.1.2.
NN.1.1.1 Sequential Consistency
NN.1.1.2 Acquisition and Release Operations

NN.1.1.1 Sequential Consistency
A set of evaluations is sequentially consistent iff there is a total order on them, and this order is consistent with the program's happens-before order. In other words, for any two sequentially consistent operations that could possibly conflict, one happens-before the other. This means that any sequentially consistent evaluation observes the side effects of all sequentially consistent evaluations that happen-before it. If evaluations across threads are sequentially consistent, the execution of the program consists of an interleaving of these evaluations in some order that is undefined other than being consistent with the sequenced-before order.
Sequential consistency is a simple and intuitive notion of concurrency, but some concurrent programs are not sequentially consistent. Some compiler optimizations, as well as runtime optimizations like per thread caches, defeat sequential consistency, and so to guarantee sequential consistency requested by the programmer, an implementation may have to disable certain optimizations. As such, it is important that programmers be aware of the more complex behaviors that they may have to deal with.

NN.1.1.2 Acquisition and Release Operations
"Acquire" (or "acquisition") and "release" synchronization operations relate to some value or place. Acquire operations are reads, and release operations are writes. If an acquire operation reads the value written by some release operation, that release operation may synchronize with that acquire operation, and so anything that happens before the release happens before the acquisition as well. (A more formal definition is given in NN.5.1.3.)
Acquisitions and releases impose constraints on evaluation order that can be used to implement mutual exclusion, among other things. Consider the following example. Say lock is a place that has value 1 initially, and data has value 0 initially. Now say there are two threads executing; one evaluates (setf data 347 lock 0), and another evaluates (progn (loop while (plusp lock)) data). If lock is merely atomic, there is no guarantee that the (setf data 347) happens-before the loop exiting, and so the second form may return 0, even though (setf data 347) is sequenced-before the (setf lock 0) that produced the value read by the loop. But if lock is released by the first thread and acquired by the second, (setf lock 0) must happen before the second thread's evaluation of lock that returned 0, and as such the (setf data 347) happens before the end of the loop as well; so the second thread returns 347, even if data is not atomic at all.
While acquire and release operations do synchronize, there is not necessarily a total order of acquisition and release operations on distinct objects or places. Sequentially consistent operations are acquire and/or release operations that additionally impose an order of this kind.

NN.1.2 Conflicts and Data Races
For the purposes of this section, a "write" of a place is the evaluation of its setf expansion's storing form, and a "read" is the evaluation of its accessing form. Evaluation of subforms can be analyzed separately. Binding a variable (or other place, if this is supported) is considered to write its initial value.
For the purposes of this section, places can be simple or compound. A simple place is one of the following:

a lexical or special variable.
a function call form for one of the functions: aref, bit, car, cdr, char, class-name, compiler-macro-function, documentation, fdefinition, fill-pointer, find-class, first, gethash, logical-pathname-translations, macro-function, readtable-case, rest, row-major-aref, sbit, schar, slot-value, svref, symbol-function, symbol-plist, symbol-value.
a function call form for one of the functions: standard-instance-access, funcallable-standard-instance-access, slot-value-using-class.
a function call form where the function is a defstruct slot accessor.
an APPLY place where the function is one of the above functions.

Note that a place being simple does not necessarily imply that it can be accessed atomically[2].
If two evaluations access the same place and at least one of the evaluations writes to the place, the evaluations are said to conflict. Conflicting accesses to a simple place form a data race unless either one happens-before the other, or both accesses are atomic[2]. If a program has a data race, its behavior is undefined.
Note that some (but not all) simple places are always accessed atomically: see NN.5.4.
A compound place is any place that is not simple. Compound places are those that can be analyzed as being composed of multiple accesses of simple places by looking at their setf expansions. Conflicting accesses to compound places have defined behavior provided there are no conflicting accesses to the underlying simple places: they behave as though they were made up of the underlying simple accesses.
Compound places can be analyzed as follows:

caar etc., second etc., and nth may read the appropriate cdrs of the list, and write the one car.
elt on a cons works as an nth place, and on a vector as an aref place (i.e. an elt write performs one simple write).
get may read the property list and any number of cars or cdrs of that list, and write either the property list or a car of the list.
subseq writes the relevant sequence positions as by elt.
ldb, mask-field, and getf operate as decribed in 5.1.2.2.
User-defined function call places perform a call to the appropriate setf function.
For other places, the equivalences in 5.1.2 and 5.1.1.2 apply.

Some operations may cause conflicts (and therefore possibly data races) despite not operating on places directly. Not all of these cases are enumerated here, because by the definitions of those operations it should be clear what writes and reads they perform. For example, rplaca writes the car of a cons, search reads the elements of its sequence arguments, and a loop for-as-across can read the array's dimensions and its elements.
Some special cases are as follows:

adjust-array on an adjustable array is considered a write on all places associated with that array, such as its elements and fill pointer. Such calls also conflict with any evaluation that accesses any properties of that array, including but not limited to array-dimension, array-rank, other adjust-array calls on that array, class-of, and generic function dispatch.
change-class on an instance, and make-instances-obsolete on an instance's class, conflict with all evaluations that access that instance, including but not limited to class-of, slot accesses, other change-class calls on that instance, and generic function dispatch.
remhash can be considered a write that conflicts with reads of gethash with the table and an equivalent key, and clrhash conflicts with all reads of hash table entries.
set-funcallable-instance-function on an instance conflicts with calling it.
slot-makunbound can be considered a write that conflicts with reads of the relevant slot. Writes of a slot conflict with slot-boundp of the slot. Similar considerations apply to the MOP "-using-class" versions.

Implementations are free to define some or all of the accesses above as atomic, removing the possibility of data races. Implementations may also define certain patterns that would be considered conflicts by this standard to not conflict; for example, an implementation could define that writes to ldb places with the same underlying place will not conflict as long as the byte specifiers do not overlap.
Implementations may define some simple places as compound, in which case they must document which simple places are accessed. They may not define places listed here as compound as simple.
Data races can be avoided in concurrent programs by the use for example of locks (NN.3) or atomic operations (NN.5).
NN.1.2.1 Object Creation and Initialization Conflicts
NN.1.2.2 Interference Conflicts
NN.1.2.3 Examples of Conflicts

NN.1.2.1 Object Initialization Conflicts
For the purpose of conflict analysis, initialization of objects can be treated as writes of all the object's places. This reflects the fact that a thread may allocate an object and make it available to other threads before completely initializing it.
In an inadequately synchronized program, an object's creation may not necessarily happen before a read of a place of that object. In this scenario, the behavior of the read is not defined. Programmers should endeavor to avoid this scenario.

NN.1.2.2 Interference Conflicts
Accesses to some places may conflict with accesses to distinct places. In particular, a write to any position in an array, or any slot of a structure-object, or any slot of a standard-object with allocation :instance, may conflict with accesses to any other position of the array or any other slot of the object, unless T is a recognizable subtype of the upgraded array element type of the array  or of the slot's declared type.
This provision allows implementations to store objects in compact formats not amenable to atomic access. Implementations are encouraged to document which accesses to places of this sort may interfere with one another.

NN.1.2.3 Examples of Conflicts
(let ((x (make-string 1)))
  (make-thread (lambda () (setf (char x 0) #\1)))
  (make-thread (lambda () (setf (char x 0) #\2))))
;; => undefined behavior: char may not be atomic,
;; and the evaluations in the threads are unordered.

(let ((x #\h) (y (make-string 1)) (z (make-string 1)))
  (mapc #'join-thread
        (list
         (make-thread (lambda () (setf (char y 0) x)))
         (make-thread (lambda () (setf (char z 0) x)))))
  (values y z))
;; => #\h, #\h
;; The accesses of x are both reads, so they don't conflict.
;; The binding of x to #\h is sequenced-before both make-thread calls.
;; Each make-thread call synchronizes-with the start of the given thread.
;; As such, the binding of x to #\h happens-before both reads, so the
;; value of x is well-defined.

;;; Behavior of compound place conflicts
(let (x y)
  (mapc #'join-thread
        (list
         (make-thread (lambda () (setf (values x y)
                                       (values 'a 'b))))
         (make-thread (lambda () (setf (values x y)
                                       (values 'c 'd))))))
  (values x y))
;; The writes to (values x y) conflict, and that place is not and
;; probably could never be atomic. However, the values places are
;; compound, and writes to them consist of writes to X and Y.
;; This means that, for the purpose of conflict analysis, we may
;; consider (setf (values x y) form) to be equivalent to
;; (multiple-value-bind (g1 g2) form (setf x g1 y g2)).
;; Note that we use setf rather than psetf because per 5.1.2.3,
;; the storing forms of a values place are evaluated left to right.
;; Accesses of X and Y must be at least unordered per NN.5.4.
;; Both threads' final evaluations synchronize-with join-threads
;; that are turn sequenced-before the final value read;
;; the initial bindings of x and y are sequenced-before the
;; make-thread calls, which synchronize-with the threads' first
;; evaluations.
;; In short, the final reads of X and Y can read either of the
;; values written by the threads, but not the initial bindings, so
;; => (A B), or (A D), or (C B), or (C D)

;;; Behavior of initialization conflicts
(let ((x (list 1)))
  (join-thread (make-thread (lambda () (car x)))))
;; => 1, as the list is created before make-thread is called, and
;; make-thread synchronizes with the read.

(let* (x
       (thread (make-thread (lambda () (loop until x) (car x)))))
  (setf x (list 3))
  ...)
;; => undefined
;; The loop's read of X is atomic and at least unordered per NN.5.4,
;; so it may only read NIL or the list (or something else written to X by ...).
;; However, there is no synchronization between the call to LIST and the read of
;; its CAR, and so the (CAR X) has undefined behavior.

(let* (x
       (thread (make-thread (lambda ()
                              (loop until (atomic x :order :acquire))
                              (car x)))))
  (setf (atomic x :order :release) (list 3))
  (join-thread thread))
;; => 3
;; The write to x synchronizes-with the loop's read of x that returns non-NIL.
;; As such, (list 3) happens-before (car x), and so the read's behavior is
;; fully defined.

;;; Interference conflicts
(let* ((bv (make-array 5 :element-type 'bit :initial-element 0))
       (thread (make-thread (lambda () (setf (sbit bv 0) 1)))))
  (setf (sbit bv 1) 1)
  (join-thread thread)
  bv)
;; => undefined; (sbit bv 0) and (sbit bv 1) are distinct places, but are
;; allowed to interfere with one another, so this is a conflict.

NN.1.3 Note on Parallelism
This standard does not require that unordered evaluations must proceed in parallel. An implementation for which all evaluations are sequentially consistent (e.g. via multitasking) would be just as conforming as one that reorders and parallelizes with abandon.
An implementation could technically conform while doing no concurrency at all - e.g. making make-thread a simple funcall. In this circumstance it is recommended that the implementation document this fact and perhaps issue style-warnings, as programmers may expect evaluations to proceed in parallel.

NN.2 Threads
[As in Bordeaux Threads, with a few additions and changes:]
In any thread, it is undefined whether catch tags or other exit points from other threads are valid, or whether active handlers or restarts are available. In all cases, transferring control out of a thread has undefined consequences.
make-thread synchronizes with all evaluations in the created thread. All evaluations in the thread synchronize with any join-thread call on that thread returning. The beginning of the first evaluation in the thread synchronizes with any thread-alive-p calls on that thread that return true; the thread returning synchronizes with any thread-alive-p calls on that thread that return false. interrupt-thread calls synchronize with the thread calling the interruption thunk.
[Possibly add return-from-thread and abort-thread. Possibly add the abort-thread restart widely defined by implementations.]

NN.3 Locks
[As in Bordeaux Threads, plus:]
A release-lock constitutes a release operation; it synchronizes with any subsequent successful acquire-lock calls.
[Maybe add a function to get the current owner of a lock.]

NN.4 Condition Variables
[As in Bordeaux Threads. Synchronization behavior of condition variables should be well defined by how they are defined to acquire and release locks. Maybe add a wait with timeout.]

NN.5 Atomic Operations
An evaluation that conflicts with another (NN.1.2) evaluation with which it has no happens-before ordering may still not cause a data race and have defined behavior, provided that both operations are atomic[2].
What behavior to expect depends on the atomic ordering of the atomic operations (NN.5.1). Atomic access can be guaranteed by using the atomic accessor macro, as well as other operators such as cas, and some operations are implicitly atomic (NN.5.4). Implementations are permitted to define other accesses as atomic.
NN.5.1 Atomic Ordering
NN.5.2 Atomic Read-Modify-Write Operations
NN.5.3 The ATOMIC Macro and Atomic Expansions
NN.5.4 Implicitly Atomic Operations
NN.5.5 Fences
NN.5.6 Implementation-defined Atomic Operations

NN.5.1 Atomic Ordering
The atomic ordering of an atomic operation expresses its behavior when there is a conflict with another atomic operation. This standard defines several atomic orderings, and implementations are permitted to extend the system with further orderings.
Some orderings are "stronger" than others, in that they provide all the same guarantees and more. In this sense, relaxed is stronger than unordered, acquire and release are stronger than relaxed, acquire-release is at least as strong as acquire and release, and sequential consistency is stronger than acquire-release.
If two atomic accesses to the same place conflict, and the accesses have different orderings, only the requirements of the weaker ordering need be respected.
Because orderings are minimal requirements, it is permissible for an implementation to have stronger guarantees than defined. For example, an implementation that makes all atomic accesses sequentially consistent would be conforming.
Orderings are defined in terms of access. If a non-access operation is defined as having some ordering, it means that the accesses underlying that operation have that order.
By definition, each read operation with defined behavior executed by a program has an associated write operation, such that the value written by the write is read by the read. This write is referred to as the read's associated write. If there is a conflict between a read and two or more writes, the read's associated write may not be fully constrained; but as long as all operations are involved are atomic, the read's behavior is still defined, and the read is associated with one of the conflicting writes. Formalization of these concept is important for understanding sub-sequentially-consistent operations.
NN.5.1.1 Unordered Ordering
NN.5.1.2 Relaxed Ordering
NN.5.1.3 Acquire and Release Orderings
NN.5.1.4 Acquire-Release Ordering
NN.5.1.5 Sequentially Consistent Ordering

NN.5.1.1 Unordered Ordering
If two atomic accesses conflict and they both have at least the "unordered" ordering, and if one is a read, that read's associated write will be consistent with the happens-before ordering.
That is, the associated write W of a read R satisfies at least the following conditions:

R does not happen-before W.
There is no write W2 to the place such that W happens-before W2 and W2 happens-before R.

Unordered ordering is the minimum ordering defined by this standard. It is suitable as a safety guarantee, but not much else.  Unordered accesses are not necessarily coherently ordered; for example, the above definition does not exclude a thread's unordered reads of a place seeing another thread's writes in the reverse of that thread's sequenced-before ordering of them, as long as those writes are unordered with respect to the reads.
Note that an unordered read-modify-write is indistinguishable from an unordered read followed by an unordered write, and is therefore not very useful for concurrency. Implementations may choose to issue a style warning if a read-modify-write operation is listed as being unordered.

NN.5.1.2 Relaxed Ordering
"Relaxed" or stronger writes to the same place can be given a total order that is consistent across threads, called the place's write order. Given that all writes and reads involved are at least relaxed, this provides the following guarantees:

Write-write coherence: If a write W1 happens-before another write W2, W1 appears before W2 in the write order.
Read-read coherence: If a read R1 is associated with a write W, and R1 happens-before another read R2, R2 is associated with either W, or some other write that is later in the write order.
Read-write coherence: If a read R happens-before a write W, R is associated with a write prior to W in the write order.
Write-read coherence: If a write W happens-before a read R, R is associated with either W, or some other write that is later in the write order.

If more weakly ordered writes to the place conflict with a read, the read may be associated with either one of those writes, or a relaxed-or-stronger write meeting the above conditions.
Note that the write orders for different places do not have to be consistent with one another in relation to the happens-before ordering, and in general will not be.
Relaxed ordering is intended to provide a basic coherence guarantee. It is useful for applications such as atomic counters, in which side effects other than counter increments are irrelevant to the incrementing threads.

NN.5.1.3 Acquire and Release Orderings
Only reads may be given the acquire order, and only writes may be given the release order. A read specified as a release, or a write specified as an acquisition, are in fact only required to have relaxed order.
Acquire and release operations are synchronization operations. If an acquire operation's associated write is a release operation, that release operation synchronizes-with that acquire operation.
This models the behavior of locks (mutual exclusion) as described in NN.3. For example, consider two threads:
(with-lock-held (lock) (push "a" some-place)) ; thread A
(with-lock-held (lock) (push "b" some-place)) ; thread B
Note that some-place is not necessarily accessed atomically.
Each thread has one acquire operation - acquiring the lock - and one release operation - releasing the lock. Each thread's acquire operation is sequenced-before its release operation. Now, each thread's successful acquire operation either acquires the initial unlocked state, or the state after the lock is released by the other thread; it is not possible for both threads to acquire the lock simultaneously. The initial creation of the lock synchronizes-with one thread's lock acquisition, and that thread's lock release synchronizes-with the other thread's lock acquisition. As such, when the second thread acquires the lock, the first thread's push to some-place has already completed, i.e. happens-before the acquisition. Therefore, there are no conflicts concerning some-place, so atomicity is not necessary for accessing some-place.

NN.5.1.4 Acquire-Release Ordering
A read with the acquire-release ordering has acquire ordering, and a write with the acquire-release ordering has release ordering. The read portion of a read-modify-write operation has acquire ordering, and the write has release ordering.

NN.5.1.5 Sequentially Consistent Ordering
Sequentially consistent evaluations are sequentially consistent in the sense of NN.1.1.1: enough synchronizes-with edges between them exist so that any two sequentially consistent evaluations are related by happens-before.

NN.5.2 Atomic Read-Modify-Write Operations
Some operators, such as cas, perform atomic read-modify-write operations. An atomic read-modify-write operation is an atomic read immediately followed by an atomic write. No other operations on that place can take place in the interval between the read and the write; more formally, in the sense of NN.5.1.2, whatever write is associated with the read of the atomic read-modify-write operation is immediately followed in the write order by the atomic read-modify-write operation's write.

NN.5.3 The ATOMIC Macro and Atomic Expansions
Atomic operations may be explicitly written out by the programmer using the atomic macro. An atomic macro form can be used to access an underlying place atomically; reads from an atomic place read the underlying place atomically, and writes to an atomic place write the underlying place atomically.
Atomic access is mediated by "atomic expansions". An atomic expansion is an ordered set of seven values analogous to setf expansions:
List of temporary variables
a list of symbols naming temporary variables to be bound sequentially, as if by let*, to values resulting from the value forms.
List of value forms
a list of forms (typically, subforms of the _place) which when evaluated yield the values to which the corresponding temporary variables should be bound.
Comparison variable
a symbol which is to hold the value expected by a compare-and-swap operation.
Store variable
a symbol which is to hold the new value that may be assigned to the place.
Reading form
a form which can reference the temporary variables, which atomically reads and returns the value of the _place.
Storing form
a form which can reference the temporary variables and the store variable, which atomically writes the place, and then returns the value of the store variable.
Compare-and-swap form
a form which can reference the temporary variables, the comparison variable, and the store variable, which executes an atomic compare and swap operation; see cas. The comparison is as if by eq.
The programmer may define how to atomically access additional places by using the define-atomic-expander macro.

NN.5.4 Implicitly Atomic Operations
Accesses to the following places are atomic[2] even if the atomic macro is not used, with at least unordered atomic ordering:

lexical and special variables.
Function call forms for the following functions:

car, cdr, first, rest, svref, symbol-value
standard-instance-access, funcallable-standard-instance-access
slot-value and slot-value-using-class, provided T is a recognizable subtype of the slot's type.
defstruct slot accessors, provided T is a recognizable subtype of the slot's type.


NN.5.5 Fences
The macro fence can be used to provide additional synchronization operations that may not be optimally written in terms of reads and writes.
A release fence synchronizes-with acquire operations in other threads, provided those operations read something written by an atomic write that the fence is sequenced-before. This applies regardless of the order of the write.
Release operations may synchronize-with an acquire fence in another thread, provided an atomic read of a place is associated with the release operation, and this read is sequenced-before the fence. This applies regardless of the order of the read.
A release fence may synchronize-with an acquire fence in another thread, provided there is an atomic write of a place that the release fence is sequenced-before, that an atomic read of the place is sequenced-before the acquire fence, and the read is associated with the write. This applies regardless of the orders of the write and read.
An acquire-release fence is both an acquire fence and a release fence, so it can participate in any of the above. A sequentially consistent fence is additionally present in the sequential consistency order.
A less formal explanation of fences is that a release fence ensures that all writes sequenced-before the fence are completed before proceeding, and an acquire fence ensures that no reads the fence is sequenced-before actually occur until the fence. Implementations may not reorder writes forward across a release fence, or reads backward across an acquire fence, or at least not in any way observable by other threads.
As such, fences can be used to provide synchronization to otherwise unsynchronized atomic operations: for example to ensure that a sequence of writes is observed completely or not at all.
Note that atomic operations must still be involved in order to provide synchronization. If a release fence isn't followed by atomic writes, or an acquire fence isn't preceded by atomic reads, no synchronization is performed and undesired effects may result.
NN.5.5.1 Examples of Fencing

NN.5.5.1 Examples of Fencing
;;; This example demonstrates how fencing can be used to force synchronization
;;; even when no synchronized atomic reads or writes are performed.
;;; FLAG is some place accessible only to the threads that is initially NIL.
;; Thread A
(fill some-vector item) ; not atomic
(fence :release)
(setf (atomic flag :order :unordered) t)
;; Thread B
(let ((f (atomic flag :order :unordered)))
  (fence :acquire)
  (when f
    ;; Fence-fence synchronization conditions have been met: We've atomically
    ;; read a value associated to Thread A's write, and so some-vector must be
    ;; fully filled at this point, despite that FILL and REMOVE likely can not
    ;; be atomic.
    (map nil some-function some-vector)))


NN.5.6 Implementation-defined Atomic Operations
Implementations are free to define evaluations as atomic[2] that are not defined as such in this standard, and to define additional synchronization operations. They are strongly encouraged to use the atomic ordering terminology to do so, in order to ensure consistency between implementation definitions and the standard, and between definitions across implementations.
In particular, for safety reasons it is encouraged to make as many accesses as possible have at least the unordered ordering, even without the atomic macro being used.

Accessor ATOMIC
Syntax:
atomic place &key order &allow-other-keys => value
(setf (atomic place &key order &allow-other-keys) new-value)
Arguments and Values:
place---a place.
order---an atomic ordering designator. Default is :sequentially-consistent.
Description:
atomic is used to indicate that access to a place must be performed such that its effects and results are defined even in the presence of conflicting accesses. Various requirements are imposed depending on the value of order: See NN.5.1.
Possibilities for order are :unordered, :relaxed, :acquire, :release, :acquire-release, :sequentially-consistent. The default is :sequentially-consistent. Other than :not-atomic, these correspond to the atomic orderings described in NN.5.1.
Any other keyword arguments are passed, unevaluated, to get-atomic-expansion.
The atomicity applies only to the single access, and not to evaluation of the subforms.
atomic of a the place is defined such that (atomic (the spec place) order) = (the spec (atomic place order)). atomic of a macro place is defined such that (atomic macro-form order) = (atomic expanded-form order).
Implementations are required to support atomic with at least the following places and ordering :sequentially-consistent: lexical variables (including closed-over variables), special variables, car, cdr, first, rest, svref, symbol-value, standard-instance-access, funcallable-standard-instance-access, structure slots with a type that T is recognizably a subtype of, slot-value given that T is a recognizable subtype of the slot's type.
Exceptional Situations:
If the requirements cannot be met, an error of type not-atomic is signaled.
Notes:
atomic could be implemented in terms of get-atomic-expansion as follows:
(defmacro atomic (place &rest keys &key order &allow-other-keys
                  &environment env)
  (declare (ignore order))
  (multiple-value-bind (vars vals stores write read)
      (apply #'get-atomic-expansion place :environment env keys)
    (declare (ignore stores write))
    `(let* (,@(mapcar #'list vars vals)) ,read)))

(define-setf-expander atomic (place
                              &rest keys &key order &allow-other-keys
                              &environment env)
  (declare (ignore order))
  (apply #'get-atomic-expansion place :environment env keys))
Rationale:
This syntax provides a natural way to use atomic operations as the programmer would normal places, e.g. with setf and derived operators. Other keys are allowed because atomic operations are still kind of an open research area, and they might be useful for future extension; and anyway a normal macroexpander function like you might use for setf expansions isn't enough since the order needs to be passed in, so why not allow a little generalization?
The order (and other parameters) are not evaluated (as they are in for example C++) because they are essentially instructions to the compiler: telling it where to insert fences, to inhibit certain optimizations, etc. A runtime order would essentially have to be implemented as a runtime switch on the order, to an atomic operation with each order. This is fairly useless to the developer as the switch might overwhelm any performance advantage of relaxed ordering or whatever, and the compiler has to assume the most restrictive requirements in general anyway, since the order might end up as :sequentially-consistent. A developer who really wants a runtime choice for whatever reason can write it out themselves without issue.
It might be possible in some cases to define "automatic" atomic places using locks. For example C++ allows atomic access to any memory location regardless of size, and this seems to be commonly implemented by maintaining a couple global locks to use. This obviously removes the lock-freedom that helps make atomics appealing, and doesn't really make sense for Lisp's arbitrary places anyway, so I don't think it's worth adding as a requirement.
If atomicity cannot be guaranteed, the implementation must signal an error. This is because if a programmer write in atomics, their program will almost certainly be wrong if they're not used. Non-atomicity can be detected at compile-time so this shouldn't present a performance issue.

Function GET-ATOMIC-EXPANSION
Syntax:
get-atomic-expansion place &key environment order &allow-other-keys
=> vars, vals, store-vars, writer-form, reader-form
Arguments and Values:
place---a place.
environment---an environment object.
vars, vals, store-vars, writer-form, reader_form---a setf expansion.
Description:
Returns the atomic expansion for (atomic place ...) in environment. Keyword arguments passed to the atomic place are passed on without processing.
Exceptional Situations:
If no atomic expansion for the atomic place exists, an error of type not-atomic is signaled.

Macro DEFINE-ATOMIC-EXPANDER
Syntax:
define-atomic-expander operator place-lambda-list expander-lambda-list [[declaration* | documentation]] form*
Arguments and Values:
operator---a symbol that names a function or macro.
place-lambda-list---a macro lambda list.
expander-lambda-list---an ordinary lambda list.
declaration---a declare expression; not evaluated.
documentation---a string; not evaluated.
forms---an implicit progn.
Description:
define-atomic-expander specifies the means by which atomic can be used to access a place referenced by operator atomically. When atomic is given a place that is specified in terms of operator, it is expanded into forms that perform the appropriate operations.
The place lambda list supports destructuring. See Section 3.4.4 (Macro Lambda Lists).
Documentation is attached to operator as a documentation string of kind atomic.
Forms constitute the body of the atomic expander definition and must compute the atomic expansion for an atomic form that references the place by means of the given operator. The atomic expander function is defined in the same lexical environment in which the define-atomic-expander form appears. While forms are being executed, the variables in place-lambda-list are bound to parts of the place form, and the variables in expander-lambda-list are bound to the parameters passed to get-atomic-expansion and/or atomic. The body forms (but not the lambda lists) in a define-atomic-expander form are implicitly enclosed in a block whose name is operator.
The evaluation of forms must result in the seven values described in Section NN.5.3 (The ATOMIC Macro and Atomic Expansions). Note that define-atomic-expander does not itself guarantee the correctness of the expansion, i.e. that the atomic expansion actually describes how to complete an atomic access; that is the responsibility of the programmer.
If a define-atomic-expander form appears as a top level form, the compiler must make the atomic expander available so that it may be used to expand operations on atomic places later on in the file. Programmers must ensure that the forms can be evaluated at compile time if the operator is used in an atomic place later on in the same file. The compiler must make these atomic expanders available to compile-time calls to get-atomic-expansion when its environment argument is a value received as the environment parameter of a macro.
Rationale:
It's useful to define new places with define-setf-expander, and by the same logic it may be useful to define new atomic places as well.
Having two lambda lists is definitely a little weird, but I think it works out okay.

Function IS-ATOMIC-P
Syntax:
is-atomic-p place &key atomic-order environment &allow-other-keys => boolean
Arguments and Values:
place---a place.
environment---an environment.
atomic-order---an atomic ordering designator. Default is :sequentially-consistent.
boolean---a boolean.
Description:
Returns true iff the place can be accessed atomically.
Notes:
is-atomic-p could be implemented in terms of get-atomic-expansion as follows:
(defun is-atomic-p (place
                    &rest keys &key atomic-order environment &allow-other-keys)
  (handler-case
      (progn (apply #'get-atomic-expansion place keys) t)
    (not-atomic () nil)))

Macro CAS, CAS-EXPLICIT
Syntax:
cas place old new &key weak test test-not key => success, present
cas-explicit (place &key order &allow-other-keys) old new &key weak test test-not key => success, present
Arguments and Values:
place---a place.
old, new---objects.
weak---a boolean. The default is false.
order---an atomic ordering designator. Default is :sequentially-consistent.
test---Evaluated. A designator for a function of two arguments that returns a generalized boolean. The default is eql.
test-not---Evaluated. A designator for a function of two arguments that returns a generalized boolean.
key---Evaluated. A designator for a function of one argument, or nil.
success---a boolean.
present--- an object.
Description:
Performs an atomic compare-and-swap of place. Atomically, the place is read, and whether it satisfies the test with old is checked. If it does, new is written in to the place, and true and old are returned; if it doesn't, false and the value read are returned.
If weak is true (default false), the read-modify-write may "fail", i.e. return a false primary value, even if old does match. This is sometimes more efficient in loops, as if weak is false the implementation may have to do more work to distinguish failures due to concurrent operations from failures due to e.g. operating system context switches.
For cas, default values of the atomic place parameters are used, e.g. the access has sequentially consistent ordering. For cas-explicit, the given parameters are used.
Exceptional Situations:
If the place cannot be accessed atomically, an error of type not-atomic is signaled. If the place is an atomic place, an error is signaled.
Notes:
(cas place ...) = (cas-explicit (place) ...)
cas with a complex test could be implemented in terms of a cas with an eq test with something like the following:
(loop with ,compare = ,read-form
      if (satisfies-the-test ,compare ,new)
        do (multiple-value-bind (successp read) ,cas-eq-form
             (if successp
                 (return (values successp read))
                 (setf ,compare read)))
      else (return (values nil ,compare)))

where the interpolated forms are as returned by get-atomic-expansion, provided eq can work reasonably on numbers and characters in the implementation.
Rationale:
This seems to be the usual general RMW primitive now, though LL/SC exists as well, and CAS doesn't include machine RMW arithmetic atomics. I think it would be possible for an implementation to recognize that an atomic expansion corresponds to one of its own, and in that case use whatever machine operations for certain arithmetic that would otherwise be mediated by CAS.
TEST, TEST-NOT, and KEY are pretty weird for CAS, but pretty normal for Lisp, and that wins. A default TEST of EQL admittedly makes things harder for implementations to optimize, but it's weird to express arithmetic in terms of EQ given EQ is allowed to be ambiguous on numbers, and quite a lot of CAS usage will be for arithmetic. (Really, I expect/hope most users would want to use the higher level macros below instead of CAS directly.)
The -explicit macros here and below are admittedly a little awkward, but it's the best I could come up with to separate atomicity parameters from parameters of the operation itself. This is most relevant for atomic-update-explicit which would be hard to unambiguously specify parameters for otherwise. An alternate way would be to allow place to be an atomic place and in that case use its parameters, but that would require some weird extra processing by the implementation to recognize atomic places correctly, e.g. through macroexpansions and such. I expect most users won't care about the ordering, so the longer -explicit forms will be sight out of mind for them.

Macro ATOMIC-UPDATE, ATOMIC-UPDATE-EXPLICIT
atomic-update (place &key order &allow-other-keys) update-fn &rest arguments => new-value
atomic-update-explicit place update-fn &rest arguments => success, present
Arguments and Values:
place---a place with an atomic expansion.
order---an atomic ordering specifier. Default is :sequentially-consistent.
update-fn---a designator for a function that accepts as many arguments as are provided, plus one.
argument---an object.
new-value---an object.
Description:
Performs an arbitrary atomic read-modify-write operation on place. First, the subforms of place, the update-fn, and the arguments are evaluated, in that order. Second, the value of place is read atomically, with the given parameters in atomic-update-explicit. Third, update-fn is called as by (apply update-fn placeval arguments), placeval being the value read from the place. Finally, the primary value of this call is written into the place, and returned from the form.
It is permissible for an implementation to read from the place multiple times, and to call update-fn multiple times, e.g. in a compare-and-swap loop; however place's subforms, update-fn, and the arguments may be evaluated at most once.
Rationale:
This is provided by SBCL and I think it's a pretty good idea - you can implement most higher level atomic RMW operations in terms of it, though maybe not optimally. For example (atomic-incf place delta) = (atomic-update place #'+ delta).
SBCL however uses a different argument order and evaluates the update function and arguments each time around the CAS loop. I don't see why. If they're evaluated at most once, and the update function has no side effects, you can treat it as an immediately successful atomic RMW operation, and looping is an implementation detail. Doing it this way also matches the CLHS 5.1.1.1 rules for standard read-modify-write macros better, I think.

Macro ATOMIC-INCF, ATOMIC-INCF-EXPLICIT
atomic-incf place &optional delta => new
atomic-incf-explicit (place &key order &allow-other-keys) &optional delta => new
Arguments and Values:
place---a place, the value of which is a number, which has an atomic expansion.
delta---Evaluated. A number. Default is 1.
order---An atomic ordering specifier. Default is :sequentially-consistent.
new---The value written by the operation.
Description:
As incf, but as an atomic read-modify-write-operation.
The place may be read multiple times, but its subforms are evaluated only once, as is delta. The order of evaluations is as specified in 5.1.1.1.
Notes:
(atomic-incf place ...) = (atomic-incf-explicit (place) ...)
(atomic-incf-explicit (place ...) delta) = (atomic-update-explicit (place ...) #'+ delta)

Macro ATOMIC-DECF, ATOMIC-DECF-EXPLICIT

Macro ATOMIC-PUSH, ATOMIC-PUSH-EXPLICIT

Macro ATOMIC-PUSHNEW, ATOMIC-PUSHNEW-EXPLICIT
atomic-pushnew item place &key test test-not key => new
atomic-pushnew-explicit item (place &key order &allow-other-keys) &key test test-not key => new
Arguments and Values:
item---Evaluated. An object.
place---A place, the value of which is a proper list, which has an atomic expansion.
order---An atomic ordering specifier. Default is :sequentially-consistent.
...
Description:
As pushnew, but as an atomic read-modify-write operation.
The place may be read multiple times, but its subforms are evaluated only once, as are the other evaluated forms. The order of evaluations is as specified in 5.1.1.1. The test, test-not, and key functions may be called any number of times.
As with pushnew, it is implementation-dependent whether or not atomic-pushnew actually performs a write to the place in the situation where the item is already a member of the list held by place.

Macro FENCE
fence order => unspecified
Arguments and Values:
order---an atomic ordering specifier. Default is :sequentially-consistent.
Description:
The fence macro acts as a synchronization fence of the given order, as described in NN.5.5.
Exceptional Situations:
If order is :relaxed or weaker, an error is signaled.
Rationale:
It's not a function for the same reason atomic doesn't evaluate its order specifiers. Arguably fence could be a special operator, but it seems like a pretty specific thing to add one for.