Skip to content

Instantly share code, notes, and snippets.

@no-defun-allowed
Last active June 5, 2023 05:17
Show Gist options
  • Save no-defun-allowed/f79ea6e211b5f078e349c53937b5be74 to your computer and use it in GitHub Desktop.
Save no-defun-allowed/f79ea6e211b5f078e349c53937b5be74 to your computer and use it in GitHub Desktop.
Partial read barriers and Common Lisp

The "partial read barrier" and Common Lisp

Most CLOS implementations use a representation for instances which uses an indirection, in order to implement class redefinition and change-class. In this document I propose a technique which avoids the indirection most of the time, describe how to optimise around the assumption that a class does not change, and how this technique fares with multiple threads.

The partial read barrier and Smalltalk

Smalltalk also allows for class redefinition, and has a become: method which swaps objects in memory (with similar power to change-class in Common Lisp). Thus, Smalltalk implementations either required an indirection for all objects (often through the "object table"), or the implementations required the forementioned operations to scan the whole heap to rewrite references to objects affected by the operations. We'll focus solely on how to implement class redefinition in most of this document; the only difference between class redefinition and change-class is the cost model.

Miranda and Béra implemented a partial read barrier for Smalltalk. In their scheme, an object may be forwarded, requiring accesses of the objects to follow a forwarding pointer to access the instance variables of the object. The forwarding pointer points to the target of a forwarded object.

Accessing the instance variables ("slots" in Common Lisp terminology) does not require following a forwarding pointer, if an object has not had its class redefined, and the object has not been involved in a become: operation. Forwarding is only required after one of the two forementioned events occurs to an instance, and the object is marked as forwarded when such events occur.

(There is somewhat a jump in terminology here, from "indirection" to "read barrier". But it is justified; the conventional "indirect" representation requires something like Brooks's read barrier, which always follows a forwarding pointer, before an instance can be used. The Miranda-Béra partial read barrier resembles Baker's read barrier, which tests if the object has been forwarded, before following a forwarding pointer. So I'll call the actions that must be done before accessing the slots of an instance a "read barrier".)

The described read barrier requires another branch, in order to check if an indirection is necessary or not. But the branch is easily predictable, as almost all objects are not forwarded; and the barrier eliminates a data dependency as no forwarding pointer needs to be followed.

Miranda and Béra, however, use a special class ID to represent if an object has been forwarded, which allows method dispatch itself to test if an object has been forwarded. As such, their read barrier is "partial" in that the barrier only is performed after method dispatch has encountered a forwarded object.

Regardless of how the need for forwarding is tested, the forwarding pointer is stored by overwriting the first instance variable of an object, as the instance variables of that object will not be accessed anymore. (If an object has no instance variables, another word is added to the end of the object, and the word is only used for a forwarding pointer.)

It is also worth noting that the garbage collector could replace each reference to a forwarded object with the target of each object. If the garbage collector performs this task concurrently[1], the mutator must follow forwarding pointers before performing pointer-equality tests, as the mutator can encounter different pointers which still represent the same object.

[1] Similarly to incremental compaction, the collector could find all locations with pointers that must be fixed without pausing the mutator, and then fix the locations in a much shorter pause, providing short pause times and avoiding introducing overhead on the mutator.

An adaptation for Common Lisp

The forementioned technique works perfectly for a Common Lisp implementation, but we would need to make some changes for a high-performance implementation.

Most Common Lisp implementations support using multiple threads. The forwarding pointer is stored in a way which produces undesirable behaviour in the presence of data races. Suppose we have a class and instance

(defclass c () ((%a :initform 'a :reader a)))
(defvar *instance* (make-instance 'c))

and we redefine the class c, while another thread is evaluating (a *instance*). The thread has just dispatched on the class of the instance, and we forward the object just before the thread reads the slot %a. The thread reads a forwarding pointer, rather than any value which the program has ever stored in %a, which is unexplainable by the semantics of the programming language. I believe this is an undesirable possibility, and that concurrent programming is hard enough without the possibility of an implementation producing unexplainable values.

We may instead store the forwarding pointer in another word in an instance, leaving all slots of the instance intact. This introduces some space overhead, but it is no worse than the conventional representation of instances.

There is another somewhat similar situation, where we might be tempted to reuse an object when the size of the object does not change after redefinition. I also think that this optimisation produces unexplainable values due to data races. Suppose we redefined the class by

(defmethod update-instance-for-redefined-class
    ((c c) added deleted plist &key)
  (setf (slot-value c '%b) 'b))
  
(defclass c () ((%b :reader b)))

and the same data race occurs. The thread now reads b which is also quite odd, as b was never written into the slot %a before. I would also suggest to not reuse objects in this manner.

Optimisations

We might want to optimise under the assumption that a class is not redefined, or that the class of an instance does not change. For example, in

(defmethod d ((c c))
  (unknown-function)
  (a c))

we may want to inline the method of a which is specialised on c, but such inlining is not sound as (unknown-function) may redefine the class c. We may inline if we have a way to replace the call frame if either situation occurs (dynamic deoptimisation). Then we would never reach the inlined code after redefinition, and so inlining is sound.

However, I am not sure if scanning the stack for code which has invalidated assumptions would be fast enough for change-class. An application may legitimately call change-class many times, and the function only actually modifies one instance at a time, leading to little work being done per scan. One approach may be to scan the stack lazily, which minimises the work done per call to change-class. Such lazy scanning also accumulates many instances to check for in older call frames, as long as those frames are not returned to.

It may not be necessary to scan the stacks of other threads. Either the other threads correctly synchronise, and thus will naturally encounter that the class of the instance has changed, or the threads are involved in a data race. In the latter case, the threads may access the object as if its class had not been changed yet, which is acceptable behaviour.

But if it is also desired to avoid such behaviour, the other threads can resume execution immediately after scanning their own stacks, with no stop-the-world pause [2]. Escape analysis and/or thread-local heaps may also help to prove that no other threads could possibly be relying on assumptions of the class of an instance, when the object is not reachable by other threads.

Scanning the stacks of all threads, however, could save some space. The representation used by Miranda and Béra (with the forwarding pointer stored in the first slot of an instance) could be used in a concurrent setting, as no threads will attempt to access the slots of the forwarding object after scanning the stack.

[2] Actually, I think the protocol to avoid stopping the world would be more complicated, like the handshaking protocol of any on-the-fly garbage collector. But it might not matter; the Go garbage collector and the ZGC of Java lazily scan stacks but still stop the world, and produce pause times below a hundred microseconds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment