Most CLOS implementations use a representation for instances which
uses an indirection, in order to implement class redefinition and
change-class
. In this document I propose a technique which
avoids the indirection most of the time, describe how to optimise
around the assumption that a class does not change, and how
this technique fares with multiple threads.
Smalltalk also allows for class redefinition, and has a become:
method which swaps objects in memory (with similar power to
change-class
in Common Lisp). Thus, Smalltalk implementations
either required an indirection for all objects (often through the
"object table"), or the implementations required the forementioned
operations to scan the whole heap to rewrite references to objects
affected by the operations. We'll focus solely on how to implement
class redefinition in most of this document; the only difference
between class redefinition and change-class
is the cost model.
Miranda and Béra implemented a partial read barrier for Smalltalk. In their scheme, an object may be forwarded, requiring accesses of the objects to follow a forwarding pointer to access the instance variables of the object. The forwarding pointer points to the target of a forwarded object.
Accessing the instance variables ("slots" in Common Lisp terminology)
does not require following a forwarding pointer, if an object has not
had its class redefined, and the object has not been involved in a
become:
operation. Forwarding is only required after one of the two
forementioned events occurs to an instance, and the object is marked
as forwarded when such events occur.
(There is somewhat a jump in terminology here, from "indirection" to "read barrier". But it is justified; the conventional "indirect" representation requires something like Brooks's read barrier, which always follows a forwarding pointer, before an instance can be used. The Miranda-Béra partial read barrier resembles Baker's read barrier, which tests if the object has been forwarded, before following a forwarding pointer. So I'll call the actions that must be done before accessing the slots of an instance a "read barrier".)
The described read barrier requires another branch, in order to check if an indirection is necessary or not. But the branch is easily predictable, as almost all objects are not forwarded; and the barrier eliminates a data dependency as no forwarding pointer needs to be followed.
Miranda and Béra, however, use a special class ID to represent if an object has been forwarded, which allows method dispatch itself to test if an object has been forwarded. As such, their read barrier is "partial" in that the barrier only is performed after method dispatch has encountered a forwarded object.
Regardless of how the need for forwarding is tested, the forwarding pointer is stored by overwriting the first instance variable of an object, as the instance variables of that object will not be accessed anymore. (If an object has no instance variables, another word is added to the end of the object, and the word is only used for a forwarding pointer.)
It is also worth noting that the garbage collector could replace each reference to a forwarded object with the target of each object. If the garbage collector performs this task concurrently[1], the mutator must follow forwarding pointers before performing pointer-equality tests, as the mutator can encounter different pointers which still represent the same object.
[1] Similarly to incremental compaction, the collector could find all locations with pointers that must be fixed without pausing the mutator, and then fix the locations in a much shorter pause, providing short pause times and avoiding introducing overhead on the mutator.
The forementioned technique works perfectly for a Common Lisp implementation, but we would need to make some changes for a high-performance implementation.
Most Common Lisp implementations support using multiple threads. The forwarding pointer is stored in a way which produces undesirable behaviour in the presence of data races. Suppose we have a class and instance
(defclass c () ((%a :initform 'a :reader a)))
(defvar *instance* (make-instance 'c))
and we redefine the class c
, while another thread is evaluating
(a *instance*)
. The thread has just dispatched on the class
of the instance, and we forward the object just before the
thread reads the slot %a
. The thread reads a forwarding
pointer, rather than any value which the program has ever stored
in %a
, which is unexplainable by the semantics of the programming
language. I believe this is an undesirable possibility, and that
concurrent programming is hard enough without the possibility of an
implementation producing unexplainable values.
We may instead store the forwarding pointer in another word in an instance, leaving all slots of the instance intact. This introduces some space overhead, but it is no worse than the conventional representation of instances.
There is another somewhat similar situation, where we might be tempted to reuse an object when the size of the object does not change after redefinition. I also think that this optimisation produces unexplainable values due to data races. Suppose we redefined the class by
(defmethod update-instance-for-redefined-class
((c c) added deleted plist &key)
(setf (slot-value c '%b) 'b))
(defclass c () ((%b :reader b)))
and the same data race occurs. The thread now reads b
which is also quite odd, as b
was never written into the
slot %a
before. I would also suggest to not reuse objects in this
manner.
We might want to optimise under the assumption that a class is not redefined, or that the class of an instance does not change. For example, in
(defmethod d ((c c))
(unknown-function)
(a c))
we may want to inline the method of a
which is specialised on c
,
but such inlining is not sound as (unknown-function)
may redefine
the class c
. We may inline if we have a way to replace the call
frame if either situation occurs (dynamic deoptimisation).
Then we would never reach the inlined code after redefinition, and
so inlining is sound.
However, I am not sure if scanning the stack for code which has
invalidated assumptions would be fast enough for change-class
.
An application may legitimately call change-class
many times,
and the function only actually modifies one instance at a time,
leading to little work being done per scan. One approach may be
to scan the stack lazily,
which minimises the work done per call to change-class
. Such
lazy scanning also accumulates many instances to check for
in older call frames, as long as those frames are not returned to.
It may not be necessary to scan the stacks of other threads. Either the other threads correctly synchronise, and thus will naturally encounter that the class of the instance has changed, or the threads are involved in a data race. In the latter case, the threads may access the object as if its class had not been changed yet, which is acceptable behaviour.
But if it is also desired to avoid such behaviour, the other threads can resume execution immediately after scanning their own stacks, with no stop-the-world pause [2]. Escape analysis and/or thread-local heaps may also help to prove that no other threads could possibly be relying on assumptions of the class of an instance, when the object is not reachable by other threads.
Scanning the stacks of all threads, however, could save some space. The representation used by Miranda and Béra (with the forwarding pointer stored in the first slot of an instance) could be used in a concurrent setting, as no threads will attempt to access the slots of the forwarding object after scanning the stack.
[2] Actually, I think the protocol to avoid stopping the world would be more complicated, like the handshaking protocol of any on-the-fly garbage collector. But it might not matter; the Go garbage collector and the ZGC of Java lazily scan stacks but still stop the world, and produce pause times below a hundred microseconds.