Skip to content

Instantly share code, notes, and snippets.

@Kaiepi
Last active April 26, 2022 21:46
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Kaiepi/6e61d0402b6606f479e393a92888c9d7 to your computer and use it in GitHub Desktop.
Save Kaiepi/6e61d0402b6606f479e393a92888c9d7 to your computer and use it in GitHub Desktop.
Annotating Types with Thread-Safe Containers

Annotating Types with Thread-Safe Containers

Back in December, I was hoping to write a new kind of type for Data::Record, but have been struggling with illness and a lengthy wait for a doctor's appointment. Despite my main computer dying off a few weeks ago, I've been able to toy with this idea to some degree.

The post linked is rather heady. If the kind of type is class-based, then a type can typecheck as another merely by inheriting from another (or its pun). What I'm looking for out of such a HOW is static state. If I can shove an array of fields for a record type into its HOW, then its operators can generate an annotation, inherit from such a class describing its behaviour, embed their fields in their HOW, and return them, with the unique MOP for lists, tuples, and hashes becoming possible to implement with metamethods alone.

Rakudo's MOP plays it rather fast and loose with thread safety at times. In the ecosystem, the tools that make this easier to achieve (Scalar and atomicint, namely) are already bootstrapped, so I'd rather not try the same here. Scalar supports a subset of atomicint's operations (atomic-fetch, atomic-load, cas), which should be good enough for our purposes. With cas, we could eke a sort of read-only binding out of it.

Let's make a nullish class to denote an empty binding:

class DEALLOC is Nil is repr<Uninstantiable> { }

The Uninstantiable REPR should be a little cheaper to work with than the default P6opaque if need be.

With this, we can write a Proxy of sorts to take care of the boilerplate involved in dealing with our sort of Scalar:

class Binder is Proxy {
    has $!value is default(DEALLOC);

    method new is raw {
    	callwith :&FETCH, :&STORE
    }

    my method FETCH is raw {
    	⚛$!value
    }

    my method STORE(Mu $value is raw --> Nil) {
    	cas $!value, DEALLOC, $value
    }
}

The first argument to a FETCH/STORE for a Proxy will be the proxy itself. Attribute accesses on a method's self from within the Binder package will not decont this in the process.

We can't depend on = to assign $!value's default value because Proxy instantiates itself with a bare CREATE instead of bless and its BUILDALL, so we'd need to depend on the !SET-SELF pattern to give it a default value dynamically. A type object's value can be known at compile-time, so is default better suits our needs here anyway.

While a thread-safe item would be enough to meet Data::Record's needs, I feel there's more that can be done here. If I had to pick any sort of container to associate with a type, I'd rather it be a Positional buffer of sorts instead. Buffers are commonly depended on in the MOP; it should be possible to write an optimized IterationBuffer-like container to represent how these can be used. We can back this with an Array, but in doing so, we limit ourselves to one writer thread at a time with regards to positional operations. We'll need to lock on write to enforce this, but what about reads? There should be a way to get away with wait-free reads on a buffer despite this, but not without sacrificing the performance of a write in the process. That should be OK: we're dealing with static state here.

Let's take a minute to talk about atomics' internals. Recently, MoarVM gained support for a C11 atomics representation for Raku's atomic ops. We're interested in what operation and memory order &cas, &atomic-load and &atomic-store correspond to (src/moar.h):

/* Returns the old value dereferenced at addr.
 * Strictly, as libatomic_ops documents it:
 *      Atomically compare *addr to old_val, and replace *addr by new_val
 *      if the first comparison succeeds; returns the original value of *addr;
 *       cannot fail spuriously.
 */
MVM_STATIC_INLINE uintptr_t
MVM_cas(volatile AO_t *addr, uintptr_t old, const uintptr_t new) {
    /* If *addr == old then { does exchange, returns true }
     * else { writes old value to &old, returns false }
     * Hence if exchange happens, we return the old value because C11 doesn't
     * overwrite &old. If exchange doesn't happen, C11 does overwrite. */
    atomic_compare_exchange_strong(addr, &old, new);
    return old;
}

/* Returns the old pointer value dereferenced at addr. Provided for a tiny bit of type safety. */
#define MVM_casptr(addr, old, new) ((void *)MVM_cas((AO_t *)(addr), (uintptr_t)(old), (uintptr_t)(new)))

/* Full memory barrier. */
#define MVM_barrier() atomic_thread_fence(memory_order_seq_cst)

/* Need to use these to assign to or read from any memory locations on
 * which the other atomic operation macros are used... */
#define MVM_store(addr, new) atomic_store((volatile AO_t *)(addr), AO_CAST(new))
#define MVM_load(addr) atomic_load((volatile AO_t *)(addr))

Each of these operations depends on memory_order_seq_cst. This gives us acquire/release semantics for a cas, giving it some capability to prevent reordering of atomic operations and interleaved non-atomic operations on its own. This can help us get wait-free reads that can be performed as a batch of writes is being performed.

We'll give our buffer allocation semantics. We'll need another marker to represent an unallocated slot should one be requested in some manner:

class REALLOC is Nil is repr<Uninstantiable> { }

We'll back our Buffer with an Array, Lock, and unsigned Int:

class Buffer does Positional {
    has @!buffer is default(REALLOC);
    has $!allocs = Lock.new;
    has $!cursor is default(0);
    ...
}

To allocate, we'll depend on a gather. If we take-rw our Binder slots in the buffer after we've written them, work necessary between $!cursor stores and loads can be performed at that point in time. This must be bound to avoid deconting Binder:

method ALLOC(::?CLASS:D: UInt:D $values --> Seq:D) {
    gather {
        $!allocs.lock;
        LEAVE $!allocs.unlock;
        do given$!cursor -> $offset is copy {
            take-rw @!buffer.BIND-POS: $offset++, Binder.new;
            $!cursor= $offset;
        } xx $values;
    }
}

&infix:<xx> is a bit of an odd operator, in that the left-hand side winds up a thunk to be invoked repetitively. Useful here!

Now we can implement Positional. We'll start with the optional STORE for the sake of mutability. While we define ALLOC with a Seq, users would probably be more interested in a cache of sorts:

method STORE(::?CLASS:D: +topic --> List:D) {
    eager self.ALLOC(topic.elems) Z= topic
}

We slurp with a bare + to type batches of updates as Iterable, making anything else a singleton list. In taking the elems of topic immediately like this, we lose support for lazy iterations (e.g. lazy gather), but should wind up with a cheaper write. The zipped = should lead to a cas in the taken Binder occurring between our $!cursor updates. Because memory_order_seq_cst should guarantee such a write (along with any write beforehand) occurs before we increment and expose its corresponding slot, the binding to our @!buffer is not atomic, yet will be committed to memory by that point anyways.

As long as we read $!cursor before reading its value as an offset, we will have an element prepared to read available there. Incidentally, our next offset is our length:

method elems(::?CLASS:D: --> Int:D) {
    ⚛$!cursor
}

method EXISTS-POS(::?CLASS:D: UInt:D $pos --> Bool:D) {
    $pos <$!cursor
}

method AT-POS(::?CLASS:D: UInt:D $pos) is raw {
    $pos <$!cursor ?? @!buffer.AT-POS($pos) !! REALLOC
}

method list(::?CLASS:D: --> List:D) {
    @!buffer[^$!cursor]
}

We'll make AT-POS raw to perform a FETCH on a slot more whenever its value is demanded should this be requested via binding. Though this isn't demanded by Positional per se, we have a list helper method to allow for coercions with the @ contextualizer and with * slices. We'll need a Seq to go with this:

method Seq(::?CLASS:D: --> Seq:D) {
    @!buffer.head:$!cursor
}

Now we're ready to think about annotations themselves. The concept of a thread-safe buffer is not unique to any given kind of type, so we'll wind up with a metarole of sorts. Having an unsignatured role candidate gives us easy mixins to the HOW of a type:

role MetamodelX::AnnotationHOW { ... }

We could use a helper role candidate to perform the mixin given another kind of type while we're here:

role MetamodelX::AnnotationHOW[::H] is H does MetamodelX::AnnotationHOW { }

We need to consider the quirks of Buffer's design before we add one to the unsignatured candidate. It wants an eager list of elements to store, but what if a user wants to reify a lazy list as part of a buffer? It updates its cursor while its writing, but what if we need a write of a list to be visible immediately? We should have a buffer for each type object, sure, but there should be a means of slipping a list of lists dynamically as well if we add another.

The MOP is friendly to getters and setters. That means can be Proxy again:

role MetamodelX::AnnotationHOW {
    has @!annotations;
    has @!elided_annotations;
    
    submethod BUILD(::?CLASS:D: --> Nil) {
    	@!annotations := Buffer.new;
        @!elided_annotations := Buffer.new;
    }

    method yield_annotations(::?CLASS:D: Mu) is raw {
        @!annotations
    }

    method elide_annotations(::?CLASS:D: Mu) is raw {
        sub FETCH(Mu) { cache gather for @!elided_annotations.Seq { take-rw $_ for @$_ } }
        sub STORE(Mu, Mu $topic is raw --> Nil) { @!elided_annotations = $topic }
        Proxy.new: :&FETCH, :&STORE
    }
}

Our buffers must be bound, which we deal with from a submethod BUILD. That's somewhat odd in this case; we should be able to write an equivalent:

has @!annotations is Buffer;
has @!elided_annotations is Buffer;

But Rakudo's HOWs are very NQP-ish objects. This syntax will throw an error eventually if we attempt to depend on their values after mixing our metarole into a class' HOW. NQP lacks a notion of containers; I suspect this would be because the @-sigilled attributes are initialized to a more NQP-ish array upon mixin instead of a Buffer (as of writing).

Our first yield_annotations buffer is transparent. The second elide_annotations buffer is exposed as a lazily reified list. We give this an untyped FETCH by coercing the topic of the outer for with @, deconting in the process. Because this will not evaluate the entire list as soon as the method is called, we can inline these within one another.

Despite our efforts to optimize reads of buffers, we need to capture a HOW's self from two closures, generate a new object, and decont by slipping a list of lists together in sequence in order to do so in the case of an elision. Some of Raku's syntax is handled at compile-time, so if we take advantage of this, we can handle all steps before the decont at that point in time.

Before we write any, it would be nice to be able to typecheck objects whose HOW carries the MetamodelX::AnnotationHOW mixin:

use Kind;

subset Annotated of Mu where Kind[MetamodelX::AnnotationHOW];

Parameterizations of type objects are handled at compile-time so long as the type arguments can be resolved at that point in time (as is the case with literals and constants, for instance). This is performed by the parameterize metamethod, which we can tack onto a couple classes to invoke yield_annotations and elide_annotations respectively:

class ANN {
    method ^parameterize(Mu, Annotated $target is raw) is raw {
        $target.^yield_annotations
    }
}

class ELI {
    method ^parameterize(Mu, Annotated $target is raw) is raw {
        $target.^elide_annotations
    }
}

Now let's see all this in action:

class Foo { ... }
BEGIN Foo.HOW does MetamodelX::AnnotationHOW;   

class Bar { ... }
BEGIN Bar.HOW does MetamodelX::AnnotationHOW;

class Bar {
    ELI[$?CLASS] = ELI[Foo], ANN[$?CLASS];
    ANN[$?CLASS] = 4, 5, 6;
}

class Foo {
    ELI[$?CLASS] = ANN[$?CLASS];
    ANN[$?CLASS] = 1, 2, 3;
}

dd ELI[Foo]; # OUTPUT: $(1, 2, 3)
dd ELI[Bar]; # OUTPUT: $(1, 2, 3, 4, 5, 6)

We stitch together a sequence of integers by taking advantage of an ANN and ELI's status as an item in their own eyes respectively. Because we err more towards laziness in the outer layers of annotations' internals, we can write our assignments to them in reverse and still produce the correct output.

In order to depend on parameterization in this case, we need to perform any mixins at compile-time (since $?CLASS is constant). We can do away with the mixin if we give our annotated class a package declarator from another module:

my package EXPORTHOW {
    package DECLARE {
        OUR::<annotation> := MetamodelX::AnnotationHOW[Metamodel::ClassHOW];
    }
}

In which case, we would wind up with:

annotation Foo { ... }

annotation Bar {
    ELI[$?CLASS] = ELI[Foo], ANN[$?CLASS];
    ANN[$?CLASS] = 4, 5, 6;
}

annotation Foo {
    ELI[$?CLASS] = ANN[$?CLASS];
    ANN[$?CLASS] = 1, 2, 3;
}

Alas, this is the point where I stop being able to test with online REPLs alone. Attached is a script demonstrating annotations.

use v6;
# use Kind;
class DEALLOC is Nil is repr<Uninstantiable> { }
class REALLOC is Nil is repr<Uninstantiable> { }
class Binder is Proxy {
has $!value is default(DEALLOC);
method new is raw {
callwith :&FETCH, :&STORE
}
my method FETCH is raw {
⚛$!value
}
my method STORE(Mu $value is raw --> Nil) {
cas $!value, DEALLOC, $value
}
}
class Buffer does Positional {
has @!buffer is default(REALLOC);
has $!allocs = Lock.new;
has $!cursor is default(0);
method ALLOC(::?CLASS:D: UInt:D $values --> Seq:D) {
gather {
$!allocs.lock;
LEAVE $!allocs.unlock;
do given ⚛$!cursor -> $offset is copy {
take-rw @!buffer.BIND-POS: $offset++, Binder.new;
$!cursor ⚛= $offset;
} xx $values;
}
}
method STORE(::?CLASS:D: +topic --> List:D) {
eager self.ALLOC(topic.elems) Z= topic
}
method elems(::?CLASS:D: --> Int:D) {
⚛$!cursor
}
method EXISTS-POS(::?CLASS:D: UInt:D $pos --> Bool:D) {
$pos < ⚛$!cursor
}
method AT-POS(::?CLASS:D: UInt:D $pos) is raw {
$pos < ⚛$!cursor ?? @!buffer.AT-POS($pos) !! REALLOC
}
method list(::?CLASS:D: --> List:D) {
@!buffer[^⚛$!cursor]
}
method Seq(::?CLASS:D: --> Seq:D) {
@!buffer.head: ⚛$!cursor
}
}
role MetamodelX::AnnotationHOW {
has @!annotations;
has @!elided_annotations;
submethod BUILD(::?CLASS:D: --> Nil) {
@!annotations := Buffer.new;
@!elided_annotations := Buffer.new;
}
method yield_annotations(::?CLASS:D: Mu) is raw {
@!annotations
}
method elide_annotations(::?CLASS:D: Mu) is raw {
sub FETCH(Mu) { cache gather for @!elided_annotations.Seq { take-rw $_ for @$_ } }
sub STORE(Mu, Mu $topic is raw --> Nil) { @!elided_annotations = $topic }
Proxy.new: :&FETCH, :&STORE
}
}
role MetamodelX::AnnotationHOW[::H] is H does MetamodelX::AnnotationHOW { }
# subset Annotated of Mu where Kind[MetamodelX::AnnotationHOW];
constant Annotated = Mu;
class ANN {
method ^parameterize(Mu, Annotated $target is raw) is raw {
$target.^yield_annotations
}
}
class ELI {
method ^parameterize(Mu, Annotated $target is raw) is raw {
$target.^elide_annotations
}
}
class Foo { ... }
BEGIN Foo.HOW does MetamodelX::AnnotationHOW;
class Bar { ... }
BEGIN Bar.HOW does MetamodelX::AnnotationHOW;
class Bar {
ELI[$?CLASS] = ELI[Foo], ANN[$?CLASS];
ANN[$?CLASS] = 4, 5, 6;
}
class Foo {
ELI[$?CLASS] = ANN[$?CLASS];
ANN[$?CLASS] = 1, 2, 3;
}
dd ELI[Foo]; # OUTPUT: $(1, 2, 3)
dd ELI[Bar]; # OUTPUT: $(1, 2, 3, 4, 5, 6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment