Skip to content

Instantly share code, notes, and snippets.

@jnthn
Last active July 21, 2020 15:53
Show Gist options
  • Save jnthn/e81634dec57acdea87fcb2b92c722959 to your computer and use it in GitHub Desktop.
Save jnthn/e81634dec57acdea87fcb2b92c722959 to your computer and use it in GitHub Desktop.
New MoarVM dispatch

MoarVM general dispatcher API

Overview

This proposal describes a unification that will subsume all of:

  • Invocation spec (deciding how to unwrap a code object to find its underlying VM reference, or alternatively delegating to a fallback like CALL-ME)
  • Standard method dispatch (today, a findmethod + invoke sequence), along with can; the up-front construction of method caches will also go away, which should greatly reduce the size of precomp outputs when many classes are defined, and hopefully resolve various of the augment propagation issues too - at least if the augment happens early enough)
  • Multiple dispatch (and with it the multi-dispatch cache)
  • Spesh plugins (and with it the plugin guard tree)
  • The bits of deferral support provided by setdispatcher and friends (it will provide a much more complete solution in this area)
  • Specializations as produced by MoarVM spesh (and with it the argument guard tree)
  • Many of the current ops that are effectively the VM's "syscalls", and so do not really need such short op sequences. This will also make it far easier to funnel extra optional arguments into them in the future.

In place of the above comes a single dispatch mechanism. A dispatch gives its arguments using the existing arg op sequence, but instead of following it by an invoke instead follows it with a dispatch, naming the dispatcher to use. Some arguments may be aimed only at the dispatcher itself. For example, a method dispatch will start with the method name to dispatch to. The dispatcher will drop that first argument from the capture before it is sent onward.

The mechanism has much in common with dispatch chains. Spesh plugins were designed without awareness of this paper, but are largely a realization of the same idea. However, dispatch chains alone don't seem to have an answer to the "continued dispatch" problem that we also need to tackle. The underlying principle still applies, however: we write a piece of code ("the dispatch handler") that is invoked in order to decide how to perform a dispatch, and which specifies conditions ("guards") under which the result it calculates applies. Future dispatches consider these guards, and chooses a result using them, only invoking the dispatch handler if the guards go unsatisfied.

Some dispatchers will be built in to the VM; others can be written in anything that compiles into bytecode for MoarVM. They can specify guards against the incoming arguments (that is, assumptions that must be upheld for the dispatch outcome to be considered valid). They can specify argument capture changes, such as dropping incoming arguments or prepending others. And, finally, they can defer to another dispatcher by naming it. In fact, every dispatcher that is not built-in will defer to another, because that's how we end up in one that knows how to actually do something.

MoarVM will provide a number of dispatchers built-in. Following the naming convention of the BOOT types, these will all start with the prefix boot-. They are:

  • boot-value - takes the initial argument of the incoming capture and uses it as the result of the dispatch. This is useful for things like return type checks that, assuming the value is properly typed and does not need a decontainerization operation, just want to pass it along. (In fact, since attribute reads can be encoded in the dispatch tree, a simple Scalar decont can be done without any kind of call, so we'll be faster in the non-specialized case at this. And things that really are just the identity operation won't need to either.)
  • boot-constant - takes the initial argument of the incoming capture and uses it as a constant result. After specialization, the capture value and any pure operations producing it would be eligible for dead code elimination, assuming they are not needed for guards.
  • boot-code-constant - one way to run bytecode. Takes an object with REPR MVMCode or MVMCFunction and runs it. The thing to invoke is the initial argument, and the rest are the arguments to the call. The thing to invoke is treated as a constant; this is good for methods, but less so for closures and the like.
  • boot-code-value - also runs bytecode. Takes an object with REPR MVMCode or MVMCFunction and runs it. The thing to invoke is the initial argument, and the rest are the arguments to the call. The thing to invoke is not guarded against.
  • boot-syscall - the mechanism that will ultimately replace many of the ops that exist today, and allow us to do calls to VM-provided I/O, decoding, and so forth. This will also do the REPR-checking and similar of those ops, meaning we'll be able to optimize it out - something we cannot do today.
  • boot-by-language - looks at the first argument's HLL and sees if we have a registered default dispatcher for that language. If so, delegates to that. Otherwise, tries delegating to boot-code.

[Speculation: we might also have a boot-nativecall, which will replace the special-case ops for entering a native call today.]

While boot-code is externally opaque, it may actually defer to other dispatchers internally too. For example, code that has a specialization will defer to a dispatcher that selects the specialization.

When we chain dispatchers, we'll look at guards that have already been asserted and not re-assert those. Today, we can potentially end up in a situation where we do a method dispatch, followed by a multi dispatch, followed by a specialization selection, and each of these involves an assertion of the types of the arguments. Spesh can collapse all of these and either inline or produce a fastinvoke, but for the case where the caller is not specialized, perhaps because it's too morphic, we could expect an improvement.

Dispatcher API

Every userspace dispatcher has:

  • A unique ID, which is used in order to do a dispatch using it.
  • A dispatch callback, which is invoked when there is no matching result in the dispatch cache. It is passed an argument capture object. It should pick another dispatcher to delegate to as its final step (it may choose different ones on different code-paths).
  • For resumable dispatches, a resume callback, which is invoked with the dispatch resumption state if the dispatch is resumed.

Dispatcher invocation

We add the following new ops in order to perform initial dispatches:

  • dispatch_v str callsite ... - dispatch using the specified dispatcher, and discard any result
  • dispatch_o w(obj) str callsite ... - dispatch using the specified dispatcher, and expect an object result
  • dispatch_i w(int64) str callsite ... - dispatch using the specified dispatcher, and expect an int result
  • dispatch_n w(num64) str callsite ... - dispatch using the specified dispatcher, and expect a num result
  • dispatch_s w(str) str callsite ... - dispatch using the specified dispatcher, and expect a string result

These ops are followed by a variable number of register indices, which supply the arguments to pass. This is determined by looking at the callsite object. Constant arguments are placed in registers, so there are no direct argconst equivalents. However, constant arguments are indicated in the callsite as such. This is strictly more powerful than today, since there's no way to have a constant object argument today, but that could be emitted if, for example, the child is a wval.

A method dispatch today looks like this:

decont r1, r0
findmethod r1, r1, 'foo'
prepargs <callsite>
arg_o r0
invoke_o r2, r1

Which is a 30-byte sequence under the current encoding.

With the new dispatch op, a Raku method dispatch would look something like this (noting the name is in the callsite):

decont r1, r0
dispatch_o r2, 'raku-method', <callsite>, r1, r0

Which is a 20-byte sequence under the current encoding. If we find a way to move the decont into the dispatch instruction, it becomes just 12 (which provides quite an incentive to do that!) A private method dispatch would look similar:

decont r1, r0
dispatch_o r2, 'raku-private-method', <callsite>, r1, r0

This is a notable shortening of the sequence today, because spesh plugins are always two steps:

  1. Resolve the plugin to a callable
  2. Invoke that callable

So we're saving a prepargs, a speshresolve, at least one arg instruction (since we don't have to pass the invocant to both the plugin and then the real invocation), and the temporary it stores into. There will be similar gains for the assignment and return value check spesh plugins used in Rakudo today.

The invoke series of ops today will use the built-in boot-by-language dispatcher, which will:

  1. Look at the HLL owner of the object being invoked, and see if a default fallback dispatcher has been registered for that language. If so, it will use that. The invokee will become the first argument to the dispatcher.
  2. Otherwise, it will fall back to the builtin boot-code dispatcher.

Language-defined dispatchers may also, upon encountering something they do not recognize, defer to boot-language.

Registration

In the name of dogfooding, registering a dispatcher will be done using the boot-syscall dispatcher. The first argument is the name of the VM syscall to make - dispatcher-register - followed by 2 required arguments (the name of the dispatcher and the dispatch callback) and one optional argument (the resume callback). Both of the code callbacks must be MVMCodes - that is to say, they should already have been unwrapped from any language code objects.

# The callsite contains the dispatcher-register constant index
const_s r0, 'raku-method'
getcode r1, the_dispatch_callback
getcode r2, the_resume_callback
dispatch_v 'boot-syscall', <callsite>, r0, r1, r2

At an nqp::op level, however, this will likely be exposed as simply nqp::dispregister('raku-method', -> $capture { ... }).

Guarding

Within a dispatch handler callback, it is allowed to guard against:

  • Any object obtained from the argument capture
  • Any argument obtained by a guarded attribute read

In order to guard a value, it first has to be obtained via. a dispatcher-track-* operation. This lets the dispatch mechanism know the source of a value that we guard, so it can write the dispatch program for us. We read a value out of a capture for guarding purposes using the dispatcher-track-arg($capture, $index)) syscall. We can then pass it to a guard operation, also done via the boot-syscall mechanism:

  • dispatcher-guard-type($tracked) - guard that the value has precisely its current type (nqp::dispguardtype)
  • dispatcher-guard-concreteness($tracked) - guard that the value has its concreteness (nqp::dispguardconcreteness)
  • dispatcher-guard-literal($tracked) - guard that the value has its current literal value (by object pointer comparison or obj, and value comparision for str, int, and num) (`nqp::dispguard')
  • dispatcher-guard-not-literal-obj($tracked, $unwanted) - guard that the value is never a certain other object literal (used for Raku's "this is not Nil" assignment checking) (nqp::dispguardnotobj)

It is allowed to loop over the argument capture and install guards based upon its content; the multi-dispatcher shall wish to do this, for example.

Guarded attribute reads

Sometimes it's useful to dig into an object's attributes and guard against those. This is done with a guarded attribute read. This is a boot-syscall of the form dispatcher-track-attr($obj, $class, $name) (nqp::disptrackattr). It will guard the type and concreteness of the object being read from (although since such guard additions check the existing set of guards, it's not inefficient if they were already guarded against). The value of the attribute is returned.

Capture transformation

Capture transformations are used in order to alter what we send along to the next dispatcher. For example, in a method dispatcher we'd replace the method name with the resolved method's code object and pass it along to the code object invoker. The VM syscalls are:

  • dispatcher-drop-arg($capture, $index) - forms a new capture without the argument at the specified index. (nqp::dispdroparg)
  • dispatcher-insert-arg($capture, $index, $tracked) - inserts a tracked value into the capture at the specified index.
  • dispatcher-insert-arg-literal-obj($capture, $index, $object) - inserts a literal object into the capture at the specified index (this means that the dispatch program will record this as a constant)
  • dispatcher-insert-arg-literal-str($capture, $index, $str) - inserts a literal string into the capture at the specified index (this means that the dispatch program will record this as a constant)

All of these return a new capture and a "new" callsite (typically an interned one that fits the new shape), and never mutate an existing one in place. Further, while they are called against a capture object when the dispatcher is running, they may be totally elided once the dispatch guard/operation chain has been set up.

Passing control to another dispatcher

The dispatcher-delegate syscall (nqp::dispdelegate) takes the name of a dispatcher and an argument capture. Control will be passed on to that dispatcher once the current one returns. There must be exactly one call to this; it is an error to not call it, and an error to call it more than once. Either will result in an exception throw.

Dispatch state

Dispatch state is used in order to preserve data that is used in deferral. This initial dispatch state set by the dispatch callback must be one of:

  • The incoming argument capture
  • A derived argument capture
  • A value that is guarded against

That way, the dispatch state can easily be derived during the guard tree processing. If the runtime can prove that deferral will never happen with this dispatch state, then it is free to elide storing it at all (a lot like it can elide takedispatcher today).

Example Raku dispatchers

The basic Rakudo code object dispatcher

This replaces the invocation spec's role today.

nqp::dispregister('raku-invoke',
    -> $capture {
        # Obtain the target of the invocation.
        my $obj := nqp::capturepos($capture, 0);
        nqp::dispguardtype($obj);
        nqp::dispguardconcreteness($obj);
        if nqp::istype($obj, Code) && nqp::isconcrete($obj) {
            # Rakudo code object. Unwrap it and replace it in the capture.
            my $do := nqp::dispgetattr($obj, Code, '$!do');
            my $delegate-capture := nqp::dispreplacearg($capture, 0, $do);
            nqp::dispdelegate('boot-code', $delegate-capture)
        }
        else {
            # Need to do CALL-ME on it.
            my $delegate-capture = nqp::dispinsertstr($capture, 0, 'CALL-ME');
            nqp::dispdelegate('raku-method-call', $delegate-capture)
        }
    }
);

The CALL-ME case has potential to be far more efficient than is possible today. In principle, we'll be able to inline the CALL-ME method, if it's small enough. (Also, this should let us eliminate the pesky "Cannot invoke object with invocation handler in this context" problem.)

Method call dispatch

This is invoked with a capture where the method name is the first argument, followed by decont'd invocant, followed by the normal method arguments.

nqp::dispregister('raku-method-call',
    -> $capture {
        # Obtain the method name and invocant. Guard against the literal name,
        # and against the type (but not concreteness) of the method. In the
        # case we were given an argconst the first will be a no-op.
        my str $name := nqp::captureposarg_s($capture, 0);
        nqp::dispguardstr($name);
        my $invocant := nqp::capturepos($capture, 1);
        nqp::dispguardtype($invocant);

        # Use the MOP to resolve the method.
        my $meth = $invocant.HOW.find_method($invocant, $name);
        unless nqp::isconcrete($meth) {
            # Really would obtain/throw the typed exception here.
            nqp::die("No such method '$name'");
        }

        # The dispatch state for resumption is the dispatch capture, which
        # holds the name and arguments.
        nqp::dispstate($capture);

        # Before delegating onrward, we drop the name argument and replace
        # the invocant argument with the target od hte dispatch.
        my $delegate-capture := nqp::dispreplacearg(
                nqp::dispdroparg($capture, 0), 0, $meth);

        # Is this a multi-method?
        if nqp::istype($meth, Method) && $meth.is_dispatcher {
            # Yes, so delegate to multiple dispatch. We need to tack on the
            # resolved thing to invoke.
            nqp::dispdelegate('raku-multi-dispatch', $delegate-capture)
        }
        else {     
            # No, so produce an invocation result, containing the method to
            # invoke and the argument capture sans name.
            nqp::dispdelegate('raku-invoke',  $delegate-capture)
        }
    },
    'resume', -> $state, $args {
        # Discussed later
    }
));

General dispatch

This is standard invocation in Raku. It may be to a sub or multi-sub, but it's possible it'll also be to a method. This matters because deferral is meant to work even if it is not invoked with method syntax. It is invoked with a capture where the first thing is the sub to call, and the rest are the arguments. This is not resumable.

nqp::dispregister('raku-call', -> $capture {
    # Guard on the type of what we're invoking.
    my $target := nqp::captureposarg($capture, 0);
    nqp::guarddisptype($target);

    # It could be a method call in disguise. We need to send this through the
    # indirect method dispatcher, so that deferral works.
    if nqp::istype($target, Method) {
        nqp::dispdelegate('raku-indirect-method-call', $target);
    }

    # Otherwise, it might be a multiple dispatch.
    elsif nqp::istype($target, Routine) && $target.is_dispatcher {
        nqp::dispguardconcreteness(nqp::dispgetattr($target, Routine, '@!dispatchees'));
        nqp::dispdelegate('raku-multi-dispatch', $target);
    }

    # Otherwise, it's a straight invocation of what we have.
    else {
        nqp::dispdelegate('raku-invoke', $target)
    }
});

Multiple dispatch

TODO

Wrap dispatch

TODO

Resumed dispatches

The following kinds of dispatch involve resumption:

  • Method dispatch (continuing to walk the MRO)
  • Multiple dispatch (continuing to walk the candidates)
  • Wrap dispatch (continuing to the next wrapper)

Furthermore, these fall back on their enclosing dispatcher in some cases (but not all). For example, a call to a wrapped multi method will need to exhaust the wrap dispatcher, then the multi dispatcher, then the method dispatcher.

Resuming a dispatch is relatively uncommon, and we'd like the overhead of this feature to be as low as possible in the case it is not called upon. At the same time, the cost of deferral today is very high. For example:

class P { method m() { } }
class C is P { method m() { callsame() } }
for ^1_000_000 {
    C.m
}

Runs in 4.36s, while if we explicitly name the parent class (this form is optimized through a spesh plugin):

class P { method m() { } }
class C is P { method m() { self.P::m } }
for ^1_000_000 {
    C.m
}

Then it runs in 0.066s, presumably thanks to full inlining by spesh. That's not just a little faster, it's a factor of more than 65x faster! Of course, this assumes the method does nothing, so in a realistic application it'd most likely be amortized somewhat. But still, when modules like OO::Monitors use wrap and callsame, there's some motivation to give it a decent speedup.

Making the problem harder still is the fact that:

  • We can either retain the current args or specify new args - but in the case we specify new args, and it's a method dispatch, we need to include the original invocant too.
  • One can interact with the dispatch process by calling lastcall.
  • One can obtain the next dispatch target using nextcallee.
  • Things like callsame are just subroutines that you call. OO::Monitors actually uses this fact to work around a pre-comp bug by storing the CORE.setting &callsame away in its local lexpad, in order that it doesn't get lost. Even without that fun, this is still a source of trouble, in that any callsite inside of callsame would be megamorphic even though the actual dispatch being resumed is likley to be far less morphic.

Dispatch state

A dispatcher that is resumable can store dispatch state. Dispatch state is actually an argument capture derived from that supplied during the initial dispatch. In the case the specializer can prove that this state shall not be needed, it can completely skip the work to create it. Otherwise, it knows it must at least keep enough information around to construct it on demand. (Hopefully this will give us a lot more freedom than we have today, where we must mark anything using callsame and friends as not possible to inline. In principle, this new dispatch mechanism will allow us to lift this limitation.)

Upon resumption

When we wish to resume a dispatch, we provide an argument. The proposed use of this for Raku's dispatchers will be:

  • An argument capture when we are doing callwith or nextwith
  • Sentinel objects for each of callsame/nextsame, nextcallee, and lastcall

Resuming a dispatch for the first time at a given callsite will:

  1. Locate the dispatcher that we shall resume.
  2. Invoke its resume callback with the current dispatch state and the argument.
  3. Expect it to update the dispatch state and delegate to a dispatcher (ultimately bottoming out in a VM one).

The resume callback, much like the dispatch one, can guard against its incoming arguments: of note, the dispatch state and the resume argument. Furthermore, it can set an updated dispatch state. As with dispatch, these are all considered as guard operations (including the formation of the new dispatch state), which means that when the guards are met, the resume callback need not be run. (We assume there is a mechanism to ask for the state to be stored against the caller's callsite, not our current one, to avoid the megamorphism inside of callsame and friends.)

How the dispatch state is used

When we defer, the set of candidates we shall iterate through is already preordained based on the initial argument capture. This means we can:

  1. Find the candidate list we'll work our way through.
  2. Add guards expressing what it depends on.
  3. Prepend those to the dispatch state capture, except the one we will be now most immediately calling.
  4. And before that, a sentinel object that indicates we already decided on the candidates, and perhaps another that separates them from the actual arguments.

Let's take the slow case mentioned earlier as an example.

class P { method m() { } }
class C is P { method m() { callsame() } }
for ^1_000_000 {
    C.m
}

The first time around, there's no guard set. The C.m call triggers the dispatch callback. This finds the method, and stashes away the original argument capture as dispatch state, which is prepended with the name. We run C.m. It calls callsame. That triggers a resumption, which is passed the dispatch state and the sentinel indicating we're doing a callsame. Again, the guard set at this callsite (assuming we manage to attach it one-up, at the site of the callsame call) is empty, so we run the resume callback. It finds all of the ms in MRO. In this case there's only one, so it just replaces the method name in the dispatch state with the "end of candidates" sentinel (effectively, the remaining candidates are the empty list). It establishes a guard against the type of the invocant in the dispatch state and the callsame sentinel to make sure we're safe, and then delegates the dispatch so that we invoke the P.m.

The second time around the loop is rather easier. Since m is a constant, the only thing we need to do in the guard set to choose a candidate is to check the type of the invocant is C. It is. The guard tree then tells us to fetch the $!code attribute of m. We do so. (Or we might have just cached that constant in the guard tree.) Either way, we end up in m, without having to run dispatch. We see the callsame. We do actually have to call it, but its resume call then looks at the guards established last time around and winds up following a path that invokes P.m. Thus we don't call the resume callback again in this case. We also didn't have to allocate a dispatcher object as we do today. It remains to be seen to what degree we can avoid allocations of the various derived captures.

We proceed this way for some further iterations. Meanwhile, at some point, the code is considered hot and the specializer gets to work. We first end up in C.m. We inline callsame, and find the guard set that goes with the resume. We walk it, which leads us to C.p, which is tiny, so we inline it. Since we can see callsame passes the sentinel for doing a callsame, that guard goes away. At this point, the rest pretty much stays as it is. However, we can compile the checking guards into the body of C.m, meaning that we don't have to interpret them, and they can instead be JITted.

Later, we inline the optimized C.m into the body of the loop. Here the party really gets started, because we can now see precisely what dispatch state is set, and we know the types of the values that go into it. Given the simple control flow, this means that we know what it will be upon resumption too, and thus we can prove and delete all of the resumption guards. That means we'll have optimized this poor naive benchmark into an empty loop body - much like we could with the self.P::m case, and thus it should run rather fast.

Nesting dispatchers

Roughly:

  • When a resume is invoked and it has run out of candidates, it can delegate to boot-next-dispatcher. Not all need do this; a wrap dispatcher probably will as it could be in a method dispatcher or multi dispatcher, but a method dispatcher would have nowhere to go.
  • Dispatchers can, when setting up a delegation to another dispatcher, indicate if that dispatcher should fall back to this one or not. Thus when a method dispatcher delegates to a multi dispatcher, it can say "if you fallback, it is to me". It remains to be seen precisely how well we can get wrap dispatchers into this picture too without too much repetition.

Implementation notes

Overall plan

  1. Implement enough that we can migrate spesh plugins. Migrate them. Maybe at this part-way point we can even merge; on the other hand, we may be polishing something that isn't exercised on all the use-cases.
    1. Get the new dispatch instruction added and handled by validation. Add NQP code to generate it. For now, it just gives an NYI panic. (DONE)
    2. Implement the new constant value flag in callsites. (DONE)
    3. Implement cache storage. A quick, cheap way for bytecode instructions that want to cache stuff against an instruction to do so. We'll use it for dispatch but also for getlexstatic_o (and will spectest it against this). May find further use cases in the future. (Current idea: keep a pointer array in parallel with the bytecode, work out the density we need (to a power of two) and use the instruction offset to decide on that. (DONE)
    4. Refactor getlexstatic_o optimization to look at the inline cache rather than needing a value log entry. To prove spesh can get at it. (DONE)
    5. Detailed design of how we'll handle argument captures going forward. (DONE)
    6. Implement dispatcher registry. (DONE)
    7. Refactor call stack so that it can hold more than just frames, and so it always hangs on to even promoted frames. (DONE, modulo a deopt issue being uncovered)
    8. Implement enough to be able to produce a capture object for the dispatch callback and invoke a dispatcher with it. (DONE)
    9. Basic design for the dispatch program execution enough for boot-value, boot-constant, and boot-code (the terminals), along with some simple guards. (DONE)
    10. Implement boot-value being invoked and recording a dispatch program, but not actually storing it yet. (DONE)
    11. Implement boot-value actually running the dispatch program, so as not to run the dispatch callback. (DONE)
    12. Implement the boot-constant dispatcher. (DONE)
    13. Implement the boot-code-constant dispatcher. (DONE)
    14. Implement the VM "syscall" dispatch mechanism. (DONE)
    15. Implement registering a userspace dispatcher and calling it (but it can't do anything useful yet). (DONE)
    16. Add support for delegating to another dispatcher for userland dispatchers. (DONE)
    17. Add support for setting guards in userland dispatchers. (DONE)
    18. Add support for capture transformations in userland dispatchers. (DONE)
    19. Try switching some spesh plugins. So far, should only work with spesh disabled. (DONE)
    20. Properly handle non-internable callsites in dispatch programs, and revise rules for what we can and cannot intern. (DONE)
    21. Implement flattening prior to dispatch, selecting a fitting callsite. (DONE)
    22. Make sure that, with the exception of things depending on multiple dispatch, the things migrated from spesh plugins to the dispatcher work out. (DONE)
    23. Eliminate all spesh plugins in Raku. (DONE)
    24. Eliminate spesh plugins in MoarVM, and their NQP tests. (DONE)
    25. Teach spesh about dispatch and get it to do the rewrite of a dispatch instruction into guards + linking or inlining.
  2. Switch Rakudo method, multiple, and wrap dispatch to use the new scheme. Update callsame and friends to use the new model too. These do need resumption. Make sure we handle all the dispatcher nesting cases.
  3. Switch NQP method and multiple dispatch to use the new scheme. These don't need resumption.
  4. Elimiante last remaining remenants of findmethod and other method caching infrastructure; nqp::can goes along with it.
  5. Re-tune things that need it, exploit new optimization opportunities.
  6. Figure out how to un-break other backends, to the degree they got broken.
  7. Figure out how to replace the spesh arg guards with this too, if that was not already done.

Code stucture

  • MVM_disp_registry_* (src/disp/registry.c) - the dispatcher registry, which is populated with the boot (VM-provided) ones at startup.
  • MVM_disp_boot_* (src/disp/boot.c) - implementations of the bootstrap dispatchers that are provided by the VM
  • MVM_disp_record_* (src/disp/record.c) - recording of new dispatch chain entries, including capture transforms and guard operations
  • MVM_disp_tree_* (src/disp/tree.c) - implementation of the dispatch tree for faster lookups, both building the tree and walking it

Dispatch guard sets

  • Store the list of guard ops as the primary data. For each, we have the set of operations, followed by a final result (constant, run C function, set up a frame and invoke bytecode). This bit is along the lines of how spesh plugins do it, except we plan to de-dupe them too, and are involved with captures. But...
  • Steal the spesh arg guard tree approach, where we use a tree structure to lift common prefixes. Just rebuild the tree when we get new guard ops, again like we do in the spesh arg guard tree (it works based on us always having the type tuples there).
  • The guard tree produces an integer result, which we use to look up the final action, but also in unspecialized code to record where we ended up going, for the purposes of spesh (again, much like spesh plugins do).
  • Store them off the spesh data structure on a static frame. Maybe we will preallocate that now in any frame that contains a dispatch instruction (or an invoke, which devolves to that), to avoid the check every time.
  • The same binary search approach for bytecode addresses can also work.

Capture operations

Argument representation

The setup phase of a dispatch deals in argument captures, and transformations on those. It will receive an actual Capture object. However, typically, we'll evaluate the "dispatch program" without forming such an object, often ending up invoking code that then receives arguments as parameters.

MoarVM to date uses an arguments buffer that is a designated part of the work area of a call frame. Argument instructions copy from registers into that area. In the interpreter, that means one instruction to dispatch per argument we pass, plus we need to take care of GC-walking that area. On the parameter handling side, we then form an argument processing context. When dealing with captures of arguments, or flattening arguments, those can also be referred to by the argument processing area. Flattening arguments need us to allocate an argument buffer too.

The dispatch instruction is different: it lists the registers that hold the arguments. This means no argument buffer to allocate space for nor to mark in the GC. It also means no instructions to interpret per argument on the callee side, and that there's no (immediate) need to copy the arguments anywhere.

Ideally, we'd like to achieve:

  1. No copying of arguments other than to their final destination in the common case. Granted we may need to read them while interpreting guards. But we'd like to, when possible, avoid having an argument buffer that we temporarily copy all of the args into. Of course, in some situations, that won't happen: flattening and when we need to go via. the dispatch setup phase and so have the arguments in a capture object.
  2. Argument access in a callee being polymorphic over "direct from the dispatch instruction" or "via. flattening or a capture". Even after specialized, we'd ideally retain the polymorphism here.
  3. That polymorphism is cheap (e.g. free of branch instructions).

This can be achieved by modeling arguments to bind as:

  • A callsite object (which we don't care about in the specialized case)
  • An array args of MVMRegister which holds arguments
  • An array map of MVMuint16 which tells us which element in the argument array to look at for a particular argument

In the case of the arguments coming directly from the dispatch instruction, args is the work area of the caller, and map is the set of mappings in the dispatch bytecode instruction. (Note that it will probably be a couple of positions into that, since the first argument(s) are normally aimed only at the dispatcher, not at the final callee.) In the case of the JIT, this can live in the data segment.

But in the case we already expanded the arguments into a buffer (because we formed a capture object or because of flattening), and so are not reading from the work area but an already organized set of arguments, what should map be? It turns out that it can be the identity array, where element 0 is 0, element 1 is 1, etc. We can have one of these globally, perhaps at some nominal size and safepoint-reallocated with a bigger one if the program happens to do huge flattenings.

This means that argument access to argument n for parameter binding is args[map[n]], regardless of the source. This provides the desired bit of polymorphism without a branch, should JIT very easily too, and the spesh/JIT is valid whether we call it via. flattening or not.

Lifetimes

There are two lifetimes to consider for arguments:

  1. The arguments being passed are consumed only by parameter instructions. In this case, we know their consumption is bounded by the lifetime of the callee's execution. Thus, we can keep any intemediate state (for example, of expanded flattening arguemnts) on the call stack.
  2. We need a capture object. In this case, we have to assume the arguments will escape, and so should form a capture object.

Capture objects

When we need to reify a capture as an object, it contains:

  • The callsite
  • An argument buffer, allocated in the FSA

Any flattening arguments will have already been expanded. This will be implemented as a new representation, rather than brining the baggage of argument processing as found in the current CallCapture representation (which will go away after these changes). In the interim, ops on a CallCapture will also work on a new Capture.

Capture representation during a dispatch program

At the point we perform a dispatch, we have:

  • A callsite
  • A list of registers in the calling frame containing arguments

The common case is that the dispatch program (that is, the set of guards and transforms set up by the dispatch callback) will find a resolution without having to call the dispatch callback. While the dispatch call will produce intermediate capture objects as it goes about its work, we wish to elide all of that work when evaluating the resulting dispatch program. We'd also like to do as little copying around of arguments as possible; if we end up making a call, ideally the destination of the call will will just be pulling its arguments out of the work area of its caller via. the mappings in the dispatch instruction.

Due to the way dispatchers work, capture transformations will center almost almost entirely around the start of the argument list (or at least, it makes sense to put arguments aimed just at the dispatcher first). Thus, we can consider the argument list as being "split" into two parts:

  • A first part that exists as actual values for the dispatch program to use
  • A second part that exists as indexes into work registers

The dispatch program in a given dispatch will be analyzed to work out the first arg left untouched by the dispatch process, and that will be the split point for the entire dispatch. It will then allocate itself the required amount of space in its work area on the callstack for doing the argument manipulations before that point. If the dispatch results in an actual call to some frame, then assuming the arguments that are passed only consist of a tail of those in the dispatch instruction, then a pointer into the dispatch instruction's mapping list will be used as the map for use in parameter binding.

Dispatcher representation at runtime

We know that:

  • A resumable dispatcher needs storage of its dispatch state - at least, in principle. Most efficient is to do just enough to go back and recover it later. Dispatchers without state don't need to persist anything.
  • Even more fun is that one dispatch instruction can produce a stack of any number of dispatches before we actually reach the point of running code, so there's no direct mapping between them and frames.
  • Their lifetime is until we are done with the dispatch.
  • Need to make sure we "take" them in continuations too.

Calling is generally costly in MoarVM, and MVMFrame is quite heavyweight. It includes things we have to preserve if we're an outer scope, as well as things that we only care about when executing. Thus, there may be some value to separate "callstack" from current notion of "frame". Callstack becomes a region of memory containing the things we need as we run. A callstack entry may reference a "full-blown" frame, but isn't automatically one.

The callstack would have entries that are:

  1. Frames (including their work space, so we can do a single allocation here). Specialized and unspecialized frames may have different layouts. This will be more natural once spesh is folded into the dispatch chain mechanism.
  2. Dispatchers (which may also have work space).
  3. Continuation tags.
  4. Special return handlers.

For now, we'd probably stick with frames otherwise living a life much like they do today. A bit further into the future, however, it'd be good to give that a revisit, so we can have less Raku-specific knowledge in the VM and handle some things in much more elegant ways. This would also be a key step towards the re-focus on interpreter cooperation that has been speculated.

@lizmat
Copy link

lizmat commented Apr 28, 2020

Pretty cool so far! Some comments / questions:

  1. re "does not need a containerization operation". Shouldn't that be a "decontainerization operation?

  2. What's a MVMFunction?

  3. Spellos: "dispatch_v str'" -> "dispatch_v str`", "Contorl: -> "Control", "takedisaptcher" -> "takedispatcher"

  4. I don't like the "disp" prefix in e.g. nqp::dispregister, because it reminds me too much of "display" I guess. Perhaps "dx" would be a better prefix? It's definitely shorter :-) And maybe for readability add a hyphen or underscore to the prefix, so that it would become nqp::dx-register.

  5. How are you going to make sure "nqp::dispdelegate" is only called exactly once? Wouldn't that need a level of extra checking?

  6. I'm unclear as to how "nqp::dispguardtype($obj);" would know it's the correct object type?

  7. "it can completely the work to create it" I guess you meant to say "it can completely skip the work to create it" ?

@jnthn
Copy link
Author

jnthn commented Apr 29, 2020

re "does not need a containerization operation". Shouldn't that be a "decontainerization operation?

Yes, will fix.

What's a MVMFunction?

Wrapper around a C function in the VM. Used as part of the knowhow bootstrap, since the methods are written as C functions, but need to play along with the VM calling conventions.

Spellos: "dispatch_v str'" -> "dispatch_v str`", "Contorl: -> "Control", "takedisaptcher" -> "takedispatcher"

Fixed.

I don't like the "disp" prefix in e.g. nqp::dispregister, because it reminds me too much of "display" I guess. Perhaps "dx" would be a better prefix? It's definitely shorter :-)

We don't have any "display" concept in MoarVM or Raku, so I don't think there's much chance of confusion in general. Also, this simply isn't worth bike-shedding; very few folks will see/use these.

And maybe for readability add a hyphen or underscore to the prefix, so that it would become nqp::dx-register.

Then we'd have to remember that these ops are exceptions to the naming conventions all the rest follow. (The number of times I wrote nqp::forcegc...)

How are you going to make sure "nqp::dispdelegate" is only called exactly once? Wouldn't that need a level of extra checking?

Probably just a bit of state somewhere that starts out NULL and is populated, and we just check if it's already set. We can do it dynamically, I think. It's a cheap check on a rarely taken code path ('cus if you're calling this, you're setting up a dispatch chain, which in the future will hopefully be hit and thus we don't need to go through this process).

I'm unclear as to how "nqp::dispguardtype($obj);" would know it's the correct object type?

It does $obj.WHAT. We set up guards based on types of things that we are observing. About the only exception to this is when we have a negative object literal and would want to specify that. (We could in principle have a negative type guard too.)

"it can completely the work to create it" I guess you meant to say "it can completely skip the work to create it" ?

Yes, fixed.

@lizmat
Copy link

lizmat commented Apr 30, 2020

I still find nqp::dispfoo harder to read than nqp::dxfoo, the "x" acting as a sort of natural separator for the prefix.

@AlexDaniel
Copy link

If it's not something that is typed a lot then you can always just spell it out fully. If it needs to be differentiated then it can be done explicitly too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment