This proposal describes a unification that will subsume all of:
- Invocation spec (deciding how to unwrap a code object to find its
underlying VM reference, or alternatively delegating to a fallback
like
CALL-ME
) - Standard method dispatch (today, a
findmethod
+invoke
sequence), along withcan
; the up-front construction of method caches will also go away, which should greatly reduce the size of precomp outputs when many classes are defined, and hopefully resolve various of theaugment
propagation issues too - at least if theaugment
happens early enough) - Multiple dispatch (and with it the multi-dispatch cache)
- Spesh plugins (and with it the plugin guard tree)
- The bits of deferral support provided by
setdispatcher
and friends (it will provide a much more complete solution in this area) - Specializations as produced by MoarVM spesh (and with it the argument guard tree)
- Many of the current ops that are effectively the VM's "syscalls", and so do not really need such short op sequences. This will also make it far easier to funnel extra optional arguments into them in the future.
In place of the above comes a single dispatch mechanism. A dispatch gives
its arguments using the existing arg op sequence, but instead of following
it by an invoke
instead follows it with a dispatch
, naming the dispatcher
to use. Some arguments may be aimed only at the dispatcher itself. For
example, a method dispatch will start with the method name to dispatch to.
The dispatcher will drop that first argument from the capture before it is
sent onward.
The mechanism has much in common with dispatch chains. Spesh plugins were designed without awareness of this paper, but are largely a realization of the same idea. However, dispatch chains alone don't seem to have an answer to the "continued dispatch" problem that we also need to tackle. The underlying principle still applies, however: we write a piece of code ("the dispatch handler") that is invoked in order to decide how to perform a dispatch, and which specifies conditions ("guards") under which the result it calculates applies. Future dispatches consider these guards, and chooses a result using them, only invoking the dispatch handler if the guards go unsatisfied.
Some dispatchers will be built in to the VM; others can be written in anything that compiles into bytecode for MoarVM. They can specify guards against the incoming arguments (that is, assumptions that must be upheld for the dispatch outcome to be considered valid). They can specify argument capture changes, such as dropping incoming arguments or prepending others. And, finally, they can defer to another dispatcher by naming it. In fact, every dispatcher that is not built-in will defer to another, because that's how we end up in one that knows how to actually do something.
MoarVM will provide a number of dispatchers built-in. Following the naming
convention of the BOOT
types, these will all start with the prefix boot-
.
They are:
boot-value
- takes the initial argument of the incoming capture and uses it as the result of the dispatch. This is useful for things like return type checks that, assuming the value is properly typed and does not need a decontainerization operation, just want to pass it along. (In fact, since attribute reads can be encoded in the dispatch tree, a simpleScalar
decont can be done without any kind of call, so we'll be faster in the non-specialized case at this. And things that really are just the identity operation won't need to either.)boot-constant
- takes the initial argument of the incoming capture and uses it as a constant result. After specialization, the capture value and any pure operations producing it would be eligible for dead code elimination, assuming they are not needed for guards.boot-code-constant
- one way to run bytecode. Takes an object with REPRMVMCode
orMVMCFunction
and runs it. The thing to invoke is the initial argument, and the rest are the arguments to the call. The thing to invoke is treated as a constant; this is good for methods, but less so for closures and the like.boot-code-value
- also runs bytecode. Takes an object with REPRMVMCode
orMVMCFunction
and runs it. The thing to invoke is the initial argument, and the rest are the arguments to the call. The thing to invoke is not guarded against.boot-syscall
- the mechanism that will ultimately replace many of the ops that exist today, and allow us to do calls to VM-provided I/O, decoding, and so forth. This will also do the REPR-checking and similar of those ops, meaning we'll be able to optimize it out - something we cannot do today.boot-by-language
- looks at the first argument's HLL and sees if we have a registered default dispatcher for that language. If so, delegates to that. Otherwise, tries delegating toboot-code
.
[Speculation: we might also have a boot-nativecall
, which will replace
the special-case ops for entering a native call today.]
While boot-code
is externally opaque, it may actually defer to other
dispatchers internally too. For example, code that has a specialization
will defer to a dispatcher that selects the specialization.
When we chain dispatchers, we'll look at guards that have already been
asserted and not re-assert those. Today, we can potentially end up in a
situation where we do a method dispatch, followed by a multi dispatch,
followed by a specialization selection, and each of these involves an
assertion of the types of the arguments. Spesh can collapse all of
these and either inline or produce a fastinvoke
, but for the case
where the caller is not specialized, perhaps because it's too morphic,
we could expect an improvement.
Every userspace dispatcher has:
- A unique ID, which is used in order to do a dispatch using it.
- A dispatch callback, which is invoked when there is no matching result in the dispatch cache. It is passed an argument capture object. It should pick another dispatcher to delegate to as its final step (it may choose different ones on different code-paths).
- For resumable dispatches, a resume callback, which is invoked with the dispatch resumption state if the dispatch is resumed.
We add the following new ops in order to perform initial dispatches:
dispatch_v str callsite ...
- dispatch using the specified dispatcher, and discard any resultdispatch_o w(obj) str callsite ...
- dispatch using the specified dispatcher, and expect an object resultdispatch_i w(int64) str callsite ...
- dispatch using the specified dispatcher, and expect an int resultdispatch_n w(num64) str callsite ...
- dispatch using the specified dispatcher, and expect a num resultdispatch_s w(str) str callsite ...
- dispatch using the specified dispatcher, and expect a string result
These ops are followed by a variable number of register indices, which
supply the arguments to pass. This is determined by looking at the
callsite object. Constant arguments are placed in registers, so there
are no direct argconst
equivalents. However, constant arguments are
indicated in the callsite as such. This is strictly more powerful than
today, since there's no way to have a constant object argument today,
but that could be emitted if, for example, the child is a wval
.
A method dispatch today looks like this:
decont r1, r0
findmethod r1, r1, 'foo'
prepargs <callsite>
arg_o r0
invoke_o r2, r1
Which is a 30-byte sequence under the current encoding.
With the new dispatch op, a Raku method dispatch would look something like this (noting the name is in the callsite):
decont r1, r0
dispatch_o r2, 'raku-method', <callsite>, r1, r0
Which is a 20-byte sequence under the current encoding. If we find a way to
move the decont
into the dispatch instruction, it becomes just 12 (which
provides quite an incentive to do that!) A private method dispatch would
look similar:
decont r1, r0
dispatch_o r2, 'raku-private-method', <callsite>, r1, r0
This is a notable shortening of the sequence today, because spesh plugins are always two steps:
- Resolve the plugin to a callable
- Invoke that callable
So we're saving a prepargs
, a speshresolve
, at least one arg
instruction
(since we don't have to pass the invocant to both the plugin and then the real
invocation), and the temporary it stores into. There will be similar gains for
the assignment and return value check spesh plugins used in Rakudo today.
The invoke
series of ops today will use the built-in boot-by-language
dispatcher, which will:
- Look at the HLL owner of the object being invoked, and see if a default fallback dispatcher has been registered for that language. If so, it will use that. The invokee will become the first argument to the dispatcher.
- Otherwise, it will fall back to the builtin
boot-code
dispatcher.
Language-defined dispatchers may also, upon encountering something they do
not recognize, defer to boot-language
.
In the name of dogfooding, registering a dispatcher will be done using the
boot-syscall
dispatcher. The first argument is the name of the VM syscall
to make - dispatcher-register
- followed by 2 required arguments (the
name of the dispatcher and the dispatch callback) and one optional argument
(the resume callback). Both of the code callbacks must be MVMCode
s - that
is to say, they should already have been unwrapped from any language code
objects.
# The callsite contains the dispatcher-register constant index
const_s r0, 'raku-method'
getcode r1, the_dispatch_callback
getcode r2, the_resume_callback
dispatch_v 'boot-syscall', <callsite>, r0, r1, r2
At an nqp::op
level, however, this will likely be exposed as simply
nqp::dispregister('raku-method', -> $capture { ... })
.
Within a dispatch handler callback, it is allowed to guard against:
- Any object obtained from the argument capture
- Any argument obtained by a guarded attribute read
In order to guard a value, it first has to be obtained via. a dispatcher-track-*
operation. This lets the dispatch mechanism know the source of a value that we guard,
so it can write the dispatch program for us. We read a value out of a capture for
guarding purposes using the dispatcher-track-arg($capture, $index)
) syscall. We
can then pass it to a guard operation, also done via the boot-syscall
mechanism:
dispatcher-guard-type($tracked)
- guard that the value has precisely its current type (nqp::dispguardtype
)dispatcher-guard-concreteness($tracked)
- guard that the value has its concreteness (nqp::dispguardconcreteness
)dispatcher-guard-literal($tracked)
- guard that the value has its current literal value (by object pointer comparison or obj, and value comparision for str, int, and num) (`nqp::dispguard')dispatcher-guard-not-literal-obj($tracked, $unwanted)
- guard that the value is never a certain other object literal (used for Raku's "this is not Nil" assignment checking) (nqp::dispguardnotobj
)
It is allowed to loop over the argument capture and install guards based upon its content; the multi-dispatcher shall wish to do this, for example.
Sometimes it's useful to dig into an object's attributes and guard against
those. This is done with a guarded attribute read. This is a boot-syscall
of the form dispatcher-track-attr($obj, $class, $name)
(nqp::disptrackattr
).
It will guard the type and concreteness of the object being read from
(although since such guard additions check the existing set of guards, it's
not inefficient if they were already guarded against). The value of the
attribute is returned.
Capture transformations are used in order to alter what we send along to the next dispatcher. For example, in a method dispatcher we'd replace the method name with the resolved method's code object and pass it along to the code object invoker. The VM syscalls are:
dispatcher-drop-arg($capture, $index)
- forms a new capture without the argument at the specified index. (nqp::dispdroparg
)dispatcher-insert-arg($capture, $index, $tracked
) - inserts a tracked value into the capture at the specified index.dispatcher-insert-arg-literal-obj($capture, $index, $object
) - inserts a literal object into the capture at the specified index (this means that the dispatch program will record this as a constant)dispatcher-insert-arg-literal-str($capture, $index, $str
) - inserts a literal string into the capture at the specified index (this means that the dispatch program will record this as a constant)
All of these return a new capture and a "new" callsite (typically an interned one that fits the new shape), and never mutate an existing one in place. Further, while they are called against a capture object when the dispatcher is running, they may be totally elided once the dispatch guard/operation chain has been set up.
The dispatcher-delegate
syscall (nqp::dispdelegate
) takes the name of
a dispatcher and an argument capture. Control will be passed on to that
dispatcher once the current one returns. There must be exactly one call to
this; it is an error to not call it, and an error to call it more than
once. Either will result in an exception throw.
Dispatch state is used in order to preserve data that is used in deferral. This initial dispatch state set by the dispatch callback must be one of:
- The incoming argument capture
- A derived argument capture
- A value that is guarded against
That way, the dispatch state can easily be derived during the guard tree
processing. If the runtime can prove that deferral will never happen with
this dispatch state, then it is free to elide storing it at all (a lot
like it can elide takedispatcher
today).
This replaces the invocation spec's role today.
nqp::dispregister('raku-invoke',
-> $capture {
# Obtain the target of the invocation.
my $obj := nqp::capturepos($capture, 0);
nqp::dispguardtype($obj);
nqp::dispguardconcreteness($obj);
if nqp::istype($obj, Code) && nqp::isconcrete($obj) {
# Rakudo code object. Unwrap it and replace it in the capture.
my $do := nqp::dispgetattr($obj, Code, '$!do');
my $delegate-capture := nqp::dispreplacearg($capture, 0, $do);
nqp::dispdelegate('boot-code', $delegate-capture)
}
else {
# Need to do CALL-ME on it.
my $delegate-capture = nqp::dispinsertstr($capture, 0, 'CALL-ME');
nqp::dispdelegate('raku-method-call', $delegate-capture)
}
}
);
The CALL-ME
case has potential to be far more efficient than is possible
today. In principle, we'll be able to inline the CALL-ME
method, if it's
small enough. (Also, this should let us eliminate the pesky "Cannot invoke
object with invocation handler in this context" problem.)
This is invoked with a capture where the method name is the first argument, followed by decont'd invocant, followed by the normal method arguments.
nqp::dispregister('raku-method-call',
-> $capture {
# Obtain the method name and invocant. Guard against the literal name,
# and against the type (but not concreteness) of the method. In the
# case we were given an argconst the first will be a no-op.
my str $name := nqp::captureposarg_s($capture, 0);
nqp::dispguardstr($name);
my $invocant := nqp::capturepos($capture, 1);
nqp::dispguardtype($invocant);
# Use the MOP to resolve the method.
my $meth = $invocant.HOW.find_method($invocant, $name);
unless nqp::isconcrete($meth) {
# Really would obtain/throw the typed exception here.
nqp::die("No such method '$name'");
}
# The dispatch state for resumption is the dispatch capture, which
# holds the name and arguments.
nqp::dispstate($capture);
# Before delegating onrward, we drop the name argument and replace
# the invocant argument with the target od hte dispatch.
my $delegate-capture := nqp::dispreplacearg(
nqp::dispdroparg($capture, 0), 0, $meth);
# Is this a multi-method?
if nqp::istype($meth, Method) && $meth.is_dispatcher {
# Yes, so delegate to multiple dispatch. We need to tack on the
# resolved thing to invoke.
nqp::dispdelegate('raku-multi-dispatch', $delegate-capture)
}
else {
# No, so produce an invocation result, containing the method to
# invoke and the argument capture sans name.
nqp::dispdelegate('raku-invoke', $delegate-capture)
}
},
'resume', -> $state, $args {
# Discussed later
}
));
This is standard invocation in Raku. It may be to a sub or multi-sub, but it's possible it'll also be to a method. This matters because deferral is meant to work even if it is not invoked with method syntax. It is invoked with a capture where the first thing is the sub to call, and the rest are the arguments. This is not resumable.
nqp::dispregister('raku-call', -> $capture {
# Guard on the type of what we're invoking.
my $target := nqp::captureposarg($capture, 0);
nqp::guarddisptype($target);
# It could be a method call in disguise. We need to send this through the
# indirect method dispatcher, so that deferral works.
if nqp::istype($target, Method) {
nqp::dispdelegate('raku-indirect-method-call', $target);
}
# Otherwise, it might be a multiple dispatch.
elsif nqp::istype($target, Routine) && $target.is_dispatcher {
nqp::dispguardconcreteness(nqp::dispgetattr($target, Routine, '@!dispatchees'));
nqp::dispdelegate('raku-multi-dispatch', $target);
}
# Otherwise, it's a straight invocation of what we have.
else {
nqp::dispdelegate('raku-invoke', $target)
}
});
TODO
TODO
The following kinds of dispatch involve resumption:
- Method dispatch (continuing to walk the MRO)
- Multiple dispatch (continuing to walk the candidates)
- Wrap dispatch (continuing to the next wrapper)
Furthermore, these fall back on their enclosing dispatcher in some cases (but not all). For example, a call to a wrapped multi method will need to exhaust the wrap dispatcher, then the multi dispatcher, then the method dispatcher.
Resuming a dispatch is relatively uncommon, and we'd like the overhead of this feature to be as low as possible in the case it is not called upon. At the same time, the cost of deferral today is very high. For example:
class P { method m() { } }
class C is P { method m() { callsame() } }
for ^1_000_000 {
C.m
}
Runs in 4.36s, while if we explicitly name the parent class (this form is optimized through a spesh plugin):
class P { method m() { } }
class C is P { method m() { self.P::m } }
for ^1_000_000 {
C.m
}
Then it runs in 0.066s, presumably thanks to full inlining by spesh. That's
not just a little faster, it's a factor of more than 65x faster! Of course,
this assumes the method does nothing, so in a realistic application it'd most
likely be amortized somewhat. But still, when modules like OO::Monitors
use
wrap
and callsame
, there's some motivation to give it a decent speedup.
Making the problem harder still is the fact that:
- We can either retain the current args or specify new args - but in the case we specify new args, and it's a method dispatch, we need to include the original invocant too.
- One can interact with the dispatch process by calling
lastcall
. - One can obtain the next dispatch target using
nextcallee
. - Things like
callsame
are just subroutines that you call.OO::Monitors
actually uses this fact to work around a pre-comp bug by storing theCORE.setting
&callsame
away in its local lexpad, in order that it doesn't get lost. Even without that fun, this is still a source of trouble, in that any callsite inside ofcallsame
would be megamorphic even though the actual dispatch being resumed is likley to be far less morphic.
A dispatcher that is resumable can store dispatch state. Dispatch state
is actually an argument capture derived from that supplied during the
initial dispatch. In the case the specializer can prove that this state
shall not be needed, it can completely skip the work to create it. Otherwise,
it knows it must at least keep enough information around to construct it
on demand. (Hopefully this will give us a lot more freedom than we have
today, where we must mark anything using callsame
and friends as not
possible to inline. In principle, this new dispatch mechanism will allow
us to lift this limitation.)
When we wish to resume a dispatch, we provide an argument. The proposed use of this for Raku's dispatchers will be:
- An argument capture when we are doing
callwith
ornextwith
- Sentinel objects for each of
callsame
/nextsame
,nextcallee
, andlastcall
Resuming a dispatch for the first time at a given callsite will:
- Locate the dispatcher that we shall resume.
- Invoke its
resume
callback with the current dispatch state and the argument. - Expect it to update the dispatch state and delegate to a dispatcher (ultimately bottoming out in a VM one).
The resume callback, much like the dispatch
one, can guard against its
incoming arguments: of note, the dispatch state and the resume argument.
Furthermore, it can set an updated dispatch state. As with dispatch
,
these are all considered as guard operations (including the formation of
the new dispatch state), which means that when the guards are met, the
resume
callback need not be run. (We assume there is a mechanism to ask
for the state to be stored against the caller's callsite, not our current
one, to avoid the megamorphism inside of callsame
and friends.)
When we defer, the set of candidates we shall iterate through is already preordained based on the initial argument capture. This means we can:
- Find the candidate list we'll work our way through.
- Add guards expressing what it depends on.
- Prepend those to the dispatch state capture, except the one we will be now most immediately calling.
- And before that, a sentinel object that indicates we already decided on the candidates, and perhaps another that separates them from the actual arguments.
Let's take the slow case mentioned earlier as an example.
class P { method m() { } }
class C is P { method m() { callsame() } }
for ^1_000_000 {
C.m
}
The first time around, there's no guard set. The C.m
call triggers the
dispatch callback. This finds the method, and stashes away the original
argument capture as dispatch state, which is prepended with the name. We
run C.m
. It calls callsame
. That triggers a resumption, which is
passed the dispatch state and the sentinel indicating we're doing a
callsame
. Again, the guard set at this callsite (assuming we manage to
attach it one-up, at the site of the callsame
call) is empty, so we
run the resume
callback. It finds all of the m
s in MRO. In this case
there's only one, so it just replaces the method name in the dispatch
state with the "end of candidates" sentinel (effectively, the remaining
candidates are the empty list). It establishes a guard against the type
of the invocant in the dispatch state and the callsame
sentinel to make
sure we're safe, and then delegates the dispatch so that we invoke the
P.m
.
The second time around the loop is rather easier. Since m
is a constant,
the only thing we need to do in the guard set to choose a candidate is to
check the type of the invocant is C
. It is. The guard tree then tells us
to fetch the $!code
attribute of m
. We do so. (Or we might have just
cached that constant in the guard tree.) Either way, we end up in m
,
without having to run dispatch
. We see the callsame
. We do actually
have to call it, but its resume call then looks at the guards established
last time around and winds up following a path that invokes P.m
. Thus
we don't call the resume
callback again in this case. We also didn't have
to allocate a dispatcher object as we do today. It remains to be seen to what
degree we can avoid allocations of the various derived captures.
We proceed this way for some further iterations. Meanwhile, at some point, the
code is considered hot and the specializer gets to work. We first end up in
C.m
. We inline callsame
, and find the guard set that goes with the resume.
We walk it, which leads us to C.p
, which is tiny, so we inline it. Since we
can see callsame
passes the sentinel for doing a callsame, that guard goes
away. At this point, the rest pretty much stays as it is. However, we can
compile the checking guards into the body of C.m
, meaning that we don't have
to interpret them, and they can instead be JITted.
Later, we inline the optimized C.m
into the body of the loop. Here the party
really gets started, because we can now see precisely what dispatch state is
set, and we know the types of the values that go into it. Given the simple
control flow, this means that we know what it will be upon resumption too, and
thus we can prove and delete all of the resumption guards. That means we'll
have optimized this poor naive benchmark into an empty loop body - much like we
could with the self.P::m
case, and thus it should run rather fast.
Roughly:
- When a resume is invoked and it has run out of candidates, it can delegate to
boot-next-dispatcher
. Not all need do this; a wrap dispatcher probably will as it could be in a method dispatcher or multi dispatcher, but a method dispatcher would have nowhere to go. - Dispatchers can, when setting up a delegation to another dispatcher, indicate if that dispatcher should fall back to this one or not. Thus when a method dispatcher delegates to a multi dispatcher, it can say "if you fallback, it is to me". It remains to be seen precisely how well we can get wrap dispatchers into this picture too without too much repetition.
- Implement enough that we can migrate spesh plugins. Migrate them. Maybe
at this part-way point we can even merge; on the other hand, we may be
polishing something that isn't exercised on all the use-cases.
- Get the new dispatch instruction added and handled by validation. Add NQP code to generate it. For now, it just gives an NYI panic. (DONE)
- Implement the new constant value flag in callsites. (DONE)
- Implement cache storage. A quick, cheap way for bytecode instructions that want to cache stuff against an instruction to do so. We'll use it for dispatch but also for getlexstatic_o (and will spectest it against this). May find further use cases in the future. (Current idea: keep a pointer array in parallel with the bytecode, work out the density we need (to a power of two) and use the instruction offset to decide on that. (DONE)
- Refactor getlexstatic_o optimization to look at the inline cache rather than needing a value log entry. To prove spesh can get at it. (DONE)
- Detailed design of how we'll handle argument captures going forward. (DONE)
- Implement dispatcher registry. (DONE)
- Refactor call stack so that it can hold more than just frames, and so it always hangs on to even promoted frames. (DONE, modulo a deopt issue being uncovered)
- Implement enough to be able to produce a capture object for the dispatch callback and invoke a dispatcher with it. (DONE)
- Basic design for the dispatch program execution enough for
boot-value
,boot-constant
, andboot-code
(the terminals), along with some simple guards. (DONE) - Implement
boot-value
being invoked and recording a dispatch program, but not actually storing it yet. (DONE) - Implement
boot-value
actually running the dispatch program, so as not to run the dispatch callback. (DONE) - Implement the
boot-constant
dispatcher. (DONE) - Implement the
boot-code-constant
dispatcher. (DONE) - Implement the VM "syscall" dispatch mechanism. (DONE)
- Implement registering a userspace dispatcher and calling it (but it can't do anything useful yet). (DONE)
- Add support for delegating to another dispatcher for userland dispatchers. (DONE)
- Add support for setting guards in userland dispatchers. (DONE)
- Add support for capture transformations in userland dispatchers. (DONE)
- Try switching some spesh plugins. So far, should only work with spesh disabled. (DONE)
- Properly handle non-internable callsites in dispatch programs, and revise rules for what we can and cannot intern. (DONE)
- Implement flattening prior to dispatch, selecting a fitting callsite. (DONE)
- Make sure that, with the exception of things depending on multiple dispatch, the things migrated from spesh plugins to the dispatcher work out. (DONE)
- Eliminate all spesh plugins in Raku. (DONE)
- Eliminate spesh plugins in MoarVM, and their NQP tests. (DONE)
- Teach spesh about dispatch and get it to do the rewrite of a dispatch instruction into guards + linking or inlining.
- Switch Rakudo method, multiple, and wrap dispatch to use the new scheme.
Update
callsame
and friends to use the new model too. These do need resumption. Make sure we handle all the dispatcher nesting cases. - Switch NQP method and multiple dispatch to use the new scheme. These don't need resumption.
- Elimiante last remaining remenants of findmethod and other method caching
infrastructure;
nqp::can
goes along with it. - Re-tune things that need it, exploit new optimization opportunities.
- Figure out how to un-break other backends, to the degree they got broken.
- Figure out how to replace the spesh arg guards with this too, if that was not already done.
MVM_disp_registry_*
(src/disp/registry.c
) - the dispatcher registry, which is populated with the boot (VM-provided) ones at startup.MVM_disp_boot_*
(src/disp/boot.c
) - implementations of the bootstrap dispatchers that are provided by the VMMVM_disp_record_*
(src/disp/record.c
) - recording of new dispatch chain entries, including capture transforms and guard operationsMVM_disp_tree_*
(src/disp/tree.c
) - implementation of the dispatch tree for faster lookups, both building the tree and walking it
- Store the list of guard ops as the primary data. For each, we have the set of operations, followed by a final result (constant, run C function, set up a frame and invoke bytecode). This bit is along the lines of how spesh plugins do it, except we plan to de-dupe them too, and are involved with captures. But...
- Steal the spesh arg guard tree approach, where we use a tree structure to lift common prefixes. Just rebuild the tree when we get new guard ops, again like we do in the spesh arg guard tree (it works based on us always having the type tuples there).
- The guard tree produces an integer result, which we use to look up the final action, but also in unspecialized code to record where we ended up going, for the purposes of spesh (again, much like spesh plugins do).
- Store them off the
spesh
data structure on a static frame. Maybe we will preallocate that now in any frame that contains adispatch
instruction (or an invoke, which devolves to that), to avoid the check every time. - The same binary search approach for bytecode addresses can also work.
The setup phase of a dispatch deals in argument captures, and transformations on those. It will receive an actual Capture object. However, typically, we'll evaluate the "dispatch program" without forming such an object, often ending up invoking code that then receives arguments as parameters.
MoarVM to date uses an arguments buffer that is a designated part of the work area of a call frame. Argument instructions copy from registers into that area. In the interpreter, that means one instruction to dispatch per argument we pass, plus we need to take care of GC-walking that area. On the parameter handling side, we then form an argument processing context. When dealing with captures of arguments, or flattening arguments, those can also be referred to by the argument processing area. Flattening arguments need us to allocate an argument buffer too.
The dispatch
instruction is different: it lists the registers that hold the
arguments. This means no argument buffer to allocate space for nor to mark in
the GC. It also means no instructions to interpret per argument on the callee
side, and that there's no (immediate) need to copy the arguments anywhere.
Ideally, we'd like to achieve:
- No copying of arguments other than to their final destination in the common case. Granted we may need to read them while interpreting guards. But we'd like to, when possible, avoid having an argument buffer that we temporarily copy all of the args into. Of course, in some situations, that won't happen: flattening and when we need to go via. the dispatch setup phase and so have the arguments in a capture object.
- Argument access in a callee being polymorphic over "direct from the
dispatch
instruction" or "via. flattening or a capture". Even after specialized, we'd ideally retain the polymorphism here. - That polymorphism is cheap (e.g. free of branch instructions).
This can be achieved by modeling arguments to bind as:
- A callsite object (which we don't care about in the specialized case)
- An array
args
ofMVMRegister
which holds arguments - An array
map
ofMVMuint16
which tells us which element in the argument array to look at for a particular argument
In the case of the arguments coming directly from the dispatch
instruction,
args
is the work
area of the caller, and map
is the set of mappings in
the dispatch
bytecode instruction. (Note that it will probably be a couple
of positions into that, since the first argument(s) are normally aimed only at
the dispatcher, not at the final callee.) In the case of the JIT, this can live
in the data segment.
But in the case we already expanded the arguments into a buffer (because we
formed a capture object or because of flattening), and so are not reading from
the work area but an already organized set of arguments, what should map
be?
It turns out that it can be the identity array, where element 0 is 0
, element
1 is 1
, etc. We can have one of these globally, perhaps at some nominal size
and safepoint-reallocated with a bigger one if the program happens to do huge
flattenings.
This means that argument access to argument n
for parameter binding is
args[map[n]]
, regardless of the source. This provides the desired bit of
polymorphism without a branch, should JIT very easily too, and the spesh/JIT
is valid whether we call it via. flattening or not.
There are two lifetimes to consider for arguments:
- The arguments being passed are consumed only by parameter instructions. In this case, we know their consumption is bounded by the lifetime of the callee's execution. Thus, we can keep any intemediate state (for example, of expanded flattening arguemnts) on the call stack.
- We need a capture object. In this case, we have to assume the arguments will escape, and so should form a capture object.
When we need to reify a capture as an object, it contains:
- The callsite
- An argument buffer, allocated in the FSA
Any flattening arguments will have already been expanded. This will be implemented as a new representation, rather than brining the baggage of argument processing as found in the current CallCapture representation (which will go away after these changes). In the interim, ops on a CallCapture will also work on a new Capture.
At the point we perform a dispatch, we have:
- A callsite
- A list of registers in the calling frame containing arguments
The common case is that the dispatch program (that is, the set of guards
and transforms set up by the dispatch
callback) will find a resolution
without having to call the dispatch
callback. While the dispatch
call
will produce intermediate capture objects as it goes about its work, we
wish to elide all of that work when evaluating the resulting dispatch
program. We'd also like to do as little copying around of arguments as
possible; if we end up making a call, ideally the destination of the call
will will just be pulling its arguments out of the work area of its caller
via. the mappings in the dispatch
instruction.
Due to the way dispatchers work, capture transformations will center almost almost entirely around the start of the argument list (or at least, it makes sense to put arguments aimed just at the dispatcher first). Thus, we can consider the argument list as being "split" into two parts:
- A first part that exists as actual values for the dispatch program to use
- A second part that exists as indexes into work registers
The dispatch program in a given dispatch will be analyzed to work out the
first arg left untouched by the dispatch process, and that will be the
split point for the entire dispatch. It will then allocate itself the
required amount of space in its work area on the callstack for doing the
argument manipulations before that point. If the dispatch results in an
actual call to some frame, then assuming the arguments that are passed only
consist of a tail of those in the dispatch
instruction, then a pointer
into the dispatch
instruction's mapping list will be used as the map
for
use in parameter binding.
We know that:
- A resumable dispatcher needs storage of its dispatch state - at least, in principle. Most efficient is to do just enough to go back and recover it later. Dispatchers without state don't need to persist anything.
- Even more fun is that one
dispatch
instruction can produce a stack of any number of dispatches before we actually reach the point of running code, so there's no direct mapping between them and frames. - Their lifetime is until we are done with the dispatch.
- Need to make sure we "take" them in continuations too.
Calling is generally costly in MoarVM, and MVMFrame
is quite heavyweight.
It includes things we have to preserve if we're an outer scope, as well as
things that we only care about when executing. Thus, there may be some value
to separate "callstack" from current notion of "frame". Callstack becomes a
region of memory containing the things we need as we run. A callstack entry
may reference a "full-blown" frame, but isn't automatically one.
The callstack would have entries that are:
- Frames (including their work space, so we can do a single allocation here). Specialized and unspecialized frames may have different layouts. This will be more natural once spesh is folded into the dispatch chain mechanism.
- Dispatchers (which may also have work space).
- Continuation tags.
- Special return handlers.
For now, we'd probably stick with frames otherwise living a life much like they do today. A bit further into the future, however, it'd be good to give that a revisit, so we can have less Raku-specific knowledge in the VM and handle some things in much more elegant ways. This would also be a key step towards the re-focus on interpreter cooperation that has been speculated.
Pretty cool so far! Some comments / questions:
re "does not need a containerization operation". Shouldn't that be a "decontainerization operation?
What's a
MVMFunction
?Spellos: "
dispatch_v str'" -> "
dispatch_v str`", "Contorl: -> "Control", "takedisaptcher" -> "takedispatcher"I don't like the "disp" prefix in e.g.
nqp::dispregister
, because it reminds me too much of "display" I guess. Perhaps "dx" would be a better prefix? It's definitely shorter :-) And maybe for readability add a hyphen or underscore to the prefix, so that it would becomenqp::dx-register
.How are you going to make sure "nqp::dispdelegate" is only called exactly once? Wouldn't that need a level of extra checking?
I'm unclear as to how "nqp::dispguardtype($obj);" would know it's the correct object type?
"it can completely the work to create it" I guess you meant to say "it can completely skip the work to create it" ?