jnthn/x.md Secret

Redesigning Rakudo's Scalar

What's the most common type your Perl 6 code uses? I'll bet you that in most
programs you write, it'll be Scalar. That might come as a surprise, because
you pretty much never write Scalar in your code. But in:
my $a = 41;
my $b = $a + 1;

Then both $a and $b point to Scalar containers. These in turn hold the
Int objects. Contrast it with:
my $a := 42;
my $b := $a + 1;

Where there are no Scalar containers. Assignment in Perl 6 is an operation
on a container. Exactly what it does depends on the type of the container.
With an Array, for example, it iterates the data source being assigned, and
stores each value into the target Array. Assignment is therefore a copying
operation, unlike binding which is a referencing operation. Making assignment
the shorter thing to type makes it more attractive, and having the more
attractive thing decrease the risk of action at a distance is generally a good
thing.
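
A minimal sketch of the difference, using illustrative variable names that are not from the text above. `.VAR` is used here only to look at the container rather than the value it holds:

```perl6
# Assignment stores into a Scalar container; binding attaches the name
# directly to the value, with no container in between.
my $assigned = 41;
my $bound   := 42;
say $assigned.VAR.^name;    # Scalar
say $bound.VAR.^name;       # Int

# Assignment copies, so later changes to the source are not seen...
my $source = 1;
my $copy   = $source;
my $alias := $source;       # ...while binding shares the same container.
$source = 2;
say $copy;                  # 1
say $alias;                 # 2

# Array assignment iterates the source and copies each value.
my @original = 1, 2, 3;
my @copied   = @original;
@copied[0] = 99;
say @original[0];           # 1
```
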
Having Scalar be first-class is used in a number of features:

- Lazy vivification, so `if %a{$x} { ... }` will not initialize the hash slot
  in question, but `%a{$x} = 42` will do so (this also works many levels deep)
- The `is rw` trait on parameters being able to work together with late-bound
  dispatch
- Making l-value routines possible, including every `is rw` accessor
- List assignment
- Using meta-ops on assignment, for example `Z=`
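
A few of these features in action, as a small sketch. The names (`%config`, `bump`, `slot`, `Point`) are made up for illustration:

```perl6
# Lazy vivification: reading a nested slot creates nothing...
my %config;
my $seen = %config<db><port>;
say %config.elems;              # 0
# ...while assigning to it brings every level into existence.
%config<db><port> = 5432;
say %config<db><port>;          # 5432

# `is rw` parameter: the routine receives the caller's container.
sub bump($n is rw) { $n = $n + 1 }
my $count = 1;
bump($count);
say $count;                     # 2

# l-value routine: `is rw` on the routine returns the container itself.
my $slot = 0;
sub slot() is rw { $slot }
slot() = 7;
say $slot;                      # 7

# `is rw` accessor on an attribute.
class Point { has $.x is rw }
my $p = Point.new(x => 1);
$p.x = 10;
say $p.x;                       # 10
```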

And probably some more that I forgot. It's powerful. It's also utter torture
for those of us building Perl 6 implementations and trying to make them run
fast. The frustration isn't so much the immediate cost of allocating all
of those Scalar objects (that of course costs something, but modern GC
algorithms can throw away short-lived objects pretty fast) as it is the
difficulties it introduces for program analysis.
Despite all the nice SSA-based analysis we do, tracking the contents of Scalar
containers is currently beyond that. Rather than any kind of reasoning to prove
properties about what a Scalar holds, we instead handle it through statistics,
guards, and deoptimization at the point that we fetch a value from a Scalar.
This still lets us do quite a lot, but it's certainly not ideal. Guards are
cheap, but not free.

Looking ahead

Over the course of my current grant from The Perl Foundation, I've been working
out a roadmap for doing better with optimization in the presence of Scalar
containers. Their presence is one of the major differences between full Perl 6
and the restricted NQP (Not Quite Perl), and in many programs plays a big part in
the performance difference between the two.
I've taken the first big step towards improving this situation by significantly
re-working the way Scalar containers are handled. I'll talk about that in this
post, but first I'd like to provide an idea of the overall direction.
In the early days of MoarVM, when we didn't have specialization or compilation
to machine code, it made sense to do various bits of special-casing of Scalar.
As part of that, we wrote code handling common operations involving them in C.
We've by now reached a point where what used to be a nice win is preventing us
from performing the analyses we need in order to do better optimizations. At
the end of the day, a Scalar container is just a normal object with an
attribute $!value that holds its value. Making all operations dealing with
Scalar containers really be nothing more than some attribute lookups and
binds would allow us to solve the problem in terms of more general analyses,
which stand to benefit many other cases where programs use short-lived objects.
The significant new piece of analysis we'll want to do is escape analysis, which
tells us which objects have a lifetime bounded to the current routine. We
understand "current routine" to incorporate those that we have inlined.
If we know that an object's usage lies entirely within the current routine, we
can then perform an optimization known as scalar replacement, which funnily
enough has nothing much to do with Scalar in the Perl 6 sense, even if it
solves the problems we're aiming to solve with Scalar! The idea is that we
allocate a local variable inside of the current frame for each attribute of
the object. This means that we can then analyze them like we analyze other
local variables, subject them to SSA, and so forth. For one, this gets rid of
the allocation of the object, but it also lets us replace attribute lookups and
binds with accesses to locals, which have one less level of indirection. It
will also let us reason about the
contents of the once-attributes, so that we can eliminate guards that we
previously inserted because we only had statistics, not proofs.
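
To make the idea concrete, here is a hand-written before and after. This is only an illustration of the transformation at the source level, assuming a made-up `Vec2` class; the optimizer would do this on its own internal representation, not on Perl 6 source:

```perl6
class Vec2 { has $.x; has $.y }

# Before: allocates a Vec2 that never leaves the routine.
sub length-squared-boxed($x, $y) {
    my $v = Vec2.new(:$x, :$y);
    $v.x * $v.x + $v.y * $v.y
}

# After, conceptually: each attribute becomes a local, so there is no
# allocation and no attribute lookup left to do.
sub length-squared-replaced($x, $y) {
    my $v-x := $x;
    my $v-y := $y;
    $v-x * $v-x + $v-y * $v-y
}

say length-squared-boxed(3, 4);     # 25
say length-squared-replaced(3, 4);  # 25
```

Both routines compute the same result. The second is the shape that escape analysis would justify, because the `Vec2` in the first is never stored or returned.
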
So, that's the direction of travel, but first, Scalar and various operations
around it needed to change.

Data structure redesign

Prior to my recent work, a Scalar looked something like:
class Scalar {
    has $!value;        # The value in the Scalar
    has $!descriptor;   # rw-ness, type constraint, name
    has $!whence;       # Auto-vivification closure
}

The $!descriptor held the static information about the Scalar container,
so we didn't have to hold it in every Scalar (we usually have many instances
of the same "variable" over a program's lifetime).
The $!whence was used when we wanted to do some kind of auto-vivification.
The closure attached to it was invoked when the Scalar was assigned to, and
then cleared afterwards. In an array, for example, the callback would bind the
Scalar into the array storage, so that element - if assigned to - would start
to exist in the array. There are various other forms of auto-vivification, but
they all work in roughly the same way.
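
The mechanism just described can be modelled in ordinary Perl 6. This is a toy sketch: `ToyDescriptor`, `ToyOldScalar` and the `assign` method are invented for illustration and are not the real Rakudo classes or code paths:

```perl6
# Toy stand-ins; these are NOT the real Rakudo classes.
class ToyDescriptor {
    has $.name;             # variable name, for error messages
    has $.of = Mu;          # type constraint
}

class ToyOldScalar {
    has $.value is rw;
    has $.descriptor;
    has $.whence is rw;     # vivification closure, or undefined

    method assign($new) {
        if $!whence.defined {
            $!whence.();        # e.g. store this container into the array
            $!whence = Nil;     # cleared, so it only ever runs once
        }
        $!value = $new;
    }
}

my @storage;                # toy array storage
my $element = ToyOldScalar.new(descriptor => ToyDescriptor.new(name => '@a[2]'));
$element.whence = -> { @storage[2] = $element };

say @storage.elems;         # 0
$element.assign(42);
say @storage.elems;         # 3
say @storage[2].value;      # 42
```
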
This works, but closures aren't so easy for the optimizer to deal with (in
short, a closure has to have an outer frame to point to, and so we can't inline
a frame that takes a closure). Probably some day we'll find a clever solution
to that, but since this is an internal mechanism, we may as well make it one
that we can see a path to making efficient.
So, I set about considering alternatives. I realized that I wanted to replace
the $!whence closure with some kind of object. Different types of object
would do different kinds of vivification. This would work very well with the
new spesh plugin mechanism, where we can build up a set of guards on objects.
It also will work very well when we get escape analysis in place, since we can
then potentially remove those guards after performing scalar replacement. Thus
after inlining, we might be able to remove the "what kind of vivification does
this assignment cause" checking from quite a few different cases.
So this seemed workable, but then I also realized that it would be possible to
make Scalar smaller by:

- Placing the new auto-vivification objects in the $!descriptor slot instead
- Having the vivification object point to the original descriptor carrying the
  name, type, etc.
- Upon first assignment, running the vivification logic and then replacing the
  Scalar's $!descriptor with the simple one carrying the name, type, etc.

This not only makes Scalar smaller, but it means that we can use a single
guard check to indicate the course of action we should take with the container:
a normal assignment, or a vivification.
The net result: vivification closures go away, giving more opportunities to inline,
assignment gets easier to specialize, and we get a memory saving on every
Scalar container in the program. Nice!
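
A toy model of the new layout, again with invented names (`PlainDescriptor`, `VivifyArrayDescriptor`, `ToyScalar`) that only mimic the shape of the design and are not the real implementation:

```perl6
class PlainDescriptor {
    has $.name;
    has $.of = Mu;          # type constraint
}

# Sits in the descriptor slot until the first assignment.
class VivifyArrayDescriptor {
    has $.next;             # the plain descriptor to switch to afterwards
    has $.target;           # array to vivify into
    has $.index;
}

class ToyScalar {
    has $.value is rw;
    has $.descriptor is rw; # no separate whence slot any more

    method assign($new) {
        # One check on the descriptor picks the course of action.
        if $!descriptor ~~ VivifyArrayDescriptor {
            $!descriptor.target[$!descriptor.index] = self;
            $!descriptor = $!descriptor.next;
        }
        die "Type check failed in assignment to {$!descriptor.name}"
            unless $new ~~ $!descriptor.of;
        $!value = $new;
    }
}

my @storage;
my $plain   = PlainDescriptor.new(name => '@a[2]', of => Int);
my $element = ToyScalar.new(
    descriptor => VivifyArrayDescriptor.new(
        next => $plain, target => @storage, index => 2),
);

$element.assign(42);
say @storage.elems;                     # 3
say $element.descriptor === $plain;     # True
```

After the first assignment the container carries only the plain descriptor, so later assignments take the ordinary path.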

C you later

For this to be really worth it from an optimization perspective, I needed to
eliminate various bits of C special-case code around Scalar and replace it
with standard MoarVM ops. This involved:

- Assignment
- Atomic compare and swap
- Atomic store
- Handling of return values, including decontainerization
- Creation of new Scalar containers from a given descriptor

The first three became calls to code registered to perform the operations, using
the 6model container API. The last two cases were handled by replacing the
calls to C extops with desugars, which is a mechanism that takes something
that is used as an nqp::op and rewrites it, as it is compiled, into a more
interesting AST, which is then in turn compiled. Happily, this meant I could
make all of the changes I needed to without having to go and do a refactor
across the CORE.setting. That was nice.
So, now those operations were compiled into bytecode operations instead of
ops that were really just calls to C code. Everything was far more explicit.
Good! Alas, the downside is that the code we generate gets larger in size.

Optimization with spesh plugins

I talked about specializer plugins in a recent post, where I used them to
greatly speed up various forms of
method dispatch. However, they are also applicable to optimizing various
operations on Scalar containers.
The change to decontainerizing return values was especially bad at making the
code larger, since it had to do quite a few checks. However, with a spesh
plugin, we could just emit a call to that. Most of the time, only one of the
cases or a small number of cases apply. Here's a slightly simplified version
of the plugin, annotated with some comments about what it is doing. The
key thing to remember about a spesh plugin is that it is not doing an
operation, but rather it's setting up a set of conditions under which a
particular implementation of the operation applies, and then returning that
implementation.
nqp::speshreg('perl6', 'decontrv', sub ($rv) {
    # Guard against the type being returned; if it's a Scalar then that
    # is what we guard against here (nqp::what would normally look at
    # the type inside such a container; nqp::what_nd does not do that).
    nqp::speshguardtype($rv, nqp::what_nd($rv));

    # Check if it's an instance of a container.
    if nqp::isconcrete_nd($rv) && nqp::iscont($rv) {
        # Guard that it's concrete, so this plugin result only applies
        # for container instances, not the Scalar type object.
        nqp::speshguardconcrete($rv);

        # If it's a Scalar container then we can optimize further.
        if nqp::eqaddr(nqp::what_nd($rv), Scalar) {
            # Grab the descriptor.
            my $desc := nqp::speshguardgetattr($rv, Scalar, '$!descriptor');
            if nqp::isconcrete($desc) {
                # Has a descriptor, so `rw`. Guard on type of value. If it's
                # Iterable, re-containerize. If not, just decont.
                nqp::speshguardconcrete($desc);
                my $value := nqp::speshguardgetattr($rv, Scalar, '$!value');
                nqp::speshguardtype($value, nqp::what_nd($value));
                return nqp::istype($value, $Iterable) ?? &recont !! &decont;
            }
            else {
                # No descriptor, so it's already readonly. Return as is.
                nqp::speshguardtypeobj($desc);
                return &identity;
            }
        }

        # Otherwise, full decont.
        return &decontrv;
    }
    else {
        # No decontainerization to do, so just produce identity.
        return &identity;
    }
});

Where &identity is the identity function, &decont removes the value from
its container, &recont wraps the value in a new container (so an Iterable
in a Scalar stays as a single item), and &decontrv is the slow-path for
cases that we do not know how to optimize.
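
The "set up conditions, then return an implementation" pattern can be mimicked in plain Perl 6. The sketch below is a toy: `ToyCont`, `Resolution`, `toy-plugin` and `call-site` are all invented names, the guard is a simple closure, and a linear search stands in for whatever guard structure MoarVM really uses:

```perl6
class ToyCont { has $.value }           # stand-in for a container

class Resolution {
    has &.guard;    # does this resolution apply to the given argument?
    has &.impl;     # the implementation to run when it does
}

sub toy-identity($v) { $v }
sub toy-decont($v)   { $v.value }

my $plugin-runs = 0;

# The "plugin": looks at one sample argument, decides which conditions
# matter, and hands back an implementation. It does not do the operation.
sub toy-plugin($rv) {
    $plugin-runs++;
    my $type = $rv.WHAT;
    Resolution.new(
        guard => -> $v { $v.WHAT === $type },
        impl  => ($rv ~~ ToyCont ?? &toy-decont !! &toy-identity),
    )
}

# A call site remembers earlier resolutions and reuses any whose guard holds.
my @resolutions;
sub call-site($rv) {
    for @resolutions -> $r {
        return $r.impl.($rv) if $r.guard.($rv);
    }
    my $r = toy-plugin($rv);
    @resolutions.push($r);
    $r.impl.($rv)
}

say call-site(ToyCont.new(value => 1));   # 1
say call-site(ToyCont.new(value => 2));   # 2
say call-site(5);                         # 5
say $plugin-runs;                         # 2
```

The plugin ran only twice for three calls: the second `ToyCont` matched the guard recorded by the first.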
The same principle is also used for assignment, though there are more cases
to analyze there. They include:

- When the type constraint is Mu, and there is a normal (non-vivify)
  descriptor, then we do a specialization based on the value being the Nil
  object (in which case we produce the operation that sets $!value back to
  the default value from the descriptor) or non-Nil (just assign a value,
  with no need to type check)
- When the type constraint is something else, and there is a normal (non-vivify)
  descriptor, then we do a specialization based on the type of the value
  being assigned. Since the optimizer will often know this already, we can
  optimize out the type check
- When it is an array auto-viv, we produce the exact sequence of binds needed
  to effect the operation, again taking into account a Mu type constraint and
  a type constraint that needs to be checked

Vivifying hash assignments are not yet optimized by the spesh plugin, but will
be in the near future.
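
For reference, these are the user-visible behaviours that the cases above have to preserve. The snippet shows only the language-level semantics, with made-up variable names; it says nothing about how the specialized code looks:

```perl6
# Untyped (Mu-constrained) container: Nil restores the default.
my $untyped is default(42) = 1;
$untyped = Nil;
say $untyped;               # 42

# Typed container: the value's type decides whether the check passes.
my Int $typed = 1;
$typed = 2;                 # fine, an Int
my $bad = "oops";
try { $typed = $bad };      # fails the type check
my $error = $!;
say $error.^name;           # X::TypeCheck::Assignment
say $typed;                 # 2

# Array auto-vivification: reading leaves the array alone, assigning
# makes the element exist.
my @a;
my $peek = @a[3];
say @a.elems;               # 0
@a[3] = 'x';
say @a.elems;               # 4
```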
The code selected by the plugin is then executed to perform the operation. In
most cases, there will only be a single specialization selected. In that case,
the optimizer will inline that specialization result, meaning that the code
after optimization is just doing the required set of binds needed to do the
assignment work.

Next steps

Most immediately, a change to such a foundational part of the Rakudo Perl
6 implementation has had some fallout. I'm most of the way through dealing with
the feedback from toaster (which runs all the ecosystem module tests), being
left with a single issue directly related to this work to get to the bottom of.
Beyond that, I need to spend some time re-tuning array and hash access to better
work with these changes.
Then will come the step that this change was largely in aid of: implementing
escape analysis and scalar replacement, which for much Perl 6 code will hopefully
give a quite notable performance improvement.
This brings me to the end of my current 200 hours on my Perl 6 Performance and
Reliability Grant. Soon I will submit a report to The Perl Foundation, along
with an application to continue this work. So, all being well, there will be
more to share soon. In the meantime, I'm off to enjoy a week's much needed
vacation.