jnthn/x.md Secret

## x.md

      
    Raw
  

              x.md
            
          
    A whole heap of work

The majority of my Perl 6 grant time in the last week went on
building a heap snapshot mechanism in MoarVM. But, before I dug
into that, I worked on...
More memory savings!

While I expect we'll find plenty of interesting memory savings -
not to mention the source of the EVAL memory leak - through the
heap snapshots I'm working on, there's still a good bit of info
to be had from C-level heap profiling tools and taking a careful
look at various data structures. This week, I found and improved
3 such things.
Every block, routine, or even thunk that we compile gets an ID
that is unique within its compilation unit (that is, a body of
code that is compiled at once). This is used to be able to refer
to it unambiguously. Historically - dating all the way back to
the Parrot days - these needed to be a bit more unique than just
within a compilation unit. This made them fairly long, and since
something like the Perl 6 builtins have into the tens of
thousands of them, they added up quickly. So, I
shortened them
a good bit, knocking 188KB of NQP's base memory, 1424KB off
Rakudo's base memory, and making the compiled CORE.setting a
nice 288KB smaller.
These routines, blocks, and thunks are all represented down in
MoarVM as static frames - a data structure I've talked about a
bit in previous posts here. Aware that we have a lot of them, I
looked to see if there were any more relatively easy savings to
be made. I discovered a couple.
First up, we were keeping around a small map of the bytecode that
was used to indicate instruction boundaries. This was used for
precisely one thing: outputting instruction offsets when showing
low-level backtraces. We never actually show that data to any
normal Perl 6 end user. It was useful to have around early on in
MoarVM's development, but I can't remember the last time I needed
that information to debug something. It's also hugely misleading
these days, as the VM is often running specialized bytecode with
lots of virtualization and indirection stripped out. So, I
just got rid of it.
We only generated these for the bytecode of static frames that
were actually run, but it also saved a pointer for all static
frames. In total, it was another 316KB off NQP's base memory, and 1088KB
off Rakudo's.
Finally, back when I added the profiler, static frames got a few
new fields. They're totally unused when we're not applying any
kind of instrumentation, however. So, I pulled them out
into a single struct, reducing us to just one pointer per static
frame for this state. That was another 144KB off NQP, and 1036KB
off Rakudo.
These total to around 3.5MB off the base size of Rakudo (measured
by running the empty loop program), and all give further savings
once extra modules are loaded or more of CORE.setting is used.
Multi caches get a size boost

I analyzed some code from RabidGravy doing sine wave generation,
and noticed it was doing a lot of slow-path multi-dispatch. It
turns out we were overflowing the multi caches, which in turn
would be harming our ability to inline. While this points at a
design issue I want to look into a little further, giving the
caches a size boost
provided some instant relief, getting the code down to 60% of
its previous runtime. This does mean we use a bit more memory,
but only for multis that we actually call, which isn't all that
many just for a normal startup.
Heap snapshots

This week I got a lot of work in on heap snapshots. First of all,
what is a heap snapshot? It's a representation of the memory
allocated by a program, and under the management of the VM. In
MoarVM this consists of three things:

Objects (both type objects and instances)
Shared Tables (aka "STables"), which hold the common things
for a given meta-object and representation pairing
Call frames

The first two are managed by the GC; in fact, most code in the
GC doesn't care which it's dealing with. The third is managed
for now by reference counts (the reasons why, and how this gives
a mix of great properties and awful properties, is a discussion
for another day!)
The point isn't to capture all of the data on the heap, but rather to capture
the size, identity, and reference structure of the data. Size is
easy enough to understand - it's just about how many bytes are
used. Identity is about what type an object has (the class name,
for example), what type an STable represents, and what block or
routine in the code a particular call frame is an invocation of.
Together with counting how many items we have, we can use
size and idendity data to answer questions like, "what kinds of
objects are using up loads of memory". I'll be intersted to use
this to see what's eating up memory in NQP and Rakudo, before you
even start doing any interesting work.
Reference structure is more interesting. It's about what things
point to, or refer to, other things. MoarVM has a reachability
based GC. When we garbage collect, we start from a set of roots,
which consist of things like the call stacks of each running
thread along with other per-thread and per-VM state. We then -
conceptually, at least - "mark" all the objects that these roots
reference and put them on a todo list. We then visit them, asking
each of them what they reference, and putting those objects onto
the todo list. By taking care never to put an already marked object
onto the todo list, we can be confident we'll eventually terminate
with an empty list. At the end, all of the unmarked objects are
declared unreachable, and we can free them up. (And just to
drive it home: this is a good way to conceptually understand
what a reachability GC does, but does not reflect the way
modern VMs that need to efficiently manage a large number of
objects actually do it. And, like nearly all abstractions, it
hides details that can matter in reality. Again, a topic for
another day. :-))
Anyway, the reference structure is all about answering why a
given object is still in memory. What path from the roots is
keeping it alive? This is the sort of data that will help me
understand the leaky EVAL bug I mentioned a week or so back.
Taking a heap snapshot involves capturing all of the raw data.
It's rather important that the process of taking a heap snapshot
does not impact the liveness of the things in the snapshot. The
instrumenting profiler, which was primarily built for giving us
insight into microbenchmarks and how the dynamic optimizer
works with them, references real type objects and
static frames in its data set, which is then post-processed. In
doing so, it keeps them alive. But the heap snapshotter
doing this would be a disaster! It would distort the set of
live objects, which it exists to measure. So, rule 1 is that
the heap snapshot data structure must never hold any references
to GC-managed objects. In fact, that's probably the only really
important rule. Of course, it's going to affect the amount of
memory the VM uses, but that must not show up in any way in the
results. Most of the remaining challenge in taking heap snapshots
is about trying to capture the data in a sufficiently compact way
that it's just large rather than utterly huge.
Another interesting question is when to take heap snapshots. A
fairly good time is just after a GC run. Why? Because we're in a
state of "stop the world" at that point anyway, so we really are
taking a snapshot of an atomic point in time with regard to the
execution of the program. So, that's where it's done. (Note that
this doesn't bias things, in so far as we'd never be wanting to
include unreachable things into the snapshot anyway.)
Anyway, by this point I've got pretty far with the code that
does the snapshotting. It happily finds all object and STable
references, and I'm part of the way through frames, which will
mean it can snapshot the memory associated with closures
properly. So, not much more work to go on producing the
snapshots themselves.
Once I've got complete snapshots being produced, the next task
will be to build some tools to analyze them. There will also be
some work to make the snapshots a bit more detailed (for example,
including the names of object attributes, so it's easier to see
the path keeping an object alive). I'll probably take on some
basic analysis tooling first, however, because I think even
without that added context the snapshots will still be hugely
revealing. Stay tuned for next week's report for more!