Rakudo/MoarVM JIT compiler design proposal

The changes to Rakudo (dynamic recompilation) and MoarVM (native code generation) are independent of each other.

Rakudo

To get the most out of the system, the optimizer needs to do type inference and constant propagation.

The dispatcher needs to be changed to keep track of the types and values of arguments at a given callsite. If a particular signature turns hot, a new multi specialized for these arguments (either a type specialization or even a constant value) needs to be created and installed; this should happen in a separate thread so execution can keep going.
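
A minimal sketch in C of what such callsite profiling could look like. All names, fields and the threshold are invented for illustration; the actual data structures would differ:

    /* Hypothetical sketch of per-callsite profiling; names, fields and
     * threshold are invented, not MoarVM's actual structures. */

    #include <stdint.h>

    #define HOT_THRESHOLD 1000 /* invented tuning knob */

    typedef struct {
        uint32_t type_id;     /* observed argument type, 0 = polymorphic */
        int      is_constant; /* have we only ever seen one value? */
        int64_t  value;       /* that value, if so */
    } ArgProfile;

    typedef struct {
        uint64_t    hit_count;
        uint8_t     num_args;
        ArgProfile *args;
    } CallsiteProfile;

    extern void enqueue_specialization(CallsiteProfile *cs); /* invented: worker thread */

    /* Called on every dispatch through this callsite. */
    static void profile_callsite(CallsiteProfile *cs, const ArgProfile *seen)
    {
        for (uint8_t i = 0; i < cs->num_args; i++) {
            if (cs->args[i].type_id != seen[i].type_id)
                cs->args[i].type_id = 0;     /* demote to polymorphic */
            if (cs->args[i].is_constant && cs->args[i].value != seen[i].value)
                cs->args[i].is_constant = 0; /* not a constant after all */
        }
        if (++cs->hit_count == HOT_THRESHOLD)
            enqueue_specialization(cs); /* create and install the new multi off-thread */
    }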

MoarVM

The opcode set needs to be re-organized so we can easily distinguish (one possible encoding is sketched after this list):

  • basic, easily JIT-able ops
  • complex ops implemented via C functions
  • control flow or otherwise special ops
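
One possible encoding, purely illustrative: with the opcode set grouped into contiguous ranges, classification needs no lookup table at all. The boundary values below are invented:

    /* Invented boundaries; the real split would fall out of actually
     * classifying MoarVM's opcode list. */
    enum { FIRST_COMPLEX_OP = 0x100, FIRST_SPECIAL_OP = 0x300 };

    typedef enum {
        OP_CLASS_BASIC,   /* register moves, arithmetic, ...: directly JIT-able */
        OP_CLASS_COMPLEX, /* implemented via calls into C functions */
        OP_CLASS_SPECIAL  /* branches, invocation, exception handling, ... */
    } OpClass;

    /* With contiguous ranges, classifying an op is a pair of
     * comparisons rather than a table lookup. */
    static OpClass classify(unsigned opcode)
    {
        if (opcode < FIRST_COMPLEX_OP) return OP_CLASS_BASIC;
        if (opcode < FIRST_SPECIAL_OP) return OP_CLASS_COMPLEX;
        return OP_CLASS_SPECIAL;
    }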

The bytecode generator needs to emit a marker op at the start of each basic block (composed of basic or complex ops); this marker acts as a JIT hook for blocks that should be compiled to native code.

When the interpreter hits a JIT hook (and native code generation is supported on that architecture), it calls the JIT compiler to generate native code starting at the marker. A pointer to the native code gets added (atomically!) to a table of compiled blocks indexed by the block ID (which is the argument of the marker op).
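
A sketch, assuming C11 atomics, of how the marker-op handler and the publication step could fit together; the function names and table size are invented:

    #include <stdatomic.h>
    #include <stdint.h>

    #define MAX_BLOCKS 4096 /* invented size */

    typedef void (*NativeBlock)(void *interp_state);

    /* Table of compiled blocks, indexed by the block ID carried as the
     * marker op's argument. Entries are published atomically, so the
     * compilation thread and interpreter threads need no lock here. */
    static _Atomic(NativeBlock) compiled_blocks[MAX_BLOCKS];

    extern void request_jit_compile(uint32_t block_id); /* invented */

    /* Interpreter handler for the marker op. */
    void handle_jit_marker(void *interp_state, uint32_t block_id)
    {
        NativeBlock code = atomic_load_explicit(&compiled_blocks[block_id],
                                                memory_order_acquire);
        if (code) {
            code(interp_state); /* run the block natively */
            return;
        }
        request_jit_compile(block_id); /* not compiled yet; keep interpreting */
    }

    /* Called by the JIT compiler once code generation is done. */
    void publish_block(uint32_t block_id, NativeBlock code)
    {
        atomic_store_explicit(&compiled_blocks[block_id], code,
                              memory_order_release);
    }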

Pro

  • progressive enhancement of the existing system
  • no AOT or heavy warmup stage
  • easily(?) portable to other architectures

Con

  • control flow stays in the interpreter, which interferes with CPU-level optimizations (branch prediction, pipelining)
  • Rakudo/MoarVM separation prevents more holistic optimizations
  • lack of specialized and op-level optimizations within blocks, and no optimization of control flow at all

Solution

A separate, more heavily optimizing method JIT that includes all the bells and whistles. In contrast to dynamic recompilation, the method JIT is triggered callee-side, not caller-side, when a particular multi turns hot. As before, optimization should happen in a separate thread.
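
A sketch of what a callee-side trigger could look like, with invented names and threshold; the counter lives with the sub itself rather than at any callsite:

    #include <stdint.h>

    #define METHOD_JIT_THRESHOLD 10000 /* invented tuning knob */

    typedef struct {
        uint64_t invocations;
        int      jit_requested;
        /* ... bytecode, existing specializations, etc. */
    } SubRecord;

    extern void enqueue_method_jit(SubRecord *sub); /* invented: optimizer thread */

    /* Run in the sub's own prologue: the callee counts its invocations,
     * so hotness is detected regardless of which callsites the calls
     * come from. */
    static void note_invocation(SubRecord *sub)
    {
        if (++sub->invocations == METHOD_JIT_THRESHOLD && !sub->jit_requested) {
            sub->jit_requested = 1;
            enqueue_method_jit(sub); /* optimization happens in a separate thread */
        }
    }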

gerdr commented Aug 22, 2013

The list of specializations of a given sub should probably be stored caller-side, alongside the native code generated by the method JIT.
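
As a sketch (all names invented), the caller-side storage might be as simple as a list of guard/code pairs hanging off the callsite:

    /* Invented sketch: each callsite keeps the sub's known specializations
     * next to their native entry points, so dispatch can pick one and
     * jump straight into method-JITted code. */
    typedef struct Specialization {
        const void            *guards;      /* types/values this code assumes */
        void                  *native_code; /* produced by the method JIT */
        struct Specialization *next;
    } Specialization;

    typedef struct {
        Specialization *specializations; /* stored caller-side, per callsite */
    } CallsiteCache;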

gerdr commented Aug 22, 2013

I believe REPR polymorphism could be handled the same way. A sub

sub foo(Foo $f, Bar $b) { ... }

would end up with a REPR-generic specialization and a specialization for the default representations of both Foo and Bar.

The REPR guards then end up not in the callee-side low-level code, but unified with the caller-side high-level multi dispatch.
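
A sketch of what such unified guards could look like at dispatch time; all names are invented:

    #include <stdint.h>

    /* Invented sketch: REPR guards live in the caller-side dispatcher
     * rather than in the generated low-level code. */
    typedef struct {
        uint32_t type_id; /* Foo, Bar, ... */
        uint32_t repr_id; /* P6opaque, CStruct, ...; 0 = REPR-generic */
    } GuardEntry;

    typedef struct {
        const GuardEntry *guards;      /* one entry per parameter */
        uint8_t           num_guards;
        void             *native_code; /* specialization satisfying the guards */
    } GuardedSpecialization;

    /* During multi dispatch, a specialization is picked only if every
     * argument matches its type guard and, unless the guard is
     * REPR-generic, its REPR. */
    static int guards_match(const GuardedSpecialization *spec,
                            const uint32_t *arg_types,
                            const uint32_t *arg_reprs)
    {
        for (uint8_t i = 0; i < spec->num_guards; i++) {
            if (spec->guards[i].type_id != arg_types[i])
                return 0;
            if (spec->guards[i].repr_id && spec->guards[i].repr_id != arg_reprs[i])
                return 0;
        }
        return 1;
    }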

gerdr commented Aug 24, 2013

I believe it would even be reasonable to disallow mixing basic and complex ops in a single native block. This would greatly simplify code generation: a basic block could simply map VM registers to native ones at its start and unmap them at its end (register overflow needs handling, of course), while a complex block is just a sequence of call instructions.
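
A sketch of the two resulting code generation strategies, with invented emitter primitives standing in for a real assembler backend:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint16_t opcode;
        uint16_t operands[4];
    } DecodedOp;

    /* Invented emitter primitives. */
    extern void emit_map_vm_regs_to_native(void); /* spill on overflow */
    extern void emit_native_op(const DecodedOp *op);
    extern void emit_unmap_native_regs(void);
    extern void emit_call_c_function(const DecodedOp *op);

    /* Because a native block is homogeneous, there are only two strategies. */
    void compile_block(int is_basic, const DecodedOp *ops, size_t n)
    {
        if (is_basic) {
            emit_map_vm_regs_to_native();      /* VM regs -> machine regs */
            for (size_t i = 0; i < n; i++)
                emit_native_op(&ops[i]);       /* straight-line machine code */
            emit_unmap_native_regs();          /* write results back */
        } else {
            for (size_t i = 0; i < n; i++)
                emit_call_c_function(&ops[i]); /* just a sequence of calls */
        }
    }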

Before any decisions are made, we'd have to actually start classifying the ops and take a look at some real-world bytecode to see if there are sufficiently many homogeneous op sequences.
