The changes to Rakudo (dynamic recompilation) and MoarVM (native code generation) are independent.
To get the most out of the system, the optimizer needs to do type inference and constant propagation.
The dispatcher needs to be changed to keep track of the types and values of arguments at a given callsite. If a particular signature turns hot, a new multi specialized to these arguments (either a type specialization or even one for a constant value) needs to be created and installed; this should happen in a separate thread so execution can keep going.
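A minimal sketch of what the caller-side bookkeeping could look like, assuming simple integer type IDs and a fixed hotness threshold; `CallSite`, `callsite_record`, and `HOT_THRESHOLD` are illustrative names, not MoarVM interfaces:

```c
#include <string.h>

#define CS_MAX_ARGS   4
#define HOT_THRESHOLD 100

typedef struct {
    int      arg_types[CS_MAX_ARGS]; /* observed type ID per argument */
    int      stable;                 /* same signature on every call so far? */
    unsigned hits;                   /* invocation count */
} CallSite;

/* Record one invocation at this callsite. Returns 1 exactly when the
   callsite turns hot with a stable signature, i.e. when a specialized
   multi should be queued for creation on a worker thread. */
int callsite_record(CallSite *cs, const int *types, int nargs) {
    if (cs->hits == 0) {
        memcpy(cs->arg_types, types, nargs * sizeof(int));
        cs->stable = 1;
    } else if (memcmp(cs->arg_types, types, nargs * sizeof(int)) != 0) {
        /* Polymorphic callsite: a single specialization won't help. */
        cs->stable = 0;
    }
    cs->hits++;
    return cs->stable && cs->hits == HOT_THRESHOLD;
}
```

The interpreter would call `callsite_record` on every invocation and hand the (callsite, signature) pair to the background specializer when it returns 1.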
The opcode set needs to be re-organized so we can easily distinguish
- basic, easily jitable ops
- complex ops implemented via C functions
- control flow or otherwise special ops
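One way to make that distinction cheap is to re-number the opcodes into contiguous ranges, so the category check is a range comparison rather than a table lookup. The boundaries below are made-up values for illustration:

```c
/* The three op categories from the list above. */
typedef enum {
    OP_CAT_BASIC,   /* directly jitable: arithmetic, moves, ...   */
    OP_CAT_COMPLEX, /* implemented as a call into a C function    */
    OP_CAT_SPECIAL  /* control flow or otherwise special          */
} OpCategory;

/* Hypothetical range boundaries after re-organizing the opcode set. */
enum { OP_LAST_BASIC = 127, OP_LAST_COMPLEX = 191 };

OpCategory op_category(unsigned char op) {
    if (op <= OP_LAST_BASIC)   return OP_CAT_BASIC;
    if (op <= OP_LAST_COMPLEX) return OP_CAT_COMPLEX;
    return OP_CAT_SPECIAL;
}
```

The JIT can then scan forward from a marker op and stop at the first `OP_CAT_SPECIAL` op, which ends the basic block.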
The bytecode generator needs to emit a marker op for the start of basic blocks (composed of basic or complex ops) that acts as a JIT hook for blocks that want to be compiled to native code.
When the interpreter hits a JIT hook (and native code generation is supported on that architecture), it calls the JIT compiler to generate native code starting at the marker. A pointer to the native code gets added (atomically!) to a table of compiled blocks indexed by the block ID (which is the argument of the marker op).
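The atomic install could look roughly like this, using C11 atomics; `jit_hook`, `block_table`, and the stand-in "compiler" are sketches, not the real MoarVM interfaces. The compare-exchange guarantees that if two threads compile the same block concurrently, only one pointer wins:

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_BLOCKS 1024

typedef void (*NativeBlock)(void);

/* Table of compiled blocks, indexed by the block ID carried by the
   marker op. NULL means "not compiled yet". */
static _Atomic(NativeBlock) block_table[MAX_BLOCKS];

/* Called when the interpreter hits a JIT hook: return the native entry
   point for this block, compiling it first if necessary. */
NativeBlock jit_hook(unsigned block_id,
                     NativeBlock (*jit_compile)(unsigned)) {
    NativeBlock code = atomic_load(&block_table[block_id]);
    if (code == NULL) {
        NativeBlock fresh    = jit_compile(block_id);
        NativeBlock expected = NULL;
        if (atomic_compare_exchange_strong(&block_table[block_id],
                                           &expected, fresh))
            code = fresh;
        else
            code = expected; /* another thread installed first; drop ours */
    }
    return code;
}

/* Tiny stand-in "compiler" for demonstration purposes only. */
static int compiled_count;
static void native_stub(void) {}
static NativeBlock dummy_compile(unsigned block_id) {
    (void)block_id;
    compiled_count++;
    return native_stub;
}
```

On the losing side of the race the freshly generated code would need to be freed; that is omitted here for brevity.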
Pros:
- progressive enhancement of the existing system
- no AOT or heavy warmup stage
- easily(?) portable to other architectures
Cons:
- control flow handled by the interpreter, which defeats CPU-level optimizations (branch prediction, pipelining)
- Rakudo/MoarVM separation prevents more holistic optimizations
- no specialized or op-level optimizations, and no control-flow optimizations at all
A separate, more heavily optimizing method JIT that includes all the bells and whistles. In contrast to dynamic recompilation, the method JIT gets triggered callee-side rather than caller-side, when a particular multi turns hot. As before, optimization should happen in a separate thread.
The list of specializations of a given sub should probably be stored caller-side, alongside the native code generated by the method JIT.
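A caller-side specialization cache could be as simple as pairing each observed type signature with the entry point the method JIT produced for it; `SpecList`, `spec_find`, and `spec_add` are hypothetical names for illustration:

```c
#include <string.h>

#define SPEC_MAX_ARGS 4
#define SPEC_MAX      8

typedef void *SpecCode; /* entry point produced by the method JIT */

typedef struct {
    int      arg_types[SPEC_MAX_ARGS];
    int      nargs;
    SpecCode code;
} Spec;

typedef struct {
    Spec spec[SPEC_MAX];
    int  count;
} SpecList;

/* Look up a specialization matching the argument types; NULL means
   "fall back to the generic multi". */
SpecCode spec_find(const SpecList *l, const int *types, int nargs) {
    for (int i = 0; i < l->count; i++)
        if (l->spec[i].nargs == nargs &&
            memcmp(l->spec[i].arg_types, types,
                   nargs * sizeof(int)) == 0)
            return l->spec[i].code;
    return NULL;
}

/* Install a freshly jitted specialization; returns 0 if the cache is
   full (the caller keeps using the generic path). */
int spec_add(SpecList *l, const int *types, int nargs, SpecCode code) {
    if (l->count == SPEC_MAX)
        return 0;
    Spec *s = &l->spec[l->count++];
    memcpy(s->arg_types, types, nargs * sizeof(int));
    s->nargs = nargs;
    s->code  = code;
    return 1;
}
```

A fixed-size linear scan is plausible here because a single callsite rarely sees more than a handful of distinct signatures.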