@Bike
Last active June 1, 2023 19:56
bytecode IR changes

I've just completed the first draft of a system to compile CVM bytecode directly to Cleavir BIR. I think for the most part it's fine: it takes time linear in the length of the bytecode (unless I'm missing something) and produces respectable enough IR. But part of the code is ugly, and it would be uglier if I hadn't already made a few changes to the bytecode. To fix it further, I'm brainstorming ways the bytecode could be changed to make it more suitable as a compiler frontend without making it worse in other ways.

Changes to bytecode to make it more suitable for/as IR

PHI

The number one problem, and the predominant cause of ugliness in the system, is how to handle values returned from conditionals. Given (if x y z), the bytecode will simply push y or z to the stack in each branch, and then at the merge the following code will just use whatever is on top of the stack. But BIR wants these merged values to be inputs to PHIs. As far as I can tell, there is no way to determine how many values are being merged at a merge point except by comparing the entire stack state of each predecessor, which is what my system now does.

That is, for example, if we have (foo w (if x y z)), the bytecode is

  fdefinition 'FOO
  ref 0 ; w
  ref 1 ; x
  jump-if L0
  ref 3 ; z
  jump L1
L0:
  ref 2 ; y
L1:
  call 2

Arriving at L1 from the else block, the stack would contain (#'FOO w z) (jump-if having popped x), while from the true block (L0) it would contain (#'FOO w y), so the compiler can determine that only one phi is needed.
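
To make the stack-comparison idea concrete, here's a minimal Python sketch (the function name and the use of strings as abstract stack entries are my own invention; the real system works on BIR data structures): entries that are identical in every predecessor need no phi, entries that differ do.

```python
def phis_needed(pred_stacks):
    """Count phis required at a merge point by comparing the full
    abstract stack state of each predecessor (hypothetical sketch)."""
    if not pred_stacks:
        return 0
    depth = len(pred_stacks[0])
    # Every predecessor must reach the merge with the same stack depth.
    assert all(len(s) == depth for s in pred_stacks)
    # An entry needs a phi iff the predecessors disagree about it.
    return sum(1 for i in range(depth)
               if len({s[i] for s in pred_stacks}) > 1)

# The (foo w (if x y z)) example: (#'FOO w z) from the else block,
# (#'FOO w y) from the then block -- only the top entry differs.
print(phis_needed([["#'FOO", "w", "z"],
                   ["#'FOO", "w", "y"]]))  # 1
```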

The situation is further complicated by the fact that some conditionals return multiple values, and by the fact that a block could be reached both by a jump and by NLX (and for NLX, the stack will be completely different and irrelevant).

Some solutions I've been thinking of:

  • Introduce variants of the jump and exit instructions that indicate how many values are involved (e.g. jump-receive-one, jump-receive-fixed...). The difference would be ignored by the VM but used by the BIR compiler. Doing this would complicate the bytecode compiler a little and pollute the opcode space.
  • Have the BIR compiler simply treat everything on the stack as needing a phi. This would make the produced BIR kind of stupid looking, but it would be easy to write a pass to eliminate extraneous phis (meta-evaluate does not yet do this, as far as I can tell). Gets weird with NLX.
  • During compilation to bytecode, we could save metadata (exterior to the bytecode itself) about what values are input to a block. This would be a slight amount of work and therefore slow down compilation a bit, but shouldn't introduce any complicated passes.
  • Building off that, introduce an optional pass that looks through the existing bytecode and produces appropriate metadata to attach to the function. I think this is probably the right thing to do, because it would probably be advantageous to write a verification/annotation pass separate from the BIR compiler anyway (see "verification and annotation" below).

MV calls

BIR expects multiple value calls to be within a dynamic environment representing the temporary saving of the argument values to the stack. This is a bit tricky to produce from bytecode, because the mv call instructions instead use the mv vector as an argument. That is, for the code (multiple-value-call f (funcall g)) we have

  ref 0 ; f
  ref 1 ; g
  call 0
  mv-call

where the mv-call has to be translated into a values-collect instruction and then an mv-call instruction, whereas for (multiple-value-call f (funcall g) (funcall h)) we have

  ref 0 ; f
  ref 1 ; g
  call 0
  push-values
  ref 2 ; h
  call 0
  append-values
  pop-values
  mv-call

where push-values and append-values are sort of like values-save or values-collect instructions, but then pop-values is a fly in the ointment, and the mv-call needs to not make its own values-collect.

I realized I could change the bytecode in order to match what BIR does more closely, and in the process slightly improve VM performance. The idea is that instead of working off the mv register, mv-call gets its arguments from the stack, which must be topped by the variadic values pushed by push-values or append-values. That is, the two previous examples are instead compiled as

  ref 0 ; f
  ref 1 ; g
  call 0
  push-values
  mv-call

and

  ref 0 ; f
  ref 1 ; g
  call 0
  push-values
  ref 2 ; h
  call 0
  append-values
  mv-call

The runtime advantage is that this saves a copy operation. To do an mv-call, Clasp needs all the arguments to be contiguous in memory, and it can't use the mv vector directly since it can be clobbered, so the VM mv call allocates a variadic array on the native stack and copies the mv vector into it. This is no longer necessary with the new compilation - the arguments are instead on the VM stack already. Therefore this has the same number of copies in the one-argument-form case (the native stack copy is eliminated, but a new push-values copy takes place), and one fewer in the multiple-argument-form case.

This change also has the nice effect that regular and mv calls have identical stack effects - the arguments are always on the VM stack right after the callee. Possibly a debugger could leverage this to get at the original arguments. For something a little wackier, the call instructions could be eliminated - instead all calls would be "variadic" and you'd just push the argument count as a constant before doing an mv-call. (That would use an extra byte, though.)

The advantage for the compilation system is that now push-values and append-values map to values-save instructions pretty straightforwardly, and then the last one before the mv call is instead a values-collect.

There is a disadvantage in that this uses more VM stack space. I actually had to increase how much Clasp allocates for each thread in order to avoid an overflow during build. But I'm not really concerned about that.
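
For illustration, here's a toy Python model of the revised protocol as I read it from the description above (the class and all its names are hypothetical, not Clasp's actual VM code): push-values copies the MV register onto the VM stack under a count, append-values extends that group, and mv-call then finds its arguments already contiguous on the stack, so no separate native-stack copy is needed.

```python
class MiniVM:
    """Toy model of the revised mv-call stack protocol (sketch only)."""
    def __init__(self):
        self.stack = []
        self.mv = None            # multiple-values register

    def push_values(self):
        # Copy the MV register onto the stack, topped by the count.
        self.stack.extend(self.mv)
        self.stack.append(len(self.mv))
        self.mv = None

    def append_values(self):
        # Extend an existing variadic group with the MV register.
        count = self.stack.pop()
        self.stack.extend(self.mv)
        self.stack.append(count + len(self.mv))
        self.mv = None

    def mv_call(self):
        # Arguments are already contiguous on the VM stack; pop the
        # count, the argument group, and the callee beneath them.
        count = self.stack.pop()
        args = self.stack[-count:] if count else []
        del self.stack[len(self.stack) - count:]
        callee = self.stack.pop()
        self.mv = [callee(*args)]

# (multiple-value-call f (funcall g) (funcall h)), with g -> 1, 2
# and h -> 3, and f summing its arguments.
vm = MiniVM()
vm.stack.append(lambda *a: sum(a))   # ref 0 ; f
vm.mv = [1, 2]                       # call 0 on g left values in MV
vm.push_values()
vm.mv = [3]                          # call 0 on h
vm.append_values()
vm.mv_call()
print(vm.mv)  # [6]
```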

Variable bindings

The main problem with the BIR produced by the system currently is that it can't distinguish between variables that share a register index. For example, for the dumb code (progn (let ((y x)) (print y)) (let ((y x)) (print y))) we get the bytecode

  ref 0 ; x
  set 1 ; y
  fdefinition 'PRINT
  ref 1 ; y
  call-receive-fixed 1 0
  ref 0 ; x
  set 1 ; y
  fdefinition 'PRINT
  ref 1 ; y
  call-receive-fixed 1 0

There is no obvious way to tell that these are two different variables, so the produced BIR will use the same bir:variable for both of them. This could inhibit optimizations.

It might be good for BIR optimizations to be able to detect this situation and split variables, since this can arise with normal def-use chain stuff even without the obvious syntactic source, but it would also be good for the bytecode-to-BIR compiler to be able to generate better BIR right off the bat in cases like this.

I think this could be improved by simply mandating that the bytecode compiler output bind instructions for variable bindings and set for setqs, rather than the current scheme, where bind 1 is replaced by set. Then it's obvious from the instruction stream what is or isn't a binding, and bytecode-to-BIR can make a new BIR variable when it sees a bind, regardless of the register index. If efficiency is an issue, we could introduce bind1 and setn instructions.
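
A sketch of why explicit binds help, in Python (the instruction encoding here is hypothetical): every bind allocates a fresh variable for its register index, so two bindings that reuse index 1 no longer collapse into one bir:variable.

```python
import itertools

def assign_variables(instructions):
    """Give each binding its own variable id, keyed off explicit
    `bind` instructions rather than register indices alone.
    Instructions are (opcode, index) pairs; hypothetical encoding."""
    counter = itertools.count()
    current = {}          # register index -> live variable id
    uses = []             # variable id consumed by each ref/set
    for op, index in instructions:
        if op == "bind":
            current[index] = next(counter)   # new binding, new variable
        elif op in ("ref", "set"):
            uses.append(current[index])
    return uses

# The two (let ((y x)) (print y)) forms both bind register 1, but each
# bind introduces a distinct variable (ids 1 and 2 below; id 0 is x).
bytecode = [("bind", 0),
            ("bind", 1), ("ref", 1),
            ("bind", 1), ("ref", 1)]
print(assign_variables(bytecode))  # [1, 2]
```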

Ordering

The compiler just processes instructions in the order they appear in the bytecode rather than following jumps/branches. This isn't strictly required, but it makes things easy. But for this to be possible, the bytecode needs to be ordered in a certain way - functions precede any functions they close over, and each function's blocks precede all blocks they dominate. The bytecode compiler already produces code in this order, but it's a noteworthy precondition.

Lambda lists

The lambda list instructions are kind of strange - they can only appear at the beginning of the function in a very stereotyped way. The compiler system more or less ignores them, instead processing the saved lambda list. It might be possible/desirable to do this in the VM as well - i.e. instead of having the lambda list instructions, the function instead stores some conveniently abbreviated description of the lambda list (e.g. required argument count + optional argument count + restp + list of keys + aokp) which is then processed specially on function entry, and then the actual bytecode starts afterwards. But then again the lambda list instructions might be useful for inlining and/or local calls.
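
As a sketch of what such an abbreviated description could look like (field and method names are entirely hypothetical, and the arity check is simplified):

```python
from dataclasses import dataclass

@dataclass
class LambdaListInfo:
    """Abbreviated lambda-list description a function could store
    in place of the stereotyped prologue instructions (sketch)."""
    required: int = 0          # count of required parameters
    optional: int = 0          # count of &optional parameters
    restp: bool = False        # &rest present?
    keys: tuple = ()           # &key parameter names
    aokp: bool = False         # &allow-other-keys?

    def accepts(self, nargs):
        """Coarse arity check the VM could run on function entry;
        ignores keyword-pairing details for brevity."""
        if nargs < self.required:
            return False
        if self.restp or self.keys or self.aokp:
            return True
        return nargs <= self.required + self.optional

# (lambda (a b &optional c &key d) ...):
info = LambdaListInfo(required=2, optional=1, keys=("D",))
print(info.accepts(1), info.accepts(3))  # False True
```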

Miscellaneous

  • Define that there can only be one make-closure, or one make-uninitialized-closure and one initialize-closure, for a given function. Again, this is what we do anyway, but requiring it is a change in meaning.
  • Now that savesp is working, it would be convenient if exit referred directly to a closure index. This would rule out not doing the optimization, but would simplify some things both for the compiler and for the VM and its definition.
  • It would be kind of nice if make-closure and initialize-closure referred directly to the indices rather than working with the stack. One way of doing this would be to eliminate closure and instead say that the first however many entries in the register file are for closed-over variables. Then make-closure could just be followed by indices. This would be a radical change and difficult to reconcile with the one-pass approach, so I don't think it's a good idea, but I want to mention it if only to rule it out.
  • Make all code reachable. This is not currently the case, e.g. (progn (return) (foo)) will compile to bytecode that calls foo, even though that bytecode is never reachable. This is not necessary for compilation but would be kind of convenient. Unfortunately, making sure not to generate unreachable bytecode would complicate the compiler a bit.

Verification and annotation

When loading a bytecode FASL, a paranoid implementation should verify that the bytecode is correct, so that it doesn't execute something that crashes. The checks would basically just formalize what the compiler already does. Verification never needs to be done on the compiler's own output, unless you're debugging the compiler, in which case it will probably be useful, sorta like the BIR verifier. This is less of a priority for Clasp, but would be good, especially for something like SICL that wants to be really sure that you can't crash the system by loading a malformed FASL.

These verification conditions are also necessary for the bytecode-to-BIR compiler to work, which is why I'm thinking about them.

Annotations

Currently the bytecode-to-BIR compilation does some preliminary work figuring out where block boundaries are. This information could be computed in a separate phase and stored as metadata, along with other useful (but harder to compute) data like stackmaps and phi information. Doing it this way rather than integrating it into the compiler would allow it to be generated, optionally, by the frontend compiler. If the frontend did not do this (for compilation speed) it could instead be generated after the fact. Either way it could be fed into further compilation.
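
Computing block boundaries is a single linear scan; here's a Python sketch under a hypothetical (opcode, jump-target) instruction encoding: any jump target starts a block, as does the instruction after a jump.

```python
def block_leaders(insns):
    """Find basic-block leaders in one pass over the instruction list.
    Each instruction is (opcode, jump_target_or_None); hypothetical
    encoding, with targets given as instruction indices."""
    leaders = {0}                          # function entry is a leader
    for ip, (op, target) in enumerate(insns):
        if target is not None:
            leaders.add(target)            # a jump target starts a block
            leaders.add(ip + 1)            # so does the fall-through
    return sorted(t for t in leaders if t < len(insns))

# The (foo w (if x y z)) bytecode, with L0 = index 6 and L1 = index 7.
insns = [("fdefinition", None), ("ref", None), ("ref", None),
         ("jump-if", 6), ("ref", None), ("jump", 7),
         ("ref", None), ("call", None)]
print(block_leaders(insns))  # [0, 4, 6, 7]
```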

Other annotations include debug information, and information about declarations ignored by the bytecode compiler but useful for further compilation (e.g. optimize). These are not recoverable after the fact from a bare bytecode stream.

Correctness conditions

  • Except for multiple value call related stuff, the number of items (and to some extent the nature of the items) on the stack is statically knowable at each IP. So for example you can't have loops that keep pushing more to the stack unless they also pop it off.
  • The number of things on the stack at each IP exceeds the number of things popped by the instruction at that IP (no stack underflow).
  • Similarly, the multiple value state is consistent. That is: call, mv-call, pop-values, and pop put something in the MV register, while push-values, append-values, mv-call, return, and push consume the MV register. For example push; push-values; is illegal. call-receive-one, call-receive-fixed, mv-call-receive-one, and mv-call-receive-fixed also consume the MV register, if only in that any previous MV is invalid after they are executed. NOTE: I had to make a minor change to the bytecode compiler in order to enforce this one - making sure to generate call-receive-fixed nargs 0 instead of just call when no values are expected.
  • All ref and set instructions for a given index are dominated by a bind, bind-required-args, bind-optional-args, parse-key-args, or entry for that index.
  • This dominance is interrupted by a binding instruction including the same index. For example, bind 1 0; jump-if after; bind 1 0; after: ref 0 is invalid. This enforces proper nesting of lexical bindings.
  • No closure instruction uses an index greater than the number of things the function closes over.
  • No const instruction uses an index greater than the number of constants in the module.
  • cell-ref and cell-set always pop a statically determinable cell.
  • make-cell never pops a statically determinable cell.
  • exit always gets an entry as input, and nothing else does.
  • restoresp always gets a saved sp as input, and nothing else does.
  • Argument check instructions are at the beginning of the function. There is at most one bind-required-args etc.
  • mv-call*, append-values and pop-values are only used when the top of the stack is variable (from either push-values or append-values).
  • The constant for parse-key-args is a symbol, as are the subsequent constants for the instruction.
  • The constant for fdefinition is a function cell (whatever that means for the implementation).
  • Similarly for symbol-value and set-symbol-value.
  • The constants for make-closure and make-uninitialized-closure are template functions.
  • All entrys are terminated by an entry-close, all special-binds are terminated by an unbind, and they are not interleaved or otherwise out of order.
  • Instruction boundaries are correct, i.e. jumps never go inside an instruction.
  • All code in the function is (statically) reachable. For example if there is code after a return instruction, it must be jumped to. NOTE: As mentioned above, this condition is not currently met, and would take some nontrivial complication to produce in the compiler.
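
Several of these conditions (statically knowable stack depth, no underflow, consistent depths at merges) amount to a small abstract interpretation over the bytecode. A Python sketch, again with a hypothetical instruction encoding, propagating depths along fall-through and jump edges:

```python
def verify_stack_depths(insns):
    """Check that stack depth is statically knowable at each IP and
    never underflows. Each instruction is (pops, pushes,
    jump_target_or_None, falls_through); hypothetical encoding."""
    depths = {0: 0}
    work = [0]
    while work:
        ip = work.pop()
        if ip >= len(insns):
            continue                      # fell off the end; ignore
        pops, pushes, target, falls = insns[ip]
        d = depths[ip]
        if d < pops:
            raise ValueError(f"stack underflow at ip {ip}")
        after = d - pops + pushes
        succs = ([ip + 1] if falls else []) + \
                ([target] if target is not None else [])
        for s in succs:
            if s in depths:
                if depths[s] != after:    # predecessors disagree
                    raise ValueError(f"inconsistent depth at ip {s}")
            else:
                depths[s] = after
                work.append(s)
    return depths

# The (foo w (if x y z)) bytecode: jump-if pops its condition, and
# call 2 pops the callee plus two arguments (result goes to MV).
if_code = [(0, 1, None, True),   # fdefinition 'FOO
           (0, 1, None, True),   # ref 0 ; w
           (0, 1, None, True),   # ref 1 ; x
           (1, 0, 6, True),      # jump-if L0
           (0, 1, None, True),   # ref 3 ; z
           (0, 0, 7, False),     # jump L1
           (0, 1, None, True),   # ref 2 ; y  (L0)
           (3, 0, None, True)]   # call 2     (L1)
print(verify_stack_depths(if_code)[7])  # 3
```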