We think having a dicussion/RFC for registered based VM is necessary if it brings great value.
We first talked about how register allocation can be done in VM
-
we can collect the liveness first using ANF where scoping information could used as basic block so that traditional kill-gen analysis could be used.
-
Linear scan RA should be helpful here. Graph coloring needs more consideration about spilling. Also an object may in a register for a long while. Linear scan should satfisy our case as the liveness of each variable/object should be genearlly short. Linear scan is much easier to implement as well
Stack-based VM: Execute calls in async way. Using async future to execute data flow, i.e. primitive/function calls in a for loop could be chained up.
Considerations:
- Simplicity (impl, calling convention, etc)
- VM based is simpler
- Parallelism (primitive calls)
- May be easier for register-based? (Not actually sure, since many VMs like JVM, CPython VM, Rust VM, GraalVM, are actuall stack based.
- Code size
- Register-based usually has larger code size due to the requirement to load values
- Optimizations
- Some optimizations might be easier for register-based VM, e.g. CSE and CP?
IR: Relay:
fn (%i: int32,
%accum: int32) {
%0 = equal(%i, 0)
if (%0) {
%accum
} else {
%1 = subtract(%i, 1)
%2 = add(%accum, %i)
%3 = @sum_up(%1, %2)
%3
}
}
Stack-based VM:
sum_up:
0: push 0;
1: alloc_tensor() uint1;
2: invoke_packed 0 2 1;
3: push 2;
4: if 1 5;
5: push 1;
6: move 3 3;
7: pop 0;
8: goto 14;
9: push 0;
10: alloc_tensor() int32;
11: invoke_packed 1 2 1;
12: push 1;
13: push 0;
14: alloc_tensor() int32;
15: invoke_packed 2 3 1;
16: push 3;
17: push 4;
18: invoke 1;
19: push 5;
20: move 6 3;
21: pop 3;
22: push 3;
23: ret;
Register-based (we probably need more consideration about calling convention, i.e. caller and callee saved registers, return registers, etc.):
# Assume 1) we have labled tensor objects as virtual registers, such as TR1, TR2, etc. 2) TR1 is used to return values.
# foo(int i, int accum)
# push bp
# move bp, sp
# Note 8 or 12 could be different depending on the calling convention. Here, we assume X86 calling convention, i.e. EBP and # RA are saved on the stack.
# move $TR2, [bp+8];
# move $TR3, [bp+12]
test $TR2
JZ FALLTHROUGH
add $TR3, $RT2
lea $TR5, XXXX; get 1
sub $TR2, %TR5; i - 1
# push to parameters to stack?
# push $TR2
# push $TR3
call foo(int, int)
# recover satck? add sp, 8
FallTHROUGH:
move $TR1, $TR3
I think we don't need to save register. We can keep the register file in each call frame. When we call a function, we just need to create a
VMFrame
and initialize the args in the beginning of its register file.