zhiics/dis.md

## dis.md

      
    Raw
  

              dis.md
            
          
    We think having a dicussion/RFC for registered based VM is necessary if it brings great value.
We first talked about how register allocation can be done in VM


we can collect the liveness first using ANF where scoping information could used as basic block so that traditional
kill-gen analysis could be used.


Linear scan RA should be helpful here. Graph coloring needs more consideration about spilling. Also an object may in
a register for a long while. Linear scan should satfisy our case as the liveness of each variable/object should be
genearlly short. Linear scan is much easier to implement as well


Stack-based VM:
Execute calls in async way. Using async future to execute data flow, i.e. primitive/function calls in a for loop could be chained up.
Considerations:

Simplicity (impl, calling convention, etc)

VM based is simpler


Parallelism (primitive calls)

May be easier for register-based? (Not actually sure, since many VMs like JVM, CPython VM, Rust VM, GraalVM, are actuall stack based.


Code size

Register-based usually has larger code size due to the requirement to load values


Optimizations

Some optimizations might be easier for register-based VM, e.g. CSE and CP?


IR:
Relay:
fn (%i: int32,
    %accum: int32) {
  %0 = equal(%i, 0)
  if (%0) {
    %accum
  }  else {
    %1 = subtract(%i, 1)
    %2 = add(%accum, %i)
    %3 = @sum_up(%1, %2)
    %3
  }
}
Stack-based VM:
sum_up: 
0: push 0;
1: alloc_tensor() uint1;
2: invoke_packed 0 2 1;
3: push 2;
4: if 1 5;
5: push 1;
6: move 3 3;
7: pop 0;
8: goto 14;
9: push 0;
10: alloc_tensor() int32;
11: invoke_packed 1 2 1;
12: push 1;
13: push 0;
14: alloc_tensor() int32;
15: invoke_packed 2 3 1;
16: push 3;
17: push 4;
18: invoke 1;
19: push 5;
20: move 6 3;
21: pop 3;
22: push 3;
23: ret;
Register-based (we probably need more consideration about calling convention, i.e. caller and callee saved registers, return registers, etc.):
# Assume 1) we have labled tensor objects as virtual registers, such as TR1, TR2, etc. 2) TR1 is used to return values.
# foo(int i, int accum)
# push bp
# move bp, sp
# Note 8 or 12 could be different depending on the calling convention. Here, we assume X86 calling convention, i.e. EBP and # RA are saved on the stack.
# move $TR2, [bp+8];
# move $TR3, [bp+12]
test $TR2
JZ FALLTHROUGH

add $TR3, $RT2
lea $TR5, XXXX; get 1
sub $TR2, %TR5; i - 1
# push to parameters to stack?
# push $TR2
# push $TR3
call   foo(int, int)
# recover satck? add sp, 8
FallTHROUGH:
move $TR1, $TR3