Skip to content

Instantly share code, notes, and snippets.

@zhiics
Last active March 25, 2019 05:48
Show Gist options
  • Save zhiics/f2da079262075100a01c38a4c9d991b2 to your computer and use it in GitHub Desktop.
Save zhiics/f2da079262075100a01c38a4c9d991b2 to your computer and use it in GitHub Desktop.
Jared and Zhi discussion

We think having a dicussion/RFC for registered based VM is necessary if it brings great value.

We first talked about how register allocation can be done in VM

  1. we can collect the liveness first using ANF where scoping information could used as basic block so that traditional kill-gen analysis could be used.

  2. Linear scan RA should be helpful here. Graph coloring needs more consideration about spilling. Also an object may in a register for a long while. Linear scan should satfisy our case as the liveness of each variable/object should be genearlly short. Linear scan is much easier to implement as well

Stack-based VM: Execute calls in async way. Using async future to execute data flow, i.e. primitive/function calls in a for loop could be chained up.

Considerations:

  1. Simplicity (impl, calling convention, etc)
    • VM based is simpler
  2. Parallelism (primitive calls)
    • May be easier for register-based? (Not actually sure, since many VMs like JVM, CPython VM, Rust VM, GraalVM, are actuall stack based.
  3. Code size
    • Register-based usually has larger code size due to the requirement to load values
  4. Optimizations
    • Some optimizations might be easier for register-based VM, e.g. CSE and CP?

IR: Relay:

fn (%i: int32,
    %accum: int32) {
  %0 = equal(%i, 0)
  if (%0) {
    %accum
  }  else {
    %1 = subtract(%i, 1)
    %2 = add(%accum, %i)
    %3 = @sum_up(%1, %2)
    %3
  }
}

Stack-based VM:

sum_up: 
0: push 0;
1: alloc_tensor() uint1;
2: invoke_packed 0 2 1;
3: push 2;
4: if 1 5;
5: push 1;
6: move 3 3;
7: pop 0;
8: goto 14;
9: push 0;
10: alloc_tensor() int32;
11: invoke_packed 1 2 1;
12: push 1;
13: push 0;
14: alloc_tensor() int32;
15: invoke_packed 2 3 1;
16: push 3;
17: push 4;
18: invoke 1;
19: push 5;
20: move 6 3;
21: pop 3;
22: push 3;
23: ret;

Register-based (we probably need more consideration about calling convention, i.e. caller and callee saved registers, return registers, etc.):

# Assume 1) we have labled tensor objects as virtual registers, such as TR1, TR2, etc. 2) TR1 is used to return values.
# foo(int i, int accum)
# push bp
# move bp, sp
# Note 8 or 12 could be different depending on the calling convention. Here, we assume X86 calling convention, i.e. EBP and # RA are saved on the stack.
# move $TR2, [bp+8];
# move $TR3, [bp+12]
test $TR2
JZ FALLTHROUGH

add $TR3, $RT2
lea $TR5, XXXX; get 1
sub $TR2, %TR5; i - 1
# push to parameters to stack?
# push $TR2
# push $TR3
call   foo(int, int)
# recover satck? add sp, 8
FallTHROUGH:
move $TR1, $TR3
@icemelon
Copy link

I think we don't need to save register. We can keep the register file in each call frame. When we call a function, we just need to create a VMFrame and initialize the args in the beginning of its register file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment