Skip to content

Instantly share code, notes, and snippets.

@pqnelson
Last active August 25, 2023 12:37
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pqnelson/01d60d8312c6194ce5effd77228a5557 to your computer and use it in GitHub Desktop.
Save pqnelson/01d60d8312c6194ce5effd77228a5557 to your computer and use it in GitHub Desktop.
Dispatch Techniques for Interpreters

This was an attempt to see what Visual C++ does to a simple function call, and a simple goto spam type statement. The results are unsurprising (function calls use the call instruction, goto uses the jmp instruction). I produced the assembly by inserting a breakpoint, running the "Debug -> Start Debugging" (F5), then hitting CTRL+ALT+D to view the assembly. It works good enough.

Why?

If you want to implement a virtual machine, for example, which reads in bytecode, your repl function (in, say, C) will look something like:

enum CommandType {
    ADD = 0,
    /* ... */
};

void dispatch_loop(VirtualMachine *vm) {
    while (!ShutdownVM(vm)) {
        switch(vm->ip) {
        case ADD:
            do_add(vm);
            break;
        /* case ...: ... */
        };
    }
}

Some people call this switch threading. This sometimes turns out to be inefficient, because of all the function calls to the various do_command(vm). We can speed things up by using a goto command, which is not easily done.

One low hanging fruit is to replace the do_add(vm) function call with the body of the do_add() function. This helps, but the branch prediction CPUs do on switch statements is responsible for the slowness.

Computed GOTOs for Direct Threading

In GCC and Clang, you can have computed goto statements:

typedef struct {
    register int ip;
    unsigned char *code_area;
} VirtualMachine;
    
void dispatch_loop(register VirtualMachine *vm) {
    // PUN: the indices in the dispatch table ARE the bytecode values for the corresponding instructions
    static void *table[] = {
        &&do_add, /* ... */
    };
// apparently standard conventional name for this is NEXT(), following
// Forth's lead in direct threaded code
#define NEXT() goto *table[vm->code_area[vm->ip]]
    
    NEXT(); // get the ball rolling
    do_add:
        /* body of the method */
        /* update vm->ip */
        NEXT();
    /* remaining cases */
    do_shutdown:
        /* cleanup the vm */
        /* DO NOT CALL NEXT() as it is THE END! */
}

This is compiler dependent, obviously, and the only alternative (while capturing the goto speed) is to use assembly to get the same results. The performance results are mixed, as one observer puts it:

Direct threading has mixed performance. On some machines, this is very fast, but other machines may make use of the fact that code and data are not often directly intermixed. Modern hardware even raises an error condition if it detects this because it is symptomatic of the kind of self-modifying code that malicious software uses. Moving the data structures (for garbage collection) is also problematic because you have to re-link the code blocks that are embedded within them.

References

; int foo(int a, int b) {
00275640 push ebp
00275641 mov ebp,esp
00275643 sub esp,0C0h
00275649 push ebx
0027564A push esi
0027564B push edi
0027564C lea edi,[ebp-0C0h]
00275652 mov ecx,30h
00275657 mov eax,0CCCCCCCCh
0027565C rep stos dword ptr es:[edi]
; return a + b;
0027565E mov eax,dword ptr [a]
00275661 add eax,dword ptr [b]
; }
00275664 pop edi
00275665 pop esi
00275666 pop ebx
00275667 mov esp,ebp
00275669 pop ebp
0027566A ret
; ...
; int main() {
00275680 push ebp
00275681 mov ebp,esp
00275683 sub esp,0C0h
00275689 push ebx
0027568A push esi
0027568B push edi
0027568C lea edi,[ebp-0C0h]
00275692 mov ecx,30h
00275697 mov eax,0CCCCCCCCh
0027569C rep stos dword ptr es:[edi]
; foo(31, 177);
0027569E push 0B1h
002756A3 push 1Fh
002756A5 call foo (02709BAh)
002756AA add esp,8
; return 0;
002756AD xor eax,eax
; }
002756AF pop edi
002756B0 pop esi
002756B1 pop ebx
002756B2 add esp,0C0h
002756B8 cmp ebp,esp
002756BA call __RTC_CheckEsp (0270AF5h)
002756BF mov esp,ebp
002756C1 pop ebp
002756C2 ret
int foo(int a, int b) {
return a + b;
}
int main() {
foo(31, 177);
return 0;
}
; int main() {
00A75640 push ebp
00A75641 mov ebp,esp
00A75643 sub esp,0E4h
00A75649 push ebx
00A7564A push esi
00A7564B push edi
00A7564C lea edi,[ebp-0E4h]
00A75652 mov ecx,39h
00A75657 mov eax,0CCCCCCCCh
00A7565C rep stos dword ptr es:[edi]
; int a, b, c;
; a = 31;
00A7565E mov dword ptr [a],1Fh
; b = 177;
00A75665 mov dword ptr [b],0B1h
; goto spam;
00A7566C jmp main+3Eh (0A7567Eh)
00A7566E jmp main+3Eh (0A7567Eh)
; bar:
; c = a * b;
00A75670 mov eax,dword ptr [a]
00A75673 imul eax,dword ptr [b]
00A75677 mov dword ptr [c],eax
; goto exit;
00A7567A jmp exit (0A75687h)
00A7567C jmp exit (0A75687h)
; spam:
; c = a + b;
00A7567E mov eax,dword ptr [a]
00A75681 add eax,dword ptr [b]
00A75684 mov dword ptr [c],eax
; exit:
; return 0;
00A75687 xor eax,eax
; }
00A75689 pop edi
00A7568A pop esi
00A7568B pop ebx
00A7568C mov esp,ebp
00A7568E pop ebp
int main() {
int a, b, c;
a = 31;
b = 177;
goto spam;
bar:
c = a * b;
goto exit;
spam:
c = a + b;
exit:
return 0;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment