This was an attempt to see what Visual C++ does to a simple function call, and a simple goto spam
type statement.
The results are unsurprising (function calls use the call
instruction, goto
uses the jmp
instruction).
I produced the assembly by inserting a breakpoint, running the "Debug -> Start Debugging" (F5), then hitting CTRL+ALT+D
to view the assembly.
It works good enough.
If you want to implement a virtual machine, for example, which reads in bytecode, your repl function (in, say, C) will look something like:
enum CommandType {
ADD = 0,
/* ... */
};
void dispatch_loop(VirtualMachine *vm) {
while (!ShutdownVM(vm)) {
switch(vm->ip) {
case ADD:
do_add(vm);
break;
/* case ...: ... */
};
}
}
Some people call this switch threading.
This sometimes turns out to be inefficient, because of all the function calls to the various do_command(vm)
.
We can speed things up by using a goto command
, which is not easily done.
One low hanging fruit is to replace the do_add(vm)
function call with the body of the do_add()
function.
This helps, but the branch prediction CPUs do on switch statements is responsible for the slowness.
In GCC and Clang, you can have computed goto statements:
typedef struct {
register int ip;
unsigned char *code_area;
} VirtualMachine;
void dispatch_loop(register VirtualMachine *vm) {
// PUN: the indices in the dispatch table ARE the bytecode values for the corresponding instructions
static void *table[] = {
&&do_add, /* ... */
};
// apparently standard conventional name for this is NEXT(), following
// Forth's lead in direct threaded code
#define NEXT() goto *table[vm->code_area[vm->ip]]
NEXT(); // get the ball rolling
do_add:
/* body of the method */
/* update vm->ip */
NEXT();
/* remaining cases */
do_shutdown:
/* cleanup the vm */
/* DO NOT CALL NEXT() as it is THE END! */
}
This is compiler dependent, obviously, and the only alternative (while capturing the goto
speed) is to use assembly to get the same results. The performance results are mixed, as one observer puts it:
Direct threading has mixed performance. On some machines, this is very fast, but other machines may make use of the fact that code and data are not often directly intermixed. Modern hardware even raises an error condition if it detects this because it is symptomatic of the kind of self-modifying code that malicious software uses. Moving the data structures (for garbage collection) is also problematic because you have to re-link the code blocks that are embedded within them.
- Threaded Code
- Threaded Interpreter
- How to Create Jump Tables via Function Pointer Arrays in C and C++
- Computed goto for efficient dispatch tables
- Mathew Zaleski's Dispatch Techniques (ch.3 of his 2008 thesis)