@copy
Created December 17, 2022 18:25

Here's an overview of how v86 works. For details, check the source.

The major limitations of WebAssembly are:

  • structured control flow (you can't generate arbitrary jumps)
  • no control over registers (you can't keep x86 registers in wasm locals across functions)
  • no mmap (paging needs to be fully emulated)
  • no patching
  • module generation is fairly slow, but at least it's asynchronous, so other things can keep running
  • there is some memory overhead per module, so you can't generate more than a few thousand

v86 has an interpreted mode, which collects entry points (the targets of function calls and indirect jumps). It also measures hotness per page, so that compilation focuses on code that is executed often. Once a page is considered hot, code is generated for the entire page and for up to MAX_PAGES pages that are directly reachable from it.
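The per-page hotness counting can be sketched as follows. This is a minimal illustration, not v86's actual code: the struct, method names, and the threshold value are all made up for the example.

```rust
use std::collections::HashMap;

// Hypothetical threshold; v86's actual tuning lives in the source.
const JIT_THRESHOLD: u32 = 1000;

/// Counts how often execution enters each 4 KB page and reports when a
/// page has just become hot enough to hand to the compiler.
struct HotnessTracker {
    counters: HashMap<u32, u32>, // page number -> entry count
}

impl HotnessTracker {
    fn new() -> Self {
        HotnessTracker { counters: HashMap::new() }
    }

    /// Called by the interpreter at every entry point; returns true
    /// exactly once, when the page crosses the threshold.
    fn record_entry(&mut self, phys_addr: u32) -> bool {
        let count = self.counters.entry(phys_addr >> 12).or_insert(0);
        *count += 1;
        *count == JIT_THRESHOLD
    }
}
```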

v86 generates a single function with a big switch statement (br_table), to ensure that all functions and targets of indirect jumps are reachable from other modules. The remaining control flow is handled using the "stackifier" algorithm (well explained in this blog post).
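The dispatch pattern can be sketched in Rust: one loop, one switch over block indices (the wasm `loop` + `br_table` shape), where a "jump" becomes an assignment to the block index followed by the next iteration. The blocks here compute a gcd purely as a stand-in for translated x86 basic blocks; none of this is v86's actual code.

```rust
/// Sketch of how a single loop plus a switch recovers arbitrary jumps
/// inside structured control flow. Each match arm plays the role of one
/// basic block; `block` plays the role of the br_table operand.
fn dispatch(mut a: u32, mut b: u32) -> u32 {
    let mut block = 0; // index of the next basic block to run
    loop {
        match block {
            0 => {
                // entry: decide whether to loop or exit
                block = if b == 0 { 2 } else { 1 };
            }
            1 => {
                // loop body, then "jump" back to the entry test
                let t = b;
                b = a % b;
                a = t;
                block = 0;
            }
            _ => return a, // exit block
        }
    }
}
```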

At the moment, there is no linking of modules. The current module is exited, and the main loop detects if a new module can be entered.

In practice, I found that browsers don't handle this structure (deep br_tables, with locals being used across the entire function) very well, and MAX_PAGES has to be set fairly low, otherwise memory usage blows up. Improvements are likely possible (generating fewer entry points, splitting code across multiple functions).

To handle paging, v86 generates code similar to this (see gen_safe_read):

entry <- tlb[addr >> 12 << 2]
if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast
entry <- safe_read_jit_slow(addr, instruction_pointer)
if page_fault: goto exit-with-pagefault
fast: mem[(entry & ~0xFFF) ^ addr]

There is a 4 MB cache that acts like a TLB. Each entry contains the physical address, a read-only bit, whether the page contains code (in order to invalidate it on write), and whether the page points to MMIO. All of those cases are handled in the slow path (safe_read_jit_slow), which also walks the page tables and triggers page faults. The fast path is taken the vast majority of the time.
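The fast-path test above can be written out as a small Rust function. The flag bit layout here is assumed for the sketch (the real bits are in v86's source); the bounds check mirrors the pseudocode's conservative form, so accesses near the end of a page always take the slow path. The key trick is that the entry's upper bits store the physical-page bits xor the virtual-page bits, so xor-ing with the address yields the physical address directly.

```rust
// Flag layout assumed for this sketch; not v86's actual bit assignments.
const TLB_VALID: u32 = 1 << 0;
const TLB_READONLY: u32 = 1 << 1;
const TLB_HAS_CODE: u32 = 1 << 2;
const TLB_MMIO: u32 = 1 << 3;
const MASK: u32 = TLB_VALID | TLB_READONLY | TLB_HAS_CODE | TLB_MMIO;

/// Returns the physical address if the fast path applies: the entry is
/// valid with no special bits set, and the access stays well inside the
/// 4 KB page. Otherwise returns None, i.e. fall back to the slow path.
fn tlb_fast_path(entry: u32, addr: u32, bytes: u32) -> Option<u32> {
    if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes {
        // upper bits of entry = phys page ^ virt page, so this xor
        // swaps in the physical page and keeps the page offset
        Some((entry & !0xFFF) ^ addr)
    } else {
        None // handled by safe_read_jit_slow
    }
}
```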

The remaining code generation is mostly a straightforward, 1-to-1 translation of x86 to wasm. The only analysis done is to optimise the generation of conditional jumps immediately after arithmetic instructions, e.g.:

cmp eax, 52
setb al

becomes:

// code for cmp
al <- eax < 52

A lazy flag mechanism is used to speed up arithmetic (this applies to both the JIT and interpreted mode; see arith.rs and misc_instr.rs). There's a work-in-progress PR that tries to elide all flag updates: copy/v86#466
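The idea behind lazy flags can be sketched as follows (names and structure are illustrative; the real version is in arith.rs and misc_instr.rs). Arithmetic instructions only record their operands and result; individual EFLAGS bits are derived later, and only if something actually reads them. In the cmp/setb example above, the `eax < 52` comparison is just the carry flag computed on demand.

```rust
/// Lazy flag state: enough information to reconstruct any flag of the
/// last flag-setting instruction, without having computed any of them.
#[derive(Default)]
struct LazyFlags {
    last_op1: u32,
    last_result: u32,
}

impl LazyFlags {
    /// cmp/sub records its inputs; no flag bits are computed here.
    fn cmp(&mut self, a: u32, b: u32) {
        self.last_op1 = a;
        self.last_result = a.wrapping_sub(b);
    }

    /// ZF on demand: the result was zero.
    fn getzf(&self) -> bool {
        self.last_result == 0
    }

    /// CF on demand: for a subtraction, an unsigned borrow happened
    /// iff the wrapped result exceeds the first operand.
    fn getcf(&self) -> bool {
        self.last_result > self.last_op1
    }
}
```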

FPU instructions are emulated using softfloat (very slow, but unfortunately some code relies on 80-bit floats).
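Why a host f64 can't simply stand in for the x87's 80-bit format: an f64 mantissa has 53 bits while x87 extended precision has 64, so values needing more bits silently lose precision. A tiny check (this helper is just for illustration):

```rust
/// Returns true if the integer survives a round trip through f64
/// unchanged, i.e. it fits in f64's 53-bit mantissa.
fn survives_f64(n: u64) -> bool {
    (n as f64) as u64 == n
}
```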
