@lpereira
Created May 14, 2015 10:57
Lwan template interpreter
New                                     Old
movsx esi,BYTE PTR [r14+0x8]            movsx esi,BYTE PTR [r13+0x8]
mov rdi,r12                             mov rdi,rbp
add rbx,0x1                             add r13,0x18
call 0x412fd0 <strbuf_append_char>      call 0x413ed0 <strbuf_append_char>
mov rax,QWORD PTR [r15+rbx*8]           mov eax,DWORD PTR [r13+0x0]
jmp QWORD PTR [rax*8+0x416440]          mov rax,QWORD PTR [rax*8+0x4166a0]
                                        jmp rax

C:                                      C:
pc++;                                   chunk++;
goto *dispatch_table[ops[pc]];          goto *dispatch_table[chunk->action];
`struct chunk` is 0x18 bytes. `ops` is an array of all the `action` fields from a `struct chunk` array,
tightly packed together. The code is pretty much equivalent, except that in the old version GCC had to
load the jump target into `rax` and then jump there; it couldn't fuse the load with the indirect jump. Why?
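
For readers who want to reproduce the comparison, here is a minimal sketch of the two dispatch styles. It is not Lwan's actual code: the `struct chunk` layout, the `ACTION_*` names, the 8-byte `ops` element type (guessed from the `QWORD` load in the new version), and the `run_old`/`run_new` functions are all assumptions made for illustration. It relies on the labels-as-values extension (`&&label`, `goto *ptr`), so it needs GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action set; the real interpreter has more actions. */
enum action { ACTION_APPEND_CHAR, ACTION_LAST };

/* Hypothetical stand-in for the real struct: 0x18 bytes on x86-64. */
struct chunk {
    enum action action; /* 4 bytes, followed by 4 bytes of padding */
    void *data;         /* for ACTION_APPEND_CHAR: the char, stored in the pointer */
    size_t flags;       /* unused here; only pads the struct to 0x18 bytes */
};

/* "Old" style: step through the chunk array and dispatch on chunk->action. */
size_t run_old(const struct chunk *chunk, char *out)
{
    static const void *const dispatch_table[] = {
        [ACTION_APPEND_CHAR] = &&action_append_char,
        [ACTION_LAST] = &&action_last,
    };
    size_t len = 0;

    goto *dispatch_table[chunk->action];

action_append_char:
    out[len++] = (char)(uintptr_t)chunk->data;
    chunk++;
    goto *dispatch_table[chunk->action];

action_last:
    out[len] = '\0';
    return len;
}

/* "New" style: dispatch on a packed array holding only the action values. */
size_t run_new(const struct chunk *chunks, const size_t *ops, char *out)
{
    static const void *const dispatch_table[] = {
        [ACTION_APPEND_CHAR] = &&action_append_char,
        [ACTION_LAST] = &&action_last,
    };
    size_t pc = 0, len = 0;

    goto *dispatch_table[ops[pc]];

action_append_char:
    out[len++] = (char)(uintptr_t)chunks[pc].data;
    pc++;
    goto *dispatch_table[ops[pc]];

action_last:
    out[len] = '\0';
    return len;
}
```

Compiling this with optimizations and disassembling both functions (for example with `objdump -d -M intel`) should let you compare the dispatch sequences, though the exact instructions will depend on the compiler version and flags.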
@lpereira

@ricbit: I didn't measure; I've no idea if there's any difference, but I was puzzled anyway. The version on the left is 8 bytes shorter than the version on the right. Architecture is x86-64; my machine is a Sandy Bridge Core i7.

The C code for this snippet is here.