@rednaxelafx
Created December 30, 2010 05:21
A code snippet to show some of the relationship between JVM/HotSpot's and Dalvik's interpreters.
Java source code:
k = i + j;
May compile to Java bytecode:
iload_0
iload_1
iadd
istore_2
And may turn into Dalvik VM code:
add-int v2, v1, v0
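(For reference, the bytecode above can be reproduced from a complete method. The class and method names
below are made up for illustration; compiling the class with javac and running javap -c on it shows the
iload_0 / iload_1 / iadd / istore_2 sequence, while the exact Dalvik register numbers depend on how dx
assigns registers.)

// Hypothetical example class; only the body of add() matters here.
class AddExample {
    static int add(int i, int j) {   // i -> local slot 0, j -> slot 1
        int k = i + j;               // k -> slot 2, hence istore_2
        return k;                    // javap also shows iload_2, ireturn here
    }
}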
Compare the HotSpot Client VM's interpreter in JDK 6u18 with Dalvik's interpreter in Android 2.0, both on x86.
To execute the code above, the traces obtained by unrolling the interpreters' fetch-dispatch-execute loops
are:
HotSpot's interpreter (client mode default config):
;;-------------iload_0-------------
mov eax, dword ptr [edi]
movzx ebx, byte ptr [esi + 1]
inc esi
jmp dword ptr [ebx*4 + 6DB188C8]
;;-------------iload_1-------------
push eax
mov eax, dword ptr [edi - 4]
movzx ebx, byte ptr [esi + 1]
inc esi
jmp dword ptr [ebx*4 + 6DB188C8]
;;--------------iadd---------------
pop edx
add eax, edx
movzx ebx, byte ptr [esi + 1]
inc esi
jmp dword ptr [ebx*4 + 6DB188C8]
;;------------istore_2-------------
mov dword ptr [edi - 8], eax
movzx ebx, byte ptr [esi + 1]
inc esi
jmp dword ptr [ebx*4 + 6DB19CC8]
Dalvik's interpreter:
;;------------add-int--------------
movzx eax, byte ptr [edx + 2]
movzx ecx, byte ptr [edx + 3]
mov eax, dword ptr [esi + eax*4]
add eax, dword ptr [esi + ecx*4]
movzx ecx, bh
movzx ebx, word ptr [edx + 4]
lea edx, dword ptr [edx + 4]
mov dword ptr [esi + ecx*4], eax
movzx eax, bl ; GOTO_NEXT "computed next" version
sal eax, $$$handler_size_bits
add eax, edi
jmp eax
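Both traces end with the same pattern: load the next opcode, then jump through a handler table (HotSpot)
or to a computed handler address (Dalvik). As a rough illustration of the fetch-dispatch-execute split,
and nothing more, here is a toy stack-based interpreter for the four bytecodes above, written in Java with
a plain switch; the real interpreters are hand-written/generated assembly and replicate the dispatch tail
into every handler instead of looping.

// Toy sketch only: a stack-based interpreter for iload_0/iload_1/iadd/istore_2.
// It only illustrates where the fetch and dispatch overhead sits relative to
// the "execute" work; it is not how HotSpot or Dalvik are implemented.
class StackInterpSketch {
    static final int ILOAD_0 = 0, ILOAD_1 = 1, IADD = 2, ISTORE_2 = 3, HALT = 4;

    static void run(int[] code, int[] locals) {
        int[] stack = new int[8];
        int sp = 0, pc = 0;
        while (true) {
            int op = code[pc++];                                      // fetch
            switch (op) {                                             // dispatch
                case ILOAD_0:  stack[sp++] = locals[0]; break;        // execute
                case ILOAD_1:  stack[sp++] = locals[1]; break;
                case IADD:     { int b = stack[--sp], a = stack[--sp];
                                 stack[sp++] = a + b; break; }
                case ISTORE_2: locals[2] = stack[--sp]; break;
                case HALT:     return;
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = {3, 4, 0};
        run(new int[] {ILOAD_0, ILOAD_1, IADD, ISTORE_2, HALT}, locals);
        System.out.println(locals[2]);                                // prints 7
    }
}

In this shape, the code[pc++] fetch and the switch dispatch are paid once per bytecode; that per-bytecode
cost is exactly what gets stripped off below.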
If we strip off the fetch/dispatch part from the two code traces above, we'll get:
HotSpot:
;;-------------iload_0-------------
mov eax, dword ptr [edi]
;;-------------iload_1-------------
push eax
mov eax, dword ptr [edi - 4]
;;--------------iadd---------------
pop edx
add eax, edx
;;------------istore_2-------------
mov dword ptr [edi - 8], eax
Dalvik:
;;------------add-int--------------
movzx eax, byte ptr [edx + 2]
movzx ecx, byte ptr [edx + 3]
mov eax, dword ptr [esi + 4*eax]
add eax, dword ptr [esi + 4*ecx]
movzx ecx, bh
mov dword ptr [esi + 4*ecx], eax
Now we can see that in this example, counting only the instructions that carry out the user code's
original semantics, both HotSpot's and Dalvik's interpreters use 6 x86 instructions.
This means HotSpot doesn't lose performance in the "execution" part just because the JVM spec defines a
stack-based instruction set. With one-level top-of-stack caching, HotSpot can still make efficient use of
machine registers during interpretation, in spite of the fact that it's emulating a stack-based abstract
machine.
On the other hand, Dalvik's interpreter (on x86) keeps all of its "virtual registers" in the stack frame,
which lives in memory and is therefore slower to access than HotSpot's cached TOS (top-of-stack) value.
Of course, Dalvik could tune its interpreter further to squeeze out more performance, but with so few
registers available on x86 that's going to be hard; it would be easier with more free registers, as on
x86-64 or a RISC processor.
But because the JVM needs more bytecode instructions than Dalvik to do the same work, the fetch/dispatch
part makes HotSpot's interpreter pay more interpretation overhead than Dalvik's.
------------------------------------------------------------------------------------------------
It's also interesting to look at Sun JDK 1.1.8's interpreter. Running the example shown above, and again
counting just the "execution" part, we'd get:
;;-------------iload_0-------------
mov ebx, dword ptr [ebp]
;;-------------iload_1-------------
mov ecx, dword ptr [ebp + 4]
;;--------------iadd---------------
add ebx, ecx
;;------------istore_2-------------
mov dword ptr [ebp + 8], ebx
That's 2 memory reads and 1 memory write, exactly what you'd get if the example were written in C and
compiled without optimization, which is not bad for an interpreter. This is again the effect of top-of-stack
caching, here in its multi-state form: in this trace, up to two stack values are held in registers (ebx and
ecx) at once.
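As a final sketch, the same idea extended to multiple cache states: dispatch is conceptually indexed by
(number of cached stack values, opcode), so every handler knows statically whether its operands are already
in registers. Only the (state, opcode) pairs needed for this example are written out; this is an assumed
illustration of the technique, not JDK 1.1.8's actual code.

// Toy sketch of multi-state TOS caching with a two-register cache (r0 and r1
// standing in for ebx and ecx in the trace above). A complete interpreter
// would also need handlers that spill to / refill from a memory stack once
// more than two values are live.
class MultiStateTosSketch {
    static final int ILOAD_0 = 0, ILOAD_1 = 1, IADD = 2, ISTORE_2 = 3, HALT = 4;

    static void run(int[] code, int[] locals) {
        int cached = 0;                                  // how many stack slots live in r0/r1
        int r0 = 0, r1 = 0;
        int pc = 0;
        while (true) {
            int op = code[pc++];
            switch (cached * 8 + op) {                   // dispatch on (state, opcode)
                case 0 * 8 + ILOAD_0:  r0 = locals[0];   cached = 1; break;
                case 1 * 8 + ILOAD_1:  r1 = locals[1];   cached = 2; break;
                case 2 * 8 + IADD:     r0 = r0 + r1;     cached = 1; break;
                case 1 * 8 + ISTORE_2: locals[2] = r0;   cached = 0; break;
                case 0 * 8 + HALT:     return;
                default: throw new AssertionError("pair not sketched: " + cached + "/" + op);
            }
        }
    }
}

Running run(new int[] {ILOAD_0, ILOAD_1, IADD, ISTORE_2, HALT}, new int[] {3, 4, 0}) leaves 7 in locals[2],
and the four handlers touch the locals array exactly three times (two reads, one write), matching the shape
of the 1.1.8 trace.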