C2 is a speculative compiler. When it encounters a conditional branch and profiling data suggest that the control is highly likely to take one path, parser intends to prune the other path and leave a runtime stub uncommon_trap. We refer to it as “unstable_if trap” because the reason of this stub is unstable_if. If the rare case does happen, the trap triggers deoptimization. HotSpot realizes that current profiling data do not hold up and destroys this compilation. The execution returns to interpreter. Presumably, the upcoming compilation of this method will correct itself with updated profiling data.
Bci(byte code index) in JVM is analogous to the PC(program counter) of a physical processor. It points to the current bytecode in interpreter. Because an uncommon_trap stub punts to interpreter, it must restore the interpreter state in JVM. Restoring the interpreter state means that all variables live at the current bci must have correct values. Besides live variables of primitive types (int, char etc), deoptimization has to rematerialize all reference type variables which can refer to non-escaping objects if C2 has done scalar replacement for them. Each uncommon_trap is a SafePointNode in C2 IR. All live variables are the inputs of it including those scalarized objects. C2 codegen synthesizes them and write to debuginfo. DebugInfo are stored in a separate section along with text section in codecache.
Currently, unstable_if trap is conservative about bci. It points to the conditional branch. When interpreter resumes execution, it will reevaluate the conditional branch and take a path accordingly. Here is an example. driver has invoked testMethod(true)
for thousands of times, so the profiling data suggest cond
(highlighted) at line-5 is “always-true”.
1 public int testMethod(boolean cond) {
2 int sum = 0;
3 for (int i = 0; i < data.length; i++) {
4 Integer ii = Integer.valueOf(data[i]);
5 if (cond) {
6 sum += ii.intValue();
7 }
8 }
9 return sum;
10 }
List-1: Source code of testMethod
Readers can use javap to inspect the method in bytecode.
public int testMethod(boolean);
Code:
0: iconst_0
1: istore_2
2: iconst_0
3: istore_3
4: iload_3
5: getstatic #7 // Field data:[I
8: arraylength
9: if_icmpge 40
12: getstatic #7 // Field data:[I
15: iload_3
16: iaload
17: invokestatic #13 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
20: astore 4
22: iload_1
23: ifeq 34
26: iload_2
27: aload 4
29: invokevirtual #19 // Method java/lang/Integer.intValue:()I
32: iadd
33: istore_2
34: iinc 3, 1
37: goto 4
40: iload_2
41: ireturn
List-2: bytecodes of testMethod
In List-1, “if (cond)” at line 5 is a conditional branch, which corresponds to bytecode: 23: ifeq 34
in List-2. If profiling data suggest that the branch 23: ifeq is always non-taken, the other path is pruned and a unstable_if trap is placed instead. This code snippet is like line 5-9 in List-3. I.e C2 parser assumes next bci of 23: ifeq is 26.
Current C2 associates the unstable_if trap at line 8 with bci: 23. We think it is too conservative. It is safe to use bci: 34 instead because the conditional branch at bci: 23 is supposed to take after deoptimization, or the deoptimization should not happen in the first place.
Variable ii
is local#4 defined at bci: 20 in List-2. It’s noteworthy it is live at bci: 23 (highlighted) but dead at bci: 34.
1 public int testMethod() {
2 int sum = 0;
3 for (int i = 0; i < data.length; i++) {
4 Integer ii = Integer.valueOf(data[i]);
5 if (cond) {
6 sum += data[i]; // from sum += ii.intValue();
7 } else {
8 uncommon_trap(Reason_unstable_if, Action_reinterpreter, ...);
9 }
10 }
11 return sum;
12 }
List-3: psuedeo code of testMethod in optimizer
The call of Integer.valueOf() at line 4 should get removed because no one uses local ii
anymore after AggressiveUnboxing (Line 6). Unfortunately, it is not the case in current C2. Because parser uses bci: 23 to collect liveness for the uncommon_trap node. It references ii
in its debuginfo section. It is the very reference prevents late-inliner from eliminating the function call of Integer.valueOf().
The key observation is that JVM will take the other path when deoptimization does happen. We could use the liveness of destination instead of the conditional branch. If the liveness set of new location is smaller than before, we allow more optimization in C2 and save space in debuginfo.
Parser records the speculative bci for all unstable_if traps. The speculative bci points the path which is unlikely to be taken. We use the liveness of the speculative bci to kill dead locals for an unstable_if trap. Eg. the original bci is 23 in List-2. The speculative bci is 34, where ii
has been dead. C2 removes local ii
from uncommon_trap call’s inputs. This unlocks late-inliner to delete the function call of Integer.valueOf().
Because we keep the original bci for deoptimization, interpreter needs to evaluate the conditional branch again, eg. 23: ifeq 34
in List-2. It is possible that the orignal bytecode (ifnull/ifnonnull, if_acmpne/eq) reference objects on operand stack, thus we conservatively prevent operands of the bytecode from being killed.
We ran the microbenchmark of JDK-8261137 and got 9x speedup on x86_64.
$java -jar ./StringFunc-jmh.jar -wi 5 -f 1 -i 10 com.amazon.jdkteam.brownbag.MyBenchmark.testMethod
Benchmark Mode Cnt Score Error Units
MyBenchmark.testMethod avgt 10 3.679 ± 0.014 us/op
$java -jar ./StringFunc-jmh.jar -wi 5 -f 1 -i 10 -jvmArgs " -XX:+UnlockDiagnosticVMOptions -XX:-AggressiveLivenessForUnstableIf" com.amazon.jdkteam.brownbag.MyBenchmark.testMethod
Benchmark Mode Cnt Score Error Units
MyBenchmark.testMethod avgt 10 34.464 ± 0.855 us/op
An unstable if trap is trivial if it can not prune any code, or the pruned basic block is small. eg. the unstable if trap generated at line 8 of List-3 is trivial because it masks out nothing. The else-section of original program is empty. The trivial unstable_if traps not only complicate codeshape but also consume codecache.
I would like to record the speculative basic block as well. C2 can determine whether an unstable if is trivial by its definition after parsing. The challenging part is how to merge the block of conditional branch to the speculative block. It is an open question.
- JBS issue of this work: https://bugs.openjdk.java.net/browse/JDK-8286104
- JBS issue of the follow-up: https://bugs.openjdk.java.net/browse/JDK-8287385
We can’t update MDO correctly if we do so. Secondly, unstable_if trap may be shared by other code. see the comment from Vladimir Kozlov. openjdk/jdk#8545 (comment)
Breakpoints can only be placed via JVMTI. When it is placed, all dependent methods are supposed to be destroyed.
void BreakpointInfo::set(Method* method) {
#ifdef ASSERT
{
Bytecodes::Code code = (Bytecodes::Code) *method->bcp_from(_bci);
if (code == Bytecodes::_breakpoint)
code = method->orig_bytecode_at(_bci);
assert(orig_bytecode() == code, "original bytecode must be the same");
}
#endif
Thread *thread = Thread::current();
*method->bcp_from(_bci) = Bytecodes::_breakpoint;
method->incr_number_of_breakpoints(thread);
{
// Deoptimize all dependents on this method
HandleMark hm(thread);
methodHandle mh(thread, method);
CodeCache::flush_dependents_on_method(mh);
}
}
When JDWP is on, JVMTI feature should_retain_local_variables is on. This refrain from liveness compuating from CFG. It simply marks all local variables live.
MethodLivenessResult ciMethod::liveness_at_bci(int bci) {
if (CURRENT_ENV->should_retain_local_variables() || DeoptimizeALot) {
// Keep all locals live for the user's edification and amusement.
MethodLivenessResult result(_max_locals);
result.set_range(0, _max_locals);
result.set_is_valid();
return result;
}
return raw_liveness_at_bci(bci);
}
This transformation fuses 2 signed integer comparisons into one unsigned one. If both 2 IfNode has uncommon_traps, it will use the dominating one to cover 2 IfNodes. It conflicts with my assumption presented in this article, here is why.
eg. see the 2 comparisons at line 4.
1 public static Short valueOf(short s) {
2 final int offset = 128;
3 int sAsInt = s;
4 if (sAsInt >= -128 && sAsInt <= 127) { // must cache
5 return ShortCache.cache[sAsInt + offset];
6 }
7 return new Short(s);
8 }
CmpI(le, sAsInt, 127) && !CmpI(lt, sAsInt, -128) => CmpI(le, sAsInt, 127) => node 39 CmpI(lt, sAsInt, -128) => node 25
fold-compares transforms 2 integer comparisons into one unsigned comparison.
sAsInt >= -128 && sAsInt <= 127
=>
0 <= sAsInt + 128 < 256
=>
CmpU(lt, sAsInt + 128, 257)
That’s node 59 is the new unsigned comparison.
------------------------ OptoAssembly for Compile_id = 26 -----------------------
#
# java/lang/Short:exact * ( int )
#
000 N1: # out( B1 ) <- in( B3 B2 ) Freq: 1
000 B1: # out( B3 B2 ) <- BLOCK HEAD IS JUNK Freq: 1
000 # stack bang (144 bytes)
pushq rbp # Save rbp
subq rsp, #32 # Create frame
00c movl R11, RSI # spill
00f addl R11, #128 # int
016 cmpl R11, #256 # unsigned
01d jnb,us B3 P=0.000000 C=5375.000000
01f B2: # out( N1 ) <- in( B1 ) Freq: 1
01f movslq R10, RSI # i2l
022 movq R11, narrowoop: java/lang/Short:NotNull:exact *[int:256]<ciObjArray length=256 type=<ciObjArrayKlass name=[Ljava/lang/Short; ident=1271 address=0x00007fc0f83072c0> ident=1274 address=0x00007fc0f8307d70> * # ptr
02c movl R10, [R11 + #528 + R10 << #2] # compressed ptr
034 decode_heap_oop_not_null RAX,R10
038 addq rsp, 32 # Destroy frame
popq rbp
cmpq rsp, poll_offset[r15_thread]
ja #safepoint_stub # Safepoint: poll for GC
04a ret
04b B3: # out( N1 ) <- in( B1 ) Freq: 4.76837e-07
04b movl [rsp + #0], RSI # spill
04e movl [rsp + #4], RSI # spill
052 movl RSI, #-195 # int
057 call,static wrapper for: uncommon_trap(reason='unstable_fused_if' action='reinterpret' debug_id='0')
# java.lang.Short::valueOf @ bci:9 (line 277) L[0]=rsp + #0 L[1]=_ L[2]=rsp + #4 STK[0]=rsp + #0 STK[1]=#-128
# OopMap {off=92/0x5c}
05c stop # ShouldNotReachHere
We have to give up our optimization when we detect fold-compares. It could happens on Short.valueOf and Enums. Because Integer.valueOf() doesn't use literals for the range check, it is not applicable for fold-compares.