Skip to content

Instantly share code, notes, and snippets.

@navyxliu
Last active October 19, 2023 19:32
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save navyxliu/8362c9ae25ac02bd9e58fbf1dc05fc05 to your computer and use it in GitHub Desktop.
Save navyxliu/8362c9ae25ac02bd9e58fbf1dc05fc05 to your computer and use it in GitHub Desktop.
Aggressive liveness for unstable_if traps

Aggressive liveness for unstable_if traps

Background:

C2 is a speculative compiler. When it encounters a conditional branch and profiling data suggest that the control is highly likely to take one path, parser intends to prune the other path and leave a runtime stub uncommon_trap. We refer to it as “unstable_if trap” because the reason of this stub is unstable_if. If the rare case does happen, the trap triggers deoptimization. HotSpot realizes that current profiling data do not hold up and destroys this compilation. The execution returns to interpreter. Presumably, the upcoming compilation of this method will correct itself with updated profiling data.

Bci(byte code index) in JVM is analogous to the PC(program counter) of a physical processor. It points to the current bytecode in interpreter. Because an uncommon_trap stub punts to interpreter, it must restore the interpreter state in JVM. Restoring the interpreter state means that all variables live at the current bci must have correct values. Besides live variables of primitive types (int, char etc), deoptimization has to rematerialize all reference type variables which can refer to non-escaping objects if C2 has done scalar replacement for them. Each uncommon_trap is a SafePointNode in C2 IR. All live variables are the inputs of it including those scalarized objects. C2 codegen synthesizes them and write to debuginfo. DebugInfo are stored in a separate section along with text section in codecache.

Problem

Currently, unstable_if trap is conservative about bci. It points to the conditional branch. When interpreter resumes execution, it will reevaluate the conditional branch and take a path accordingly. Here is an example. driver has invoked testMethod(true) for thousands of times, so the profiling data suggest cond (highlighted) at line-5 is “always-true”.

  1 public int testMethod(boolean cond) {
  2   int sum = 0;
  3   for (int i = 0; i < data.length; i++) {
  4       Integer ii = Integer.valueOf(data[i]);
  5       if (cond) {
  6           sum += ii.intValue();
  7       }
  8   }
  9   return sum;
 10 }

 List-1: Source code of testMethod

Readers can use javap to inspect the method in bytecode.

public int testMethod(boolean);
Code:
0: iconst_0
1: istore_2
2: iconst_0
3: istore_3
4: iload_3
5: getstatic #7 // Field data:[I
8: arraylength
9: if_icmpge 40
12: getstatic #7 // Field data:[I
15: iload_3
16: iaload
17: invokestatic #13 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
20: astore 4
22: iload_1
23: ifeq 34
26: iload_2
27: aload 4
29: invokevirtual #19 // Method java/lang/Integer.intValue:()I
32: iadd
33: istore_2
34: iinc 3, 1
37: goto 4
40: iload_2
41: ireturn

List-2: bytecodes of testMethod

In List-1, “if (cond)” at line 5 is a conditional branch, which corresponds to bytecode: 23: ifeq 34 in List-2. If profiling data suggest that the branch 23: ifeq is always non-taken, the other path is pruned and a unstable_if trap is placed instead. This code snippet is like line 5-9 in List-3. I.e C2 parser assumes next bci of 23: ifeq is 26.

Current C2 associates the unstable_if trap at line 8 with bci: 23. We think it is too conservative. It is safe to use bci: 34 instead because the conditional branch at bci: 23 is supposed to take after deoptimization, or the deoptimization should not happen in the first place.

Variable ii is local#4 defined at bci: 20 in List-2. It’s noteworthy it is live at bci: 23 (highlighted) but dead at bci: 34.

  1 public int testMethod() {
  2   int sum = 0;
  3   for (int i = 0; i < data.length; i++) {
  4       Integer ii = Integer.valueOf(data[i]);
  5       if (cond) {
  6           sum += data[i]; // from sum += ii.intValue();
  7       } else {
  8           uncommon_trap(Reason_unstable_if, Action_reinterpreter, ...);
  9       }
 10   }
 11   return sum;
 12 }

 List-3:  psuedeo code of testMethod in optimizer

The call of Integer.valueOf() at line 4 should get removed because no one uses local ii anymore after AggressiveUnboxing (Line 6). Unfortunately, it is not the case in current C2. Because parser uses bci: 23 to collect liveness for the uncommon_trap node. It references ii in its debuginfo section. It is the very reference prevents late-inliner from eliminating the function call of Integer.valueOf().

Solution

The key observation is that JVM will take the other path when deoptimization does happen. We could use the liveness of destination instead of the conditional branch. If the liveness set of new location is smaller than before, we allow more optimization in C2 and save space in debuginfo.

Parser records the speculative bci for all unstable_if traps. The speculative bci points the path which is unlikely to be taken. We use the liveness of the speculative bci to kill dead locals for an unstable_if trap. Eg. the original bci is 23 in List-2. The speculative bci is 34, where ii has been dead. C2 removes local ii from uncommon_trap call’s inputs. This unlocks late-inliner to delete the function call of Integer.valueOf().

Because we keep the original bci for deoptimization, interpreter needs to evaluate the conditional branch again, eg. 23: ifeq 34 in List-2. It is possible that the orignal bytecode (ifnull/ifnonnull, if_acmpne/eq) reference objects on operand stack, thus we conservatively prevent operands of the bytecode from being killed.

Evaluation

We ran the microbenchmark of JDK-8261137 and got 9x speedup on x86_64.

$java -jar ./StringFunc-jmh.jar -wi 5 -f 1 -i 10  com.amazon.jdkteam.brownbag.MyBenchmark.testMethod
Benchmark               Mode  Cnt  Score   Error  Units
MyBenchmark.testMethod  avgt   10  3.679 ± 0.014  us/op
$java -jar ./StringFunc-jmh.jar -wi 5 -f 1 -i 10  -jvmArgs " -XX:+UnlockDiagnosticVMOptions -XX:-AggressiveLivenessForUnstableIf" com.amazon.jdkteam.brownbag.MyBenchmark.testMethod
Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt   10  34.464 ± 0.855  us/op

Follow-up

An unstable if trap is trivial if it can not prune any code, or the pruned basic block is small. eg. the unstable if trap generated at line 8 of List-3 is trivial because it masks out nothing. The else-section of original program is empty. The trivial unstable_if traps not only complicate codeshape but also consume codecache.

I would like to record the speculative basic block as well. C2 can determine whether an unstable if is trivial by its definition after parsing. The challenging part is how to merge the block of conditional branch to the speculative block. It is an open question.


Appendix:

References

  1. JBS issue of this work: https://bugs.openjdk.java.net/browse/JDK-8286104
  2. JBS issue of the follow-up: https://bugs.openjdk.java.net/browse/JDK-8287385

Why do not shift bci for the unstable_if trap?

We can’t update MDO correctly if we do so. Secondly, unstable_if trap may be shared by other code. see the comment from Vladimir Kozlov. openjdk/jdk#8545 (comment)

How can I guarantee that I don’t mess with breakpoint or jvmti code?

Breakpoints can only be placed via JVMTI. When it is placed, all dependent methods are supposed to be destroyed.

void BreakpointInfo::set(Method* method) {
#ifdef ASSERT
  {
    Bytecodes::Code code = (Bytecodes::Code) *method->bcp_from(_bci);
    if (code == Bytecodes::_breakpoint)
      code = method->orig_bytecode_at(_bci);
    assert(orig_bytecode() == code, "original bytecode must be the same");
  }
#endif
  Thread *thread = Thread::current();
  *method->bcp_from(_bci) = Bytecodes::_breakpoint;
  method->incr_number_of_breakpoints(thread);
  {
    // Deoptimize all dependents on this method
    HandleMark hm(thread);
    methodHandle mh(thread, method);
    CodeCache::flush_dependents_on_method(mh);
  }
}

When JDWP is on, JVMTI feature should_retain_local_variables is on. This refrain from liveness compuating from CFG. It simply marks all local variables live.

MethodLivenessResult ciMethod::liveness_at_bci(int bci) {
  if (CURRENT_ENV->should_retain_local_variables() || DeoptimizeALot) {
    // Keep all locals live for the user's edification and amusement.
    MethodLivenessResult result(_max_locals);
    result.set_range(0, _max_locals);
    result.set_is_valid();
    return result;
  }
  return raw_liveness_at_bci(bci);
}

‘fold-compares’ transformation

This transformation fuses 2 signed integer comparisons into one unsigned one. If both 2 IfNode has uncommon_traps, it will use the dominating one to cover 2 IfNodes. It conflicts with my assumption presented in this article, here is why.

eg. see the 2 comparisons at line 4.

  1 public static Short valueOf(short s) {
  2      final int offset = 128;
  3      int sAsInt = s;
  4      if (sAsInt >= -128 && sAsInt <= 127) { // must cache
  5          return ShortCache.cache[sAsInt + offset];
  6      }
  7      return new Short(s);
  8  }

CmpI(le, sAsInt, 127) && !CmpI(lt, sAsInt, -128) => CmpI(le, sAsInt, 127) => node 39 CmpI(lt, sAsInt, -128) => node 25

Screen Shot 2022-05-08 at 7 29 56 PM

fold-compares transforms 2 integer comparisons into one unsigned comparison.

sAsInt >= -128 && sAsInt <= 127
=>
0 <= sAsInt + 128 < 256
=>
CmpU(lt, sAsInt + 128, 257)

That’s node 59 is the new unsigned comparison. Screen Shot 2022-05-09 at 9 51 59 AM


------------------------ OptoAssembly for Compile_id = 26 -----------------------
#
#  java/lang/Short:exact * ( int )
#
000     N1: #   out( B1 ) <- in( B3 B2 )  Freq: 1

000     B1: #   out( B3 B2 ) <- BLOCK HEAD IS JUNK  Freq: 1
000     # stack bang (144 bytes)
        pushq   rbp     # Save rbp
        subq    rsp, #32        # Create frame

00c     movl    R11, RSI        # spill
00f     addl    R11, #128       # int
016     cmpl    R11, #256       # unsigned
01d     jnb,us  B3  P=0.000000 C=5375.000000

01f     B2: #   out( N1 ) <- in( B1 )  Freq: 1
01f     movslq  R10, RSI        # i2l
022     movq    R11, narrowoop: java/lang/Short:NotNull:exact *[int:256]<ciObjArray length=256 type=<ciObjArrayKlass name=[Ljava/lang/Short; ident=1271 address=0x00007fc0f83072c0> ident=1274 address=0x00007fc0f8307d70> *    # ptr
02c     movl    R10, [R11 + #528 + R10 << #2]   # compressed ptr
034     decode_heap_oop_not_null RAX,R10
038     addq    rsp, 32 # Destroy frame
        popq    rbp
        cmpq     rsp, poll_offset[r15_thread]
        ja       #safepoint_stub        # Safepoint: poll for GC

04a     ret

04b     B3: #   out( N1 ) <- in( B1 )  Freq: 4.76837e-07
04b     movl    [rsp + #0], RSI # spill
04e     movl    [rsp + #4], RSI # spill
052     movl    RSI, #-195      # int
057     call,static  wrapper for: uncommon_trap(reason='unstable_fused_if' action='reinterpret' debug_id='0')
        # java.lang.Short::valueOf @ bci:9 (line 277) L[0]=rsp + #0 L[1]=_ L[2]=rsp + #4 STK[0]=rsp + #0 STK[1]=#-128
        # OopMap {off=92/0x5c}
05c     stop    # ShouldNotReachHere

We have to give up our optimization when we detect fold-compares. It could happens on Short.valueOf and Enums. Because Integer.valueOf() doesn't use literals for the range check, it is not applicable for fold-compares.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment