JohnTortugo/EscapeAnalysis.md

## EscapeAnalysis.md

      
    Raw
  

              EscapeAnalysis.md
            
          
    HotSpot Escape Analysis and Scalar Replacement Status

Introduction

Escape Analysis (EA) is a compiler analysis to answer the following questions for a given object O, method M, and thread T.

Escapes(O, M): Does object O outlive the execution of method M? This may happen for instance if a reference to O is assigned to a global variable.
Escapes(O, T): Does object O escape the current execution thread? This may happen for instance if a reference to O is copied to another thread.

Answers to these questions enable a compiler to perform a few highly effective optimizations, for instance, Stack Allocation (SA), Lock Elision (LE), and Scalar Replacement of Aggregates (SRA). Note that,
HotSpot does not perform SA.
Scalar Replacement of Aggregates, or simply Scalar Replacement, is an optimization that can decompose an object into its individual components, most importantly the instance fields of the object. In practice, SRA replaces the access to object fields with local variables, effectively removing the need for allocating the object.
This report focuses on identifying improvement opportunities to the Escape Analysis and Scalar Replacement implementation in OpenJDK HotSpot.
Escape Analysis and Scalar Replacement in HotSpot

HotSpot Escape Analysis is based on the flow-insensitive algorithm described in "Escape Analysis for Java, Choi et. al". The next sections of this document present some scenarios that the current EA and/or SRA implementation in HotSpot does not handle ideally.
Flow-Insensitive Analysis

The algorithm currently implemented in HotSpot is flow-insensitive, which means that if an object escapes in at least one of the method’s control flow paths the algorithm conservatively assumes the object escapes the method. Below is an example where object "o" only escapes when the "if" condition is true, but the algorithm deems the object as escaping.
public int conditionalEscape(boolean cond1) {   
  MyPair o = new MyPair(/*p1=*/1, /*p2=*/2); 
   
  if (cond1) {     
    EscapeAnalysis.global_escape = o; // "o" escapes to  
                                      // global static  
                                      // variable   
  } 
   
  return o.getP1(); 
} 

Code Example 1: HotSpot conservatively assumes the object always escapes the method.
One way that HotSpot could improve the code would be as shown below, where the object is only allocated when the “if” condition is satisfied. HotSpot is currently not capable of doing this, but the Graal compiler is able to perform this optimization.
public int conditionalEscape(boolean cond1) {   
  if (cond1) {     
    MyPair o = new MyPair(/*p1=*/1, /*p2=*/2);  
    EscapeAnalysis.global_escape = o; // "o" escapes to  
                                      // global static  
                                      // variable   
  } 
 
  return 1; 
}

Code Example 2: Optimized version of Example 1 when the compiler can perform Partial Escape Analysis.
Non-Inlined Methods

EA/SRA is heavily dependent on methods pertaining to candidate objects being inlined, however, the Inlining and EA/SRA compiling phases in HotSpot do not communicate to help each other. In the following example, the object "o" might escape or not depending on the inlining decision made for otherMethod.
public int conditionalEscape(boolean cond1) {   
  otherMethod(cond1);   
  MyPair o = new MyPair(/*p1=*/1, /*p2=*/2);   
  return o.getP1(); 
}

Code Example 3: Scalar Replacement is prevented if inlining doesn’t happen.
If the Inlining phase decides to inline otherMethod and that method is reasonably big, then it is possible that getP1 will not be inlined – to prevent the body of conditionalEscape becoming too large. However, that decision will prevent the object from being scalar replaced, as the object will be considered to escape the method and because SRA depends on the inlining of getP1.
Control Flow Merges

The example below shows another limitation of the current EA implementation. Objects will not be scalar replaced if there is a merge point where it is undefined which object is referenced. In the example, there are references to object fields after the "if" where the used object could be either the one created before the "if" (when “cond” is false) or the one created inside the "if". Note that the Graal Compiler can perform scalar replacement in this case.
public int escapeAnalysisFails(bool cond, int x, int y) {   
  MyPair o = new MyPair(/*p1=*/0,/*p2=*/0);   
   
  if (cond) {     
    o = new MyPair(x, y);   
  } 
 
  return o.getP1(); 
}

Code Example 4: The object is not scalar replaced if it’s part of a control/data flow merge.
Re-use Stack Slots

Very often loops are used to perform cumulative computations for which intermediate results are stored in temporary objects. In such situations, we say that a result (i.e., an object) of an iteration carries over to the next iteration. When such a situation happens it seems that the compiler does not perform scalar-replacement on the object that is carried over across iterations. Consider the example below:
public int loopCarried(int x, int y) {   
  MyPair r = new MyPair(/*p1=*/x, /*p2=*/y);     
   
  for (int i=0; i<100_000_000; i++) {     
    MyPair a = new MyPair(i, i+1);         
    MyPair b = new MyPair(i, i-1);         
    MyPair t = r.sum(a).sum(b);         
    r = t;   
  } 
     
  return r.getP1() + r.getP2(); 
} 

Code Example 5: Loop-carried temporary objects aren’t eliminated.
The copy of object "t" to "r" can be removed if only one object is used. Moreover, after Inlining and Scalar Replacement all the objects could be scalar replaced.
Interprocedural Analysis

Current interprocedural Escape Analysis in C2 is based on the paper “Escape Analysis in the Context of Dynamic Compilation and Deoptimization” by Kotzmann and Mossenbock. The analysis is done at the bytecode level (i.e., not on the Ideal IR Graph) whenever a static method is being called. Virtual methods are not analyzed, references consumed/returned by such methods are conservatively assumed to escape the caller method.
The implementation has a threshold on the maximum size of the method being analyzed. Methods larger than 150 bytes are not analyzed and objects consumed/returned by such methods are considered to global escape. In the example below the object “o” will be considered to escape the method because the interprocedural analysis will bail out while analyzing the body of “bigMethod.”
public static int interprocedural(int x, int y) { 
  MyPair o = new MyPair(x, y); 
  o = bigMethod(o); 
  return o.p1; 
} 
 
public static MyPair bigMethod(MyPair p) { 
  switch (p.p1) { 
    case 0: 
        System.out.println(“Case is 0”); break; 
    case 1: 
        System.out.println(“Case is 1”); break; 
   /* ….. */ 
    case 10: 
        System.out.println(“Case is 10”); break; 
    default: 
        System.out.println(“Case is N”); 
  } 
}

Code Example 6: The interprocedural analysis is based on static analysis of Bytecodes.
Top-Down Quantitative Analysis

This section presents our initial investigation on the effectiveness of the current EA and SRA implementation in HotSpot. The experiment was performed using the latest version of the Renaissance and DaCapo Benchmark suites and using the “tip” version of JDK. Please note that the experiments only consider the methods compiled by C2 and do not distinguish between recompilations of the same method or OSR compilations.
Figure 1 shows the percentage of all methods compiled by C2 that are candidates for Scalar Replacement and Lock Elision. Only methods that allocate objects, make use of locks, or make calls to Autoboxing methods are considered as candidates. As a point in case, during execution of the “movie-lens” program, 41% of the methods are going to be analyzed by Escape Analysis and potentially optimized using LE and/or SRA. On the other hand, only 19% of the C2 compiled methods in Avrora are considered candidates. Overall, about 35% of the methods compiled by C2 are considered candidates.

Figure 1: Percentage of Candidate Methods per Benchmark.
Figure 2 shows a breakdown of the candidates shown in Figure 1 according to the outcome of the optimization process. The classification is as follows.

Q1: Percentage of the candidate methods where at least one object was scalar replaced or lock elided.
Q2: Percentage of candidates discarded because of constraints imposed by C2, e.g., the data/control flow merge point described in the previous section.
Q3: Percentage of candidate methods for which Escape Analysis deemed that all objects escape the method.

Analyzing Figure 2 we see that 22% of the candidate methods in the movie-lens benchmark had at least one object scalar replaced or elided. In that same benchmark, 7% of the candidates were ignored due to C2 constraints and 71% of the candidates were discarded because Escape Analysis identified that all objects escape the method. Overall, around 80% of the candidates are discarded because EA concludes that all objects escape and an additional 7% are discarded due to C2 restrictions. Therefore, only about 13% of the candidates have some method scalar replaced.

Figure 2: Which Stage of the Compilation Exclude the Candidates for EA & LE.
Figure 3 shows the percentage of candidate methods eliminated by Escape Analysis separated by the step of EA that ruled out that candidate. The percentages are relative to the number of candidate methods eliminated by EA – the ones ruled out by HotSpot constraints are not included in the calculation. The EA steps considered are the following:

Step 1: Initial analysis graph construction.
Step 2: The “escape” status of objects in the analysis graph is propagated to fields and referenced objects.
Step 3: Objects in the EA Graph are visited once more, and their "escape" statuses are refined.

The results show that sometimes a sizable fraction of the candidates is ruled out just after the analysis graph is constructed. For instance, 25% of the candidate methods in the Reactors benchmark and 24% in Xalan were ruled out during the initial graph construction. However, overall, most of the candidates are excluded by the refinement steps – around 80% of the candidates.

Figure 3: Which Stage of Escape Analysis Excluded the Candidates.
Conclusion

Please note: the conclusions presented below are preliminary and based solely on the experiments presented in this document, further experiments are necessary to investigate and/or validate some of these points in real-world applications.
There are certainly some opportunities for improvement in HotSpot’s EA and SRA implementation. The quantitative analysis presented shows that, for the two benchmark suites analyzed, most of the methods did not have any objects scalar replaced. Further investigation is necessary to identify the specific reasons. The experiments show that around two thirds of the methods compiled by C2 are not even considered for SRA and LE. That might be a result of coding practices in Java, or a characteristic of the benchmarks studied – further investigation is needed.
Around 80% of the SRA/LE candidate methods are ruled out because EA considers all objects to escape the method. We suspect that one of the reasons for this high number is the flow-insensitive analysis which consider an object to escape the method if it escapes in any control-flow path. The Graal Compiler implements a flow-sensitive Escape Analysis (also called Partial Escape Analysis) and there are reports showing that it produces double-digit performance improvements on benchmarks - especially for Scala-based programs. Implementing Partial Escape Analysis in HotSpot might result in the same kind of improvement, however that is probably one of the biggest changes among all improvement possibilities presented.
Improving the Inlining algorithm is another opportunity to explore for performance improvements, especially if done in a way that it can integrate better with other analyses and optimization phases.
The restrictions imposed by C2 as conditions for scalar replacing objects ruled out only about 7% of the candidate methods - the control flow merge restriction is the one that excludes most methods. Therefore, these restrictions are not the major limiting factor preventing objects from being SRA/LE. Nevertheless, they probably will require changes when/if Partial Escape Analysis is implemented.
It is worth mentioning that most, if not all, of the issues mentioned in the "Escape Analysis and Scalar Replacement in HotSpot" section have been known to the OpenJDK community for a few years.
Future Work


Manually modify some benchmarks to mimic what we would like C2 to be able to do automatically and measure the performance and effort ratio.
Investigate why just one third of the methods are considered as candidates for EA/SRA.
Quantify the percentage of candidates being ruled out by specific parts of Escape Analysis.
Separate the Scala-based benchmarks from the Java-based benchmarks and run the “Top-down Quantitative Analysis” again.
Run some diagnostics on customer workloads to understand the main reasons for failed EA/SRA

Questions for the Community


Have we overlooked any other opportunities to improve Escape Analysis and/or Scalar Replacement?


Of the opportunities identified, what are the estimated relative costs? Estimated relative benefits? Estimated ranking of cost-benefit ratios?


Which ones have been tried before? What was learned?


Which ones have been avoided and why?


Do the numbers presented look right to you and/or do they match your expectation/experience?


In your opinion/experience what’s the reason for just one-third of the methods being considered for SRA?


Do you think the flow-sensitive analysis is the reason that most candidate methods are being discarded? If not, what do you think is the cause?


Do you think the work being done in the Valhalla project is a deterrent to improving EA in HotSpot?


References


Partial Escape Analysis and Scalar Replacement for Java. Lukas Stadler.
Escape Analysis for Java. Jong-Deok Choi.
Seeing Escape Analysis Working. Chris Seaton.
JVM Anatomy Quark #18: Scalar Replacement. Aleksey Shipilev.
Current State of JVM Escape Analysis and downstream optimizations. Charlie Gracie.
RFC for C2 Partial Escape Analysis. Jason Tatton from Amazon.
Escape Analysis and Stack Allocation. Vladimir Kozlov.
Reducing OpenJDK Java Garbage Collection Times with Stack Allocation. Nikola Grcevski.
Java Bug System Escape Analysis Work Items.
Scalar replacement and escape analysis fails for simple test case
Escape Analysis does not re-use stack slots when chaining allocations
Add experimental feature to eliminate non-escaping boxing objects
Escape Analysis in the Context of Dynamic Compilation and Deoptimization. Kotzmann et. al.