Skip to content

Instantly share code, notes, and snippets.

@navyxliu
Last active October 6, 2023 19:21
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save navyxliu/d2602d4a4df98f2fb7982cbd3e5ab6eb to your computer and use it in GitHub Desktop.
Save navyxliu/d2602d4a4df98f2fb7982cbd3e5ab6eb to your computer and use it in GitHub Desktop.
Passive materialization in C2 Partial Escape Analysis

Passive materialization in C2 Partial Escape Analysis

Problem Statement

Besides escaping points, C2 Partial Escape Analysis(PEA) can optionally materialize the object at the convergence block when any of its predecessors has already materialized it. It is referred to as passive materialization. It is optional because C2 currently skips it and programs are still correct. Here is an example. Object pt is escaped at line 6.

  1  public double test(boolean cond, double x, double y) {
  2      Point pt = new Point(x, y);
  3      pt.x = 3.0d;
  4
  5      if (cond) {
  6        _cache = pt; // escaping point. pt1 = Mat(pt)
  7      }
  8      // pt2 = phi(pt, pt1)
  9      return pt.length();
 10  }

 List-1: Example-1

List-2 is the result after PEA with passive materialization. When C2 Parse merges predecessors at line 11, it induces passive materialization at line 9. This causes the original pt dead after convergence. Without passive materialization, we reply on C2 optimizer to eliminate the object pt, or we will have partial redundant allocation. In List-1, the generated code will allocate two objects if cond is true and optimizer does not eliminate pt.

  1  public double test(boolean cond, double x, double y) {
  2      Point pt = new Point(x, y);
  3      pt.x = 3.0d;
  4
  5      if (cond) {
  6        pt1 = Mat(pt);
  7        _cache = pt1; // escaping point.
  8      } else {
  9        pt2 = Passive_Mat(pt);
 10      }
 11      pt3 = Phi(pt1, pt2);
 12      return pt3.length();
 13  }

 List-2: Example-1 with Passive Materialization

After JDK-8287061, C2 optimizer can split reducible PHIs, so pt in List-1 is indeed eliminated. Unfortunately, it is not always the case. We found that optimizer can’t eliminate the original object in the following 2 scenarios. They contribute the performance regression we’ve detected so far.

  1. the original object is escaped after merging.
  2. cyclic object graph.
{ // case-1
    static Object _global;
    static Object _global2;

    public static void test(boolean cond) {
        Object foo = new Object();
        if (cond) {
            _global = foo;
        }
        _global2 = foo;
    }
}

{ // case-2
    static class Node {
        Node next;
    }

    public static void foo() {
        Node a = new Node();
        a.next = a;  // form a single node cycle. marked NSR by EA.
        // Either a is escaped or not, it stays.
    }
}

List-3: Two cases that C2 Optimizer can't eliminate the original object.

By implementing passive materialization, we can safely mark the original obsolete. In Case-1, foo will become dead and get eliminated by C2 optimizer. In Case-2, we will enhance Scalar replacement to purge objects if they are still virtual after PEA. We can prove that materialization and passive materialization kill the original object at the merging point using SSA property.

The trivial implementation of passive materialization is by its definition. Every time an incoming edge induces passive materialization, PEA materializes the object for the corresponding blocks. We implement PEA in C2 Parse, so the full flow graph has not been available at merge points. The merging process is incremental. Trivial implementation is facing the following difficulties in C2 Parse.

  1. Tracking compile-time states
  2. Code bloat issue

Let me elaborate the problems with an example of exception:

  1   void foo() throws Exception {
  2     if (flag) { // a global
  3         throw new Exception("ghost");
  4     }
  5   }
  6
  7   void blackhole(Object obj) throws Exception {}
  8
  9   void test() {
 10     Object obj = new Object();
 11     try {
 12         foo();
 13         foo();
 14         blackhole(obj);  // materialize here
 15         foo();
 16     } catch (Exception e) {
 17         throw new RuntimeException(obj.toString());
 18     }
 19   }

List-4: Example with exceptions

foo() throws two exceptions. The first one is a Throwable at bci 7 implied by bytecode new*. The second one is an explicit Exception object thrown at bci 16 in List-5.

  void foo() throws java.lang.Exception;
    Code:
       0: aload_0
       1: getfield      #7                  // Field flag:Z
       4: ifeq          17
       7: new           #13                 // class java/lang/Exception
      10: dup
      11: ldc           #15                 // String ghost
      13: invokespecial #17                 // Method java/lang/Exception."<init>":(Ljava/lang/String;)V
      16: athrow
      17: return

 List-5: the bytecode of method foo().

* JVM Spec doesn’t specify that. JLS specifies OOME but current C2 throws a Throwable here.

Here is the trace of C2 Parse. B4 is the exception handler at line 16. B4 is merged 7 times from B1. Figure-1 depicts how parse merges B1 to B4. The red line spans from the invocation of blackhole() at bci 18 where the object is materialized. It is an escape point because blackhole() is not inlined and obj is argument-escaped. We highlight path 6 which induces passive materialization in red.

passive_mat_ex

Figure-1: the diagram of merging point

Figure-2 is a part of ideal graph when parse merges B4 at path 6. 158 Region is the basic block of exception handler. 154 SafePointNode is the map associated with 158 Region. 365 PHI is merging node of Object obj at line 10. #5 input is about to change to the materialized object. As a result, we have to materialize input #2~#5 as well.

Before_Materialization

Figure-2: the ideal graph before passive materialization.

Problem 1: tracking compile-time states

PEA materialization needs to tracks 3 compile-time states. They are a tuple (Control, Exception, Memory). When PEA materializes an object, new nodes consume previous state tuple and yields a new state tuple. Passive materialization is supposed to happen in its predecessors instead of the target block. Exception path is even trickier because it is span from a certain bytecode instead of the end of a block. Currently, C2 Parse doesn’t preserve the state tuples of predecessors.

Like Figure-1, it’s possible that B4 has merged multiple predecessors before it encounters a red edge. We need to track the compile-states for path 2~5. Passive materialization nodes have external effects and must be merged to Region’s 154 SafePointNode. We need to merge the new exception into 191 PhiNode and the new memory into 136 MergeMemNode.

Problem 2: duplication code in materialization

When passive materialization takes places in path #6, PEA has to materialize all prior predecessors. In Figure-1, we have to materialize the object for all predecessors from path #2 to path #5. The trivial approach is to access those predecessors and materialize them one by one. We end up with 4 copies of materialization code. To avoid duplication, we need to combine those predecessors with virtual states and only emit one copy of passive materialization for them.

In Figure-1, dotted lines indicate that they have not been seen when the passive materialization takes place. Because we merge allocation state incrementally, we need to consider how to merge additional paths such as path #7 and #8. PEA has to materialize the upcoming virtual objects separately. This is another source of duplication.

Each allocation is expanded multiple nodes to include fast-slow paths in presence of TLAB and materialization needs to initialize the fields of an object, so each materialization requires a number of nodes. Since materialization leads to code bloat issue, we need to avoid from emitting identical code in passive materialization.

Solution

The key observation is that PEA needs to preserve compile-time states to address both prior problems. In problem 1, we need to control, exception and memory states for materialization. In problem 2, it is possible to avoid duplicating by combining the state tuples of multiple predecessors and have only one materialization for them.

In C2 Parse, a SafepointNode is attached to each RegionNode. It is special because it has no use. It refers to as ‘map’. A RegionNode denotes a basic block. The special SafepointNode roles as a local symbol table in current scope. The first 5 inputs of SafepointNode is (Control, Exception, Memory, FramePtr, ReturnAddr). Those SafePointNodes are rolling. Once Parse has completed a block, it transfers the map to successor block. That is to say, SafePointNodes are not preserved. We can’t revisit them via CFG.

Shim Region

The basic idea is to create one artificial block for each object of passive materialization. We attach a stationary SafePointNode to preserve compile-time state and Virtual State for it. It is merge-able for upcoming predecessors. The block is referred to as ‘Shim’. A shim is a Region node in C2 IR. The inputs of it are the predecessors whose object obj are virtual. In other words, they are subjects of passive materialization. shim_structure

A stationary SafePointNode sticks with a shim and it preserves the compile-time state. The SafePointNode also associates with an AllocateState. There is only one object inside of the AllocateState and it is always virtual. Field initialization is dashed because we can’t emit field initialization nodes until we have parsed all predecessors. PEA maintains a list of shim regions for post-process after C2 Parse.

Our approach has 3 steps. We construct a shim region for an object in the first time we need to do passive materialization for it. We will keep merging incoming predecessors if the object is virtual. When C2 Parse has completed, we revisit the shim regions and emit field initialization.

1. Construction

Passive materialization only happens at merging points. C2 Parse merges an incoming block to its target block. If passive materialization never happens before, there are two cases.

  1. object is Escaped in the incoming block(depicted in red line). It is Virtual in all prior predecessors (depicted in black line).
  2. object is Virtual in the incoming block. however, it is Escaped in all prior predecessors.

Either way, we create a shim region to cover blocks where the object were virtual.

We passively materialize the object in the shim region and replace the corresponding inputs of phi node. The result of passive materialization is denoted using P-Mat(obj). It emits AllocateNode as same as materialization but puts its field initialization on hold. It is the key to keep a shim mergeable. Because a stationary SafePointNode attaches to a shim region, we keep track of the virtual state of the object.

After merging, the object is Escaped in the Region Node.

shim_construction

2. Merge to an existing shim

When passive materialization happens, PEA needs to check whether a shim region exists. If so, we know that this is not the first time. It is only possible that the object is virtual in the incoming predecessor*. Our goal is to merge incoming predecessor to shim region. Either the incoming block appears in the left or the right, we combine it with the shim. We also merge AllocateState as well and it will update the fields of object.

shim_maintain

3. Initialize passively materialized objects

After C2 Parse, those passively materialized objects are allocated but have not been initialized yet. Since we have traverse all live blocks, the Virtual State of an object has been finalized. Those objects form a DAG so we can traverse them in topological order. Every time we create a new shim, we put it into worklist. Each shim region still maintains <control, exception, memory> so we generate stores nodes for an object and merge compile-time state to the target region after then. We can use Depth-First-Seach(DFS) to process the worklist.

Wrap-up

This whitepaper proposes a new way to support passive materialization in C2 Parse. It is incremental so it can be done in one pass. It avoids code duplication by combining all possible predecessors. With passive materialization, we expect to eliminate the partial redundant allocation issue we found. It also opens up the new opportunities such as cyclic object graph elimination.

* Proof of contradiction. If shim exists, object must has been Escaped in the target block. If the object was Escaped in the predecessor, the passive materialization would not occur. Q.E.D.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment