albertnetymk/cross-region-invariants.md

## cross-region-invariants.md

      
    Raw
  

              cross-region-invariants.md
            
          
    Region types in G1

Let's introduce the region types used in G1:
eden: for newly added objects
from-survivor: for objects in to-survivor from previous young-GC
to-survivor: live objects in eden/from-survivor are copied here if they have not lived long
old: live objects in eden/from-survivor are copied here if they have lived long
After a young-GC, eden/from-survivor regions are reclaimed (if no evac-failure), and to-survivor regions become from-survivor regions. eden/from-survivor/to-survivor regions constitute young-gen and all old regions form the old gen. (There are also Humongous and Archive regions, but we can ignore them for this discussion.)
Cross-region invariants

There are two kinds of cross-region pointers that need to be tracked:

from old-gen to young-gen: these pointers are part of roots during young-GC
from old-gen to old-gen when the destination region requires remset update: during mixed-GC (some Old regions are collected in addition to regions in young-gen), those to-be-collected Old regions must know all incoming from other regions to the current region) pointers, and their remset must be kept updated.

Maintain cross-region invariants during young-GC

At the beginning of a young-GC, the whole young-gen, consisting of only eden and from-survivor regions, is selected as the collection set (cset). During the GC, we try to evacuate all live objects in cset to to-survivor or old regions, depending their age (how long they have lived).
It's a classic copying (aka scavenge) GC algorithm: breadth-first traversal with a stack to hold yet-to-scan objects. Lying in the core of algorithm is the closure (do_oop_evac), applied to all oop locations pointing into the cset. The part unique to G1 is the react_to_new_pointer call, which, as its name implies, reacts to the newly established pointer and maintains the aforementioned cross-region invariants. There are two situations where react_to_new_pointer is called:

finding pointers pointing into cset: after the destination object has been evacuated from cset to another region at the end of do_oop_evac.
finding pointers pointing outside of cset: the destination object will not be moved (because it's outside of cset), so we can call react_to_new_pointer when we identify those pointers, during iterating an object (else branch of iterate_fields).

do_oop_evac(oop* p) {
  obj = load(p);
  
  if (obj.is_forwarded()) {
    new_obj = obj.forwarded_to();
  } else {
    new_obj = alloc(obj.size);
    if (new_obj == null) {
      // evac-fail
      iterate_fields(obj);
    } else {
      copy(obj, new_obj);
      obj.set_forwarded(new_obj);
      iterate_fields(new_obj);
    }
  }
  store(p, new_obj);
  
  react_to_new_pointer(p, new_obj);
}

iterate_fields(obj) {
  for (f in obj.fields) {
    if (in_cset(load(f))) {
      push(f);
    } else {
      react_to_new_pointer(f, load(f));
    }
  }
}

void react_to_new_pointer(p, new_obj) {
  // decide if need to do sth for this new pointer, p -> new_obj
  
  if (in_same_region(p, new_obj)) {
    // not cross-region; ignore
    return;
  }
  
  if (in_to_survivor(p)) {
    // from young-gen to XXX; ignore
    return;
  }
  
  if (in_cset(new_obj)) {
    // evac-fail; the destination region (where new_obj resides) will become Old, and its remset will be rebuilt from scratch.
    // Therefore, we don't recording pointers pointing into that region.
    return;
  }
  
  if (containing_region(new_obj)->need_remset_update()) {
    // record this pointer
  }
}
After a young-gc, there are three possibilities regarding the region type: to-survivor, old, regions in cset (eden or from-survivor, due to eval-fail). In the following, we enumerate every from-to pairs of each possibility, and decide whether we need to record such cross-regions pointers and why.


from
to
action & why


to-survivor
to-survivor/old/cset
no; pointers residing in young-gen are not recorded, early-return of in_to_survivor(p)


old/cset
to-survivor
yes; eval-fail regions become Old, and old-gen to young-gen pointers are recorded; all to-survivor regions have need_remset_update == true


old/cset
old
yes or no; depending to-region's need_remset_update


old/cset
cset
no; remset for eval-fail regions are rebuilt from scratch (early-return of in_cset(new_obj))


The pseudocode is structured for maximal readability, and it's quite different from the actual code, which contains many optimizations and various low-level details. Hopefully, walking through this pseudocode can help readers form a mental picture of the overall concept, providing a smoother descend to the G1 codebase.
from	to	action & why
to-survivor	to-survivor/old/cset	no; pointers residing in young-gen are not recorded, early-return of `in_to_survivor(p)`
old/cset	to-survivor	yes; eval-fail regions become Old, and old-gen to young-gen pointers are recorded; all to-survivor regions have `need_remset_update == true`
old/cset	old	yes or no; depending to-region's `need_remset_update`
old/cset	cset	no; remset for eval-fail regions are rebuilt from scratch (early-return of `in_cset(new_obj)`)