Skip to content

Instantly share code, notes, and snippets.

@albertnetymk
Last active September 22, 2021 14:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save albertnetymk/dcb35846475151664497c43941eec5bb to your computer and use it in GitHub Desktop.
Save albertnetymk/dcb35846475151664497c43941eec5bb to your computer and use it in GitHub Desktop.

Region types in G1

Let's introduce the region types used in G1:

eden: for newly added objects from-survivor: for objects in to-survivor from previous young-GC to-survivor: live objects in eden/from-survivor are copied here if they have not lived long old: live objects in eden/from-survivor are copied here if they have lived long

After a young-GC, eden/from-survivor regions are reclaimed (if no evac-failure), and to-survivor regions become from-survivor regions. eden/from-survivor/to-survivor regions constitute young-gen and all old regions form the old gen. (There are also Humongous and Archive regions, but we can ignore them for this discussion.)

Cross-region invariants

There are two kinds of cross-region pointers that need to be tracked:

  1. from old-gen to young-gen: these pointers are part of roots during young-GC
  2. from old-gen to old-gen when the destination region requires remset update: during mixed-GC (some Old regions are collected in addition to regions in young-gen), those to-be-collected Old regions must know all incoming from other regions to the current region) pointers, and their remset must be kept updated.

Maintain cross-region invariants during young-GC

At the beginning of a young-GC, the whole young-gen, consisting of only eden and from-survivor regions, is selected as the collection set (cset). During the GC, we try to evacuate all live objects in cset to to-survivor or old regions, depending their age (how long they have lived).

It's a classic copying (aka scavenge) GC algorithm: breadth-first traversal with a stack to hold yet-to-scan objects. Lying in the core of algorithm is the closure (do_oop_evac), applied to all oop locations pointing into the cset. The part unique to G1 is the react_to_new_pointer call, which, as its name implies, reacts to the newly established pointer and maintains the aforementioned cross-region invariants. There are two situations where react_to_new_pointer is called:

  1. finding pointers pointing into cset: after the destination object has been evacuated from cset to another region at the end of do_oop_evac.
  2. finding pointers pointing outside of cset: the destination object will not be moved (because it's outside of cset), so we can call react_to_new_pointer when we identify those pointers, during iterating an object (else branch of iterate_fields).
do_oop_evac(oop* p) {
  obj = load(p);
  
  if (obj.is_forwarded()) {
    new_obj = obj.forwarded_to();
  } else {
    new_obj = alloc(obj.size);
    if (new_obj == null) {
      // evac-fail
      iterate_fields(obj);
    } else {
      copy(obj, new_obj);
      obj.set_forwarded(new_obj);
      iterate_fields(new_obj);
    }
  }
  store(p, new_obj);
  
  react_to_new_pointer(p, new_obj);
}

iterate_fields(obj) {
  for (f in obj.fields) {
    if (in_cset(load(f))) {
      push(f);
    } else {
      react_to_new_pointer(f, load(f));
    }
  }
}

void react_to_new_pointer(p, new_obj) {
  // decide if need to do sth for this new pointer, p -> new_obj
  
  if (in_same_region(p, new_obj)) {
    // not cross-region; ignore
    return;
  }
  
  if (in_to_survivor(p)) {
    // from young-gen to XXX; ignore
    return;
  }
  
  if (in_cset(new_obj)) {
    // evac-fail; the destination region (where new_obj resides) will become Old, and its remset will be rebuilt from scratch.
    // Therefore, we don't recording pointers pointing into that region.
    return;
  }
  
  if (containing_region(new_obj)->need_remset_update()) {
    // record this pointer
  }
}

After a young-gc, there are three possibilities regarding the region type: to-survivor, old, regions in cset (eden or from-survivor, due to eval-fail). In the following, we enumerate every from-to pairs of each possibility, and decide whether we need to record such cross-regions pointers and why.

from to action & why
to-survivor to-survivor/old/cset no; pointers residing in young-gen are not recorded, early-return of in_to_survivor(p)
old/cset to-survivor yes; eval-fail regions become Old, and old-gen to young-gen pointers are recorded; all to-survivor regions have need_remset_update == true
old/cset old yes or no; depending to-region's need_remset_update
old/cset cset no; remset for eval-fail regions are rebuilt from scratch (early-return of in_cset(new_obj))

The pseudocode is structured for maximal readability, and it's quite different from the actual code, which contains many optimizations and various low-level details. Hopefully, walking through this pseudocode can help readers form a mental picture of the overall concept, providing a smoother descend to the G1 codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment