Let's introduce the region types used in G1:
eden: for newly added objects from-survivor: for objects in to-survivor from previous young-GC to-survivor: live objects in eden/from-survivor are copied here if they have not lived long old: live objects in eden/from-survivor are copied here if they have lived long
After a young-GC, eden/from-survivor regions are reclaimed (if no evac-failure), and to-survivor regions become from-survivor regions. eden/from-survivor/to-survivor regions constitute young-gen and all old regions form the old gen. (There are also Humongous and Archive regions, but we can ignore them for this discussion.)
There are two kinds of cross-region pointers that need to be tracked:
- from old-gen to young-gen: these pointers are part of roots during young-GC
- from old-gen to old-gen when the destination region requires remset update: during mixed-GC (some Old regions are collected in addition to regions in young-gen), those to-be-collected Old regions must know all incoming from other regions to the current region) pointers, and their remset must be kept updated.
At the beginning of a young-GC, the whole young-gen, consisting of only eden and from-survivor regions, is selected as the collection set (cset). During the GC, we try to evacuate all live objects in cset to to-survivor or old regions, depending their age (how long they have lived).
It's a classic copying (aka scavenge) GC algorithm: breadth-first traversal with a stack to hold yet-to-scan objects. Lying in the core of algorithm is the closure (do_oop_evac
), applied to all oop locations pointing into the cset. The part unique to G1 is the react_to_new_pointer
call, which, as its name implies, reacts to the newly established pointer and maintains the aforementioned cross-region invariants. There are two situations where react_to_new_pointer
is called:
- finding pointers pointing into cset: after the destination object has been evacuated from cset to another region at the end of
do_oop_evac
. - finding pointers pointing outside of cset: the destination object will not be moved (because it's outside of cset), so we can call
react_to_new_pointer
when we identify those pointers, during iterating an object (else
branch ofiterate_fields
).
do_oop_evac(oop* p) {
obj = load(p);
if (obj.is_forwarded()) {
new_obj = obj.forwarded_to();
} else {
new_obj = alloc(obj.size);
if (new_obj == null) {
// evac-fail
iterate_fields(obj);
} else {
copy(obj, new_obj);
obj.set_forwarded(new_obj);
iterate_fields(new_obj);
}
}
store(p, new_obj);
react_to_new_pointer(p, new_obj);
}
iterate_fields(obj) {
for (f in obj.fields) {
if (in_cset(load(f))) {
push(f);
} else {
react_to_new_pointer(f, load(f));
}
}
}
void react_to_new_pointer(p, new_obj) {
// decide if need to do sth for this new pointer, p -> new_obj
if (in_same_region(p, new_obj)) {
// not cross-region; ignore
return;
}
if (in_to_survivor(p)) {
// from young-gen to XXX; ignore
return;
}
if (in_cset(new_obj)) {
// evac-fail; the destination region (where new_obj resides) will become Old, and its remset will be rebuilt from scratch.
// Therefore, we don't recording pointers pointing into that region.
return;
}
if (containing_region(new_obj)->need_remset_update()) {
// record this pointer
}
}
After a young-gc, there are three possibilities regarding the region type: to-survivor, old, regions in cset (eden or from-survivor, due to eval-fail). In the following, we enumerate every from-to pairs of each possibility, and decide whether we need to record such cross-regions pointers and why.
from | to | action & why |
---|---|---|
to-survivor | to-survivor/old/cset | no; pointers residing in young-gen are not recorded, early-return of in_to_survivor(p) |
old/cset | to-survivor | yes; eval-fail regions become Old, and old-gen to young-gen pointers are recorded; all to-survivor regions have need_remset_update == true |
old/cset | old | yes or no; depending to-region's need_remset_update |
old/cset | cset | no; remset for eval-fail regions are rebuilt from scratch (early-return of in_cset(new_obj) ) |
The pseudocode is structured for maximal readability, and it's quite different from the actual code, which contains many optimizations and various low-level details. Hopefully, walking through this pseudocode can help readers form a mental picture of the overall concept, providing a smoother descend to the G1 codebase.