- GC is extremely efficient, more efficient than `malloc()`; it costs less CPU if done right.
- Dead objects cost nothing to collect. So do not set references to `null` during the `finalize()` stage; that costs far more resources.
- GC will find all dead objects (including cyclic graphs) and clean them up.
- Concurrent Collector: performs GC work concurrently with the application's own execution (runs during application runtime)
- Parallel Collector: a multithreaded collector that uses multiple threads for GC.
- Stop-the-World (STW) Collector: Runs GC when application is stopped (or forces application to pause)
- Monolithic Collector: runs GC in one single, indivisible step.
- Incremental Collector: performs garbage collection in small phases with (potentially long) gaps in between.
- A collector is conservative if it is unaware of some object references at collection time, or is unsure whether a field is a reference or not.
- A collector is precise if it can fully identify & process all object references at the time of collection.
- A collector needs to be precise in order to move objects: it must correct every reference to a moved object, which requires knowing exactly which fields are references. That precision comes from information the compiler produces for the collector.
- All commercial JVMs are precise collectors, and all use some kind of a moving collector.
- A GC safepoint is a point or range in a thread's execution where the collector can identify all the references in that thread's execution state.
- Bringing a thread to a safepoint is the act of getting a thread to a safepoint and not letting it execute past it. The thread can still be running, but it cannot progress past the safepoint.
- Global safepoint: bring all threads to a safepoint
- Identify live objects in Heap.
- Reclaim resources held by dead objects.
- Periodically relocate live objects.
Precise GCs can be classified into two types based on how they process the heap:
- Mark/Sweep/Compact Collector: mostly older-gen GCs
- Copying Collector: mostly newer-gen GCs
Finding all the "live" stuff in a Heap.
- Start from the "roots" (thread stack, statics etc.)
- "Paint" anything you can reach as "live"
- At the end of the mark pass:
- all reachable objects will be marked "live"
- all non-reachable objects will be marked "dead"
Work is generally "linear" to the "live" set: it visits every live object and every reference.
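The mark pass can be sketched as a plain reachability traversal. This is a toy model (the `Obj` class and the explicit stack are illustrative, not any JVM's real implementation):

```java
import java.util.*;

class Obj {
    final List<Obj> refs = new ArrayList<>(); // outgoing references
    boolean marked = false;                   // the "painted live" bit
}

class Marker {
    // Paint everything reachable from the roots as live.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited: cycles terminate here
            o.marked = true;
            pending.addAll(o.refs);   // visit every reference
        }
    }

    public static void main(String[] args) {
        Obj a = new Obj(), b = new Obj(), dead = new Obj();
        a.refs.add(b);
        b.refs.add(a);                // a cycle is handled correctly
        mark(List.of(a));
        System.out.println(a.marked + " " + b.marked + " " + dead.marked); // prints "true true false"
    }
}
```

Note that the unreachable object is simply never visited: dead objects cost the marker nothing.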
- Scan through the heap & identify "dead" objects & track them somehow - usually in some form of "free list"
Work is generally linear to heap size (visit every object)
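The sweep can be sketched as a linear scan that visits every heap slot and threads the unmarked ones onto a free list. A toy model over an array of mark bits (names are illustrative):

```java
import java.util.*;

class Sweeper {
    // heap[i] == true means slot i was marked live.
    // Sweep visits every slot: work is linear to heap size.
    static List<Integer> sweep(boolean[] heap) {
        List<Integer> freeList = new ArrayList<>();
        for (int i = 0; i < heap.length; i++) {
            if (!heap[i]) freeList.add(i); // dead slot: add to free list
            else heap[i] = false;          // live slot: clear mark for next cycle
        }
        return freeList;
    }

    public static void main(String[] args) {
        boolean[] heap = {true, false, true, false, false};
        System.out.println(sweep(heap)); // prints "[1, 3, 4]"
    }
}
```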
- Over time, the heap gets "swiss-cheesed": contiguous dead space between objects may not be large enough to fit new objects (a form of fragmentation).
- Compaction moves live objects together to reclaim contiguous empty space (aka "relocate")
- Compaction has to correct all object references to point to new object locations (aka "remap")
- Remap scan must cover all references that could possibly point to relocated objects.
Work is generally "linear" to "live" set.
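Relocate + remap can be sketched with a forwarding table: first assign each live slot its new location, then rewrite every reference through that table. A toy model over array indices (not any real collector's layout; it assumes live slots only reference live slots, which holds for reachable objects):

```java
import java.util.*;

class Compactor {
    // live[i]: whether slot i is live; refs[i]: slot that slot i points to (-1 = none).
    // Returns the refs array of the compacted heap (live slots slid to the front).
    static int[] compact(boolean[] live, int[] refs) {
        int[] forward = new int[live.length];   // relocate: old index -> new index
        int next = 0;
        for (int i = 0; i < live.length; i++)
            if (live[i]) forward[i] = next++;
        int[] newRefs = new int[next];
        for (int i = 0; i < live.length; i++)   // remap: fix every reference
            if (live[i])
                newRefs[forward[i]] = refs[i] < 0 ? -1 : forward[refs[i]];
        return newRefs;
    }

    public static void main(String[] args) {
        boolean[] live = {true, false, true};
        int[] refs =     {2,    -1,    0};  // slot 0 -> slot 2, slot 2 -> slot 0
        System.out.println(Arrays.toString(compact(live, refs))); // prints "[1, 0]"
    }
}
```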
- A copy collector moves all live objects from a "from" space to a "to" space & reclaims "from" space in a single pass.
- It works as follows:
- At the start of copy, all objects are in "from" space and all references point to "from" space
- Start from "root" references, copy any reachable object to "to" space, correcting references as we go.
- At the end of copy, all objects are in "to" space, and all references point to "to" space.
Work is generally linear to "live" set.
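This single pass is essentially Cheney's algorithm. A minimal sketch using per-object forwarding pointers (the `Obj` class is illustrative, not any JVM's layout):

```java
import java.util.*;

class CopyCollector {
    static class Obj {
        Obj[] refs;                  // outgoing references
        Obj forward;                 // set once copied to "to" space
        Obj(Obj... refs) { this.refs = refs; }
    }

    // Copy everything reachable from roots into "to" space, fixing references as we go.
    static List<Obj> copy(Obj[] roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++)
            roots[i] = evacuate(roots[i], toSpace);
        // Scan "to" space; it grows while we scan it, so this drains breadth-first.
        for (int s = 0; s < toSpace.size(); s++) {
            Obj o = toSpace.get(s);
            for (int i = 0; i < o.refs.length; i++)
                o.refs[i] = evacuate(o.refs[i], toSpace);
        }
        return toSpace;
    }

    static Obj evacuate(Obj o, List<Obj> toSpace) {
        if (o == null) return null;
        if (o.forward == null) {            // not copied yet
            Obj copy = new Obj(o.refs.clone());
            o.forward = copy;               // leave a forwarding pointer behind
            toSpace.add(copy);
        }
        return o.forward;                   // reference now points into "to" space
    }

    public static void main(String[] args) {
        Obj b = new Obj();
        Obj a = new Obj(b);
        new Obj();                          // unreachable: never copied, never visited
        Obj[] roots = {a};
        List<Obj> to = copy(roots);
        System.out.println(to.size() + " " + (to.get(0).refs[0] == to.get(1))); // prints "2 true"
    }
}
```

The dead object is never touched, which is why the work is linear to the live set rather than the heap size.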
- Copy requires 2x the max live set to be reliable.
- Mark/Compact (typically) requires 2x the max live set in order to fully recover garbage in each cycle.
- Mark/Sweep/Compact requires 1x (plus a bit more) memory relative to the live set.
- Copy and Mark/Compact are only linear to the "live" set.
- Mark/Sweep/Compact is linear (in sweep) to heap size.
- Mark/Sweep/Compact may be able to avoid some moving work.
- Copy is (typically) monolithic.
- Based on "Weak Generational Hypothesis" -- "most objects die young"
- Focus collection efforts on the young generation (`NewGen`):
  - use a moving collector, work is linear to the live set
  - the live set in the young generation is a small percentage of the space
  - promote objects that live long enough to the older generation (`OldGen`)
- Only collect older generations as they fill up. A "Generational Filter" reduces rate of allocation into older generations.
- Tends to be (orders of magnitude) more efficient.
- Great way to keep up with high allocation rate.
- Practical necessity for keeping up with processor throughput.
- Requires a `remembered set`: a way to track references into the young generation from outside it. The remembered set is also part of the "roots" for young-generation collection.
- No need for 2x the live set: you can "spill over" objects from NewGen to OldGen.
- Usually want to keep surviving objects in NewGen for a while before promoting them to OldGen, so as to clean up some more references in case objects go dead right before promotion.
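One common way to implement the remembered set is a card table maintained by a write barrier: every reference store dirties the card covering the written field, and a young collection then scans only dirty cards for old-to-young references. A minimal sketch (the 512-byte card size is typical of HotSpot, but the class and method names here are illustrative):

```java
class CardTable {
    static final int CARD_SHIFT = 9;   // 512-byte cards
    final byte[] cards;                // one byte per card, 0 = clean

    CardTable(long heapBytes) {
        cards = new byte[(int) (heapBytes >> CARD_SHIFT) + 1];
    }

    // Write barrier: called on every reference store into the heap.
    // Cheap and unconditional: just dirty the card covering the written field.
    void onReferenceStore(long fieldAddress) {
        cards[(int) (fieldAddress >> CARD_SHIFT)] = 1;
    }

    // Young GC scans only dirty cards for old-to-young references.
    boolean isDirty(long address) {
        return cards[(int) (address >> CARD_SHIFT)] != 0;
    }

    public static void main(String[] args) {
        CardTable ct = new CardTable(1 << 20);  // 1 MiB toy heap
        ct.onReferenceStore(5000);              // an old object stores a young ref
        System.out.println(ct.isDirty(5000) + " " + ct.isDirty(100_000)); // prints "true false"
    }
}
```

The card is imprecise on purpose: the collector rescans the whole 512-byte card, trading scan work for a very cheap barrier on every store.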
- Young Generation (`NewGen`) usually uses a copying collector.
- Young Generation collection is usually monolithic, STW.
- Old Generation (`OldGen`) uses a Mark/Sweep/Compact collector.
- Old Generation collection may be STW, or concurrent, or mostly-concurrent, or incremental-STW, or mostly-incremental-STW.
Some other types also exist, which are not monolithic-STW collectors. They use different techniques for GC.
- Mark all reachable objects as live, but the object graph is mutating under us. This can cause a race problem, described below.
- Classic concurrent marking race: the mutator may move a reference that has not yet been seen by the marker into an object that has already been visited. If not intercepted or prevented in some way, this will corrupt the heap.
- Example solution: track mutations using multi-pass marking.
- track reference mutations during mark (example, in a card table)
- revisit all mutated references and track new mutations
- when set is "small enough", do a stop the world catch up (mostly concurrent)
Work grows with the mutation rate and may fail to finish.
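The race can be reproduced deterministically as a single-threaded simulation of the unlucky interleaving: between marker steps, the mutator moves the only reference to an object out of a not-yet-visited object and into an already-visited one, and the object is never marked (a toy model; names are illustrative):

```java
import java.util.*;

class MarkingRace {
    static class Obj {
        Obj ref;          // at most one outgoing reference, for simplicity
        boolean marked;
    }

    // Returns the final mark bits of {a, b, c} after the bad interleaving.
    static boolean[] simulate() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.ref = b; b.ref = c;                 // root -> a -> b -> c, all live

        // Marker step 1: visits a, records a's current reference (b).
        a.marked = true;
        Deque<Obj> pending = new ArrayDeque<>(List.of(a.ref));

        // Mutator interleaves: moves the only reference to c into a,
        // an object the marker already visited and will not revisit.
        a.ref = c;
        b.ref = null;

        // Marker step 2: drains its pending set; it sees b but never c.
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            if (o.ref != null) pending.push(o.ref);
        }
        return new boolean[]{a.marked, b.marked, c.marked};
    }

    public static void main(String[] args) {
        // c is reachable (via a) yet unmarked: sweeping it would corrupt the heap.
        System.out.println(Arrays.toString(simulate())); // prints "[true, true, false]"
    }
}
```

A write barrier that records the mutation of `a.ref` (e.g. in a card table, as in the multi-pass scheme above) is exactly what lets the marker revisit `a` and find `c`.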
- Track cross-region remembered sets (which region points to which)
- To compact a single region, we only need to scan the regions that point into it in order to remap all potential references.
- Identify region sets that fit in limited time:
- each set of regions is a STW increment
- safe to run applications between (but not within) increments
Work can grow with the square of heap size: the number of regions pointing into a single region is generally linear to the heap size (the number of regions in the heap).
- All these approaches are just trying to delay the inevitable. Some form of copying/compaction is inevitable in practice. And compacting anything requires scanning/fixing all references to it.
- Delay tactics focus on getting "easy empty space" first. This is the focus of vast majority of GC tuning.
- Most objects die young (Generational Collection),
- So collect young objects only, as much as possible, hoping for a short STW.
- But eventually, some old dead objects must be reclaimed.
- Most old dead space can be reclaimed without moving it.
- For example, CMS tracks dead space in lists, and reuses it in place.
- But eventually space gets fragmented, and it needs to be moved.
- Much of the heap is not "popular".
- a non-popular region will only be pointed to from a small percentage of the heap
- so, compact non-popular regions in short STW passes
- But eventually popular objects & regions need to be compacted.
- Monolithic STW copying NewGen
- Monolithic STW mark/sweep/compact OldGen
- Default GC in Java 8. Replaced by G1 as the default in Java 9.
- Good throughput, but bad response times.
- Monolithic STW copying NewGen (ParNew)
- Mostly concurrent, non-compacting OldGen (which is what the CMS name refers to)
- Mostly concurrent marking
- Tracks mutations in a card table
- Revisits mutated cards
- STW pause to catch up on mutations
- Concurrent sweeping
- Does not compact: maintains a free list, does not move objects
- Falls back to a full collection (monolithic STW) for compaction etc.
- Monolithic STW copying NewGen
- Mostly concurrent OldGen marker
- mostly concurrent marking
- single STW pause to catch up on mutations
- tracks inter-region relationships in remembered sets
- Mostly-incremental STW compaction of OldGen
- Objective: avoid having a full collection as much as possible
- Compact sets of regions that can be scanned in limited time
- delay compaction of popular objects & regions
- Fallback to Full Collection (monolithic STW) used for compacting popular objects and regions
Use concurrent compaction (finally!)
- Azul C4 GC: production-ready collector in the Zing JVM since Java 7.
- Shenandoah: Experimental collector in OpenJDK 12
- ZGC: Experimental collector in OpenJDK 11, production ready in OpenJDK 15
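For reference, these collectors are selected with HotSpot command-line flags (`MyApp` is a placeholder main class; the experimental-unlock flag was required before Shenandoah and ZGC became production-ready):

```shell
java -XX:+UseParallelGC MyApp          # Parallel collector, default in Java 8
java -XX:+UseConcMarkSweepGC MyApp     # CMS (deprecated in 9, removed in 14)
java -XX:+UseG1GC MyApp                # G1, default since Java 9
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC MyApp   # OpenJDK 12+
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC MyApp            # OpenJDK 11-14
java -XX:+UseZGC MyApp                 # production-ready in OpenJDK 15+
```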
- Robust concurrent marking:
- in the presence of high mutation and allocation rates
- covers modern runtime semantics (e.g. weak refs, lock deflation, etc.)
- Compaction that is not monolithic STW
- stay responsive even when compacting terabyte Heaps
- must be robust, and not just delay STW compaction. Current incremental STW attempts fall short on robustness
- YoungGen that is not monolithic STW
- stay responsive when promoting multi-GBs data-spikes
- concurrent or incremental STW may both be OK.
It is a continuously concurrent compacting collector.
- Concurrent guaranteed single-pass marker
- oblivious to mutation rate
- concurrent reference (weak, soft, final) processing.
- Concurrent compactor
- Objects moved without stopping mutator
- References remapped without stopping mutator
- Can relocate entire generation (NewGen/OldGen) in every GC cycle
- Concurrent compacting NewGen
- Concurrent compacting OldGen
- No STW fallback (the option exists, but is not the default behaviour)