Skip to content

Instantly share code, notes, and snippets.

@palaashatri
Last active April 9, 2024 18:53
Show Gist options
  • Save palaashatri/499f5eb41f08841b9360c51c57e23db1 to your computer and use it in GitHub Desktop.
Save palaashatri/499f5eb41f08841b9360c51c57e23db1 to your computer and use it in GitHub Desktop.
A theoretical overview of current state of GCs

Garbage Collectors

from "Really Understanding Garbage Collection - Gil Tene"

Introduction on GC in JVM, and how its different from memory operations in C++

  • GC is extremely efficient, more efficient than malloc(), costs less CPU if done right.
  • Dead objects cost nothing to collect. So do not set dead objects (during finalize() stage) to null, costs a lot more resources.
  • GC will find all dead objects (including cyclic graphs) and clean them.

Collector's operation

Types of collectors

  1. Concurrent Collector: performs GC work concurrently with the application's own execution (runs during application runtime)
  2. Parallel Collector: Multithreaded collector, uses multiple threads to for GC.
  3. Stop-the-World (STW) Collector: Runs GC when application is stopped (or forces application to pause)
  4. Monolithic Collector: Run GC in one, simple execution step.
  5. Incremental Collector: performs garbage collection in small phases with (potentially long) gaps in between.

Precise vs Conservative Collection

  • A collector is conservative if it is unaware of some object references at collection time, or is unsure about whether a filed is a reference or not.
  • A collector is precise if it can fully identify & process all object references at the time of collection.
  • A collector needs to be precise in order to move objects, because a compiler produces a lot of information and we should collect it for optimization of application.
  • All commercial JVMs are precise collectors, and all use some kind of a moving collector.

Safepoint

  • A GC safepoint is a point or range in a thread's execution where the collector can identify all the references in that collector's execution.
  • Bringing a thread to a safepoint is the act of getting a thread to a safepoint and not execute past it. The code can still be running, but cannot progress past the safepoint.
  • Global safepoint: bring all threads to a safepoint

Common behaviours in all precise GCs

  1. Identify live objects in Heap.
  2. Reclaim resources held by dead objects.
  3. Periodically relocate live objects.

Precise GCs can be classified into two types based on how they process GC.

  1. Mark/Sweep/Compact Collector: mostly older gen GCs
  2. Copying Collector: mostly newer gen GCs

Mark (aka Trace)

Finding all the "live" stuff in a Heap.

  1. Start from the "roots" (thread stack, statics etc.)
  2. "Paint" anything you can reach as "live"
  3. At the end of the mark pass:
    • all reachable objects will be marked "live"
    • all non-reachable objects will be marked "dead"

Work is generally "linear" to the "live" set : visits every object and visits every reference.

Sweep

  • Scan through the heap & identify "dead" objects & track them somehow - usually in some form of "free list"

Work is generally linear to heap size (visit every object)

Compact

  • Over time, heap will get "swiss cheesed": contiguous dead b/w objects may not be large enough to fit new objects (a type of fragmentation).
  • Compaction moves live objects together to reclaim contiguous empty space (aka "relocate")
  • Compaction has to correct all object references to point to new object locations (aka "remap")
  • Remap scan must cover all references that could possibly point to relocated objects.

Work is generally "linear" to "live" set.

Copy Collector

  • A copy collector moves all live objects from a "from" space to a "to" space & reclaims "from" space in a single pass.
  • It works as followed:
    1. At the start of copy, all objects are in "from" space and all references point to "from" space
    2. Start from "root" references, copy any reachable object to "to" space, correcting references as we go.
    3. At the end of copy, all objects are in "to" space, and all references point to "to" space.

Work is generally linear to "live" set.

Mark/Sweep/Compact vs Copy vs Mark/Compact

  1. Copy requires 2x the max live set to be reliable.
  2. Mark/Compact (typically) requires 2x the max live set in order to fully recover garbage in each cycle.
  3. Mark/Sweep/Compact requires 1x (+ some more) memory to the live set.
  4. Copy & Mark/Compact are only linear to the "live" set.
  5. Mark/Sweep/Compact is linear (in sweep) to Heap size.
  6. Mark/Sweep/Compact maybe able to avoid some moving work.
  7. Copy is (typically) monolithic.

Generational Collection

  1. Based on "Weak Generational Hypothesis" -- "most objects die young"
  2. Focus collection efforts on young generation (NewGen):
    • use a moving collector, linear to live set
    • live set in the young generation is a small percentage of space
    • promote objects that live long enough to older generations (OldGen)
  3. Only collect older generations as they fill up. A "Generational Filter" reduces rate of allocation into older generations.
  4. Tends to be (orders of magnitude) more efficient.
    • Great way to keep up with high allocation rate.
    • Practical necessity for keeping up with processor throughput.
  5. Requires a remembered set: a way to track references into the young generation from outside. Remembered Set is also part of "roots" for young generation collection
  6. No need for 2x the live set, you can "spill over" objects to OldGen from NewGen
  7. Usually want to keep surviving objects in NewGen for a while before promoting them to OldGen, so as to clean up some more references in case objects go dead right before promotion.

Typical combinations in production JVMs

  1. Young Generation (NewGen) usually uses a copying collector.
  2. Young Generation is monolithic, STW.
  3. Old Generation (OldGen) uses a Mark/Sweep/Compact
  4. Old Generation maybe STW, or concurrent, or mostly-concurrent, or incremental-STW, or mostly-incremental-STW.

Non monolithic-STW approaches

Some other types also exist, which are not monolithic-STW collectors. They use different techniques for GC.

Concurrent Marking

  • Mark all reachable objects as live, but object graph is mutating under us. This can cause a race problem as described below.
  • Classic Concurrent Marking Race problem: mutator may move reference that has not yet been seen by the marker into an object that has already been visited. If not interpreted or prevented in some way it will corrupt the Heap.
  • Example solution: track mutations using multi-pass marking.
    1. track reference mutations during mark (example, in a card table)
    2. revisit all mutated references and track new mutations
    3. when set is "small enough", do a stop the world catch up (mostly concurrent)

Work grows with mutation rate, may fail to finalize.

Incremental Compaction

  1. Track cross-region remembered sets (which region points to which)
  2. To compact a single region, only need to scale regions that point to it to remap all potential references.
  3. Identify region sets that fit in limited time:
    • each set of regions is a STW increment
    • safe to run applications between (but not within) increments

Work can grow with the square of heap size - the number of regions pointing into a single region is generally linear to the heap size (the number of regions in the heap)

Delaying the inevitable

  • All these approaches are just trying to delay the inevitable. Some form of copying/compaction is inevitable in practice. And compacting anything requires scanning/fixing all references to it.
  • Delay tactics focus on getting "easy empty space" first. This is the focus of vast majority of GC tuning.
  • Most objects die young (Generational Collection),
    • So collect young objects only, as much as possible, hoping for a short STW.
    • But eventually, some old dead objects must be reclaimed.
    • Most old dead space can be reclaimed without using it.
      • For example, CMS tracks dead space in lists, and reuses it in place.
      • But eventually space gets fragmented, and it needs to be moved.
    • Much of the heap is not "popular".
      • a non popular region will only be pointed to from a small percentage of the heap.
      • so, compact non-popular regions in short STW passes
      • But eventually popular objects & regions need to be compacted.

Classifying common collectors (not modern as of writing, legacy)

  • Monolithic STW copying NewGen
  • Monolithic STW mark/sweep/compact OldGen
  • Default GC in Java 8. Replaced in Java 11.
  • Good throughput, but bad response times.
  • Monolithic STW copying NewGen (part New)
  • Mostly concurrent, non-compacting OldGen (what CMS stands for, basically)
    • Mostly concurrent marking
    • Tracks mutations in Card
    • Revisits mutated Cards
    • STW to catch up on mutations
  • Concurrent Sweeping
  • Does not compact: maintains a free list, does not move objects
  • Fallbacks to full collection (monolithic STW) for compaction etc.
  • Monolithic STW copying NewGen
  • Mostly concurrent OldGen marker
    • mostly concurrent marking
      • single STW to catch up on mutations
    • tracks inter-region relationships in remembered sets
  • STW mostly incremental compactions OldGen
    • Objective: avoid having a full collection as much as possible
    • Compact sets of regions that can be scanned in limited time
    • delay compaction of popular objects & regions
  • Fallback to Full Collection (monolithic STW) used for compacting popular objects and regions

Modern Collectors

Use concurrent compaction (finally!)

  1. Azul C4 GC: Production ready collector in Zing JVM (ZVM) since Java 7.
  2. Shenandoah: Experimental collector in OpenJDK 12
  3. ZGC: Experimental collector in OpenJDK 11, production ready in OpenJDK 15

Azul C4 GC

Objectives for solving STW

  1. Robust concurrent marking:
    • in the presence of high mutation allocation rates
    • cover modern runtime semantics (example, weak refs, lock deflation etc.)
  2. Compaction that is not monolithic STW
    • stay responsive even when compacting terabyte Heaps
    • must be robust, and not just delay STW compaction. Current incremental STW attempts fall short on robustness
  3. YoungGen that is not monolithic STW
    • stay responsive when promoting multi-GBs data-spikes
    • concurrent or incremental STW may both be OK.

Features

Its a continuously concurrent compacting collector.

  1. Concurrent guaranteed single-pass marker
    • oblivious to mutation rate
    • concurrent reference (weak, soft, final) processing.
  2. Concurrent compactor
    • Objects moved without stopping mutator
    • References remapped without stopping mutator
    • Can relocate entire generation (NewGen/OldGen) in every GC cycle
  3. Concurrent compacting NewGen
  4. Concurrent compacting OldGen
  5. No STW fallback (the option exists, but is not the default behaviour)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment