- GC is extremely efficient, more efficient than `malloc()`; it costs less CPU if done right.
- Dead objects cost nothing to collect. So do not set references to `null` during the `finalize()` stage; that costs far more resources.
- GC will find all dead objects (including cyclic graphs) and clean them up.
- Concurrent Collector: performs GC work concurrently with the application's own execution (runs during application runtime)
- Parallel Collector: a multithreaded collector that uses multiple threads for GC.
- Stop-the-World (STW) Collector: Runs GC when application is stopped (or forces application to pause)
- Monolithic Collector: runs GC in one single, indivisible step.
- Incremental Collector: performs garbage collection in small phases with (potentially long) gaps in between.
- A collector is conservative if it is unaware of some object references at collection time, or is unsure whether a field is a reference or not.
- A collector is precise if it can fully identify & process all object references at the time of collection.
- A collector needs to be precise in order to move objects: it must correct every reference to a moved object, which requires knowing exactly which fields are references. That precision comes from information the compiler produces for the collector.
- All commercial JVMs are precise collectors, and all use some kind of a moving collector.
- A GC safepoint is a point or range in a thread's execution where the collector can identify all the references in that thread's execution state.
- Bringing a thread to a safepoint is the act of getting a thread to a safepoint and not letting it execute past it. The thread can still be running, but it cannot progress past the safepoint.
- Global safepoint: bring all threads to a safepoint
- Identify live objects in Heap.
- Reclaim resources held by dead objects.
- Periodically relocate live objects.
Precise GCs can be classified into two types based on how they process the heap:
- Mark/Sweep/Compact Collector: mostly older-gen GCs
- Copying Collector: mostly newer-gen GCs
Finding all the "live" stuff in a Heap.
- Start from the "roots" (thread stack, statics etc.)
- "Paint" anything you can reach as "live"
- At the end of the mark pass:
- all reachable objects will be marked "live"
- all non-reachable objects will be marked "dead"
Work is generally "linear" to the "live" set: it visits every live object and every reference.
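The mark pass can be sketched as a plain reachability traversal. This is a toy model (the `Obj` class and the explicit stack are illustrative, not any JVM's real implementation):

```java
import java.util.*;

class Obj {
    final List<Obj> refs = new ArrayList<>(); // outgoing references
    boolean marked = false;                   // the "painted live" bit
}

class Marker {
    // Paint everything reachable from the roots as live.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited: cycles terminate here
            o.marked = true;
            pending.addAll(o.refs);   // visit every reference
        }
    }

    public static void main(String[] args) {
        Obj a = new Obj(), b = new Obj(), dead = new Obj();
        a.refs.add(b);
        b.refs.add(a);                // a cycle is handled correctly
        mark(List.of(a));
        System.out.println(a.marked + " " + b.marked + " " + dead.marked); // prints "true true false"
    }
}
```

Note that the unreachable object is simply never visited: dead objects cost the marker nothing.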
- Scan through the heap & identify "dead" objects & track them somehow - usually in some form of "free list"
Work is generally linear to heap size (visit every object)
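The sweep can be sketched as a linear scan that visits every heap slot and threads the unmarked ones onto a free list. A toy model over an array of mark bits (names are illustrative):

```java
import java.util.*;

class Sweeper {
    // heap[i] == true means slot i was marked live.
    // Sweep visits every slot: work is linear to heap size.
    static List<Integer> sweep(boolean[] heap) {
        List<Integer> freeList = new ArrayList<>();
        for (int i = 0; i < heap.length; i++) {
            if (!heap[i]) freeList.add(i); // dead slot: add to free list
            else heap[i] = false;          // live slot: clear mark for next cycle
        }
        return freeList;
    }

    public static void main(String[] args) {
        boolean[] heap = {true, false, true, false, false};
        System.out.println(sweep(heap)); // prints "[1, 3, 4]"
    }
}
```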
- Over time, the heap gets "swiss-cheesed": contiguous dead space between objects may not be large enough to fit new objects (a form of fragmentation).
- Compaction moves live objects together to reclaim contiguous empty space (aka "relocate")
- Compaction has to correct all object references to point to new object locations (aka "remap")
- Remap scan must cover all references that could possibly point to relocated objects.
Work is generally "linear" to "live" set.
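Relocate + remap can be sketched with a forwarding table: first assign each live slot its new location, then rewrite every reference through that table. A toy model over array indices (not any real collector's layout; it assumes live slots only reference live slots, which holds for reachable objects):

```java
import java.util.*;

class Compactor {
    // live[i]: whether slot i is live; refs[i]: slot that slot i points to (-1 = none).
    // Returns the refs array of the compacted heap (live slots slid to the front).
    static int[] compact(boolean[] live, int[] refs) {
        int[] forward = new int[live.length];   // relocate: old index -> new index
        int next = 0;
        for (int i = 0; i < live.length; i++)
            if (live[i]) forward[i] = next++;
        int[] newRefs = new int[next];
        for (int i = 0; i < live.length; i++)   // remap: fix every reference
            if (live[i])
                newRefs[forward[i]] = refs[i] < 0 ? -1 : forward[refs[i]];
        return newRefs;
    }

    public static void main(String[] args) {
        boolean[] live = {true, false, true};
        int[] refs =     {2,    -1,    0};  // slot 0 -> slot 2, slot 2 -> slot 0
        System.out.println(Arrays.toString(compact(live, refs))); // prints "[1, 0]"
    }
}
```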
- A copy collector moves all live objects from a "from" space to a "to" space & reclaims "from" space in a single pass.
- It works as follows:
- At the start of copy, all objects are in "from" space and all references point to "from" space
- Start from "root" references, copy any reachable object to "to" space, correcting references as we go.
- At the end of copy, all objects are in "to" space, and all references point to "to" space.
Work is generally linear to "live" set.
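This single pass is essentially Cheney's algorithm. A minimal sketch using per-object forwarding pointers (the `Obj` class is illustrative, not any JVM's layout):

```java
import java.util.*;

class CopyCollector {
    static class Obj {
        Obj[] refs;                  // outgoing references
        Obj forward;                 // set once copied to "to" space
        Obj(Obj... refs) { this.refs = refs; }
    }

    // Copy everything reachable from roots into "to" space, fixing references as we go.
    static List<Obj> copy(Obj[] roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++)
            roots[i] = evacuate(roots[i], toSpace);
        // Scan "to" space; it grows while we scan it, so this drains breadth-first.
        for (int s = 0; s < toSpace.size(); s++) {
            Obj o = toSpace.get(s);
            for (int i = 0; i < o.refs.length; i++)
                o.refs[i] = evacuate(o.refs[i], toSpace);
        }
        return toSpace;
    }

    static Obj evacuate(Obj o, List<Obj> toSpace) {
        if (o == null) return null;
        if (o.forward == null) {            // not copied yet
            Obj copy = new Obj(o.refs.clone());
            o.forward = copy;               // leave a forwarding pointer behind
            toSpace.add(copy);
        }
        return o.forward;                   // reference now points into "to" space
    }

    public static void main(String[] args) {
        Obj b = new Obj();
        Obj a = new Obj(b);
        new Obj();                          // unreachable: never copied, never visited
        Obj[] roots = {a};
        List<Obj> to = copy(roots);
        System.out.println(to.size() + " " + (to.get(0).refs[0] == to.get(1))); // prints "2 true"
    }
}
```

The dead object is never touched, which is why the work is linear to the live set rather than the heap size.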
- Copy requires 2x the max live set to be reliable.
- Mark/Compact (typically) requires 2x the max live set in order to fully recover garbage in each cycle.
- Mark/Sweep/Compact requires 1x (plus a bit more) memory relative to the live set.
- Copy and Mark/Compact are only linear to the "live" set.
- Mark/Sweep/Compact is linear (in sweep) to heap size.
- Mark/Sweep/Compact may be able to avoid some moving work.
- Copy is (typically) monolithic.
- Based on "Weak Generational Hypothesis" -- "most objects die young"
- Focus collection efforts on the young generation (`NewGen`):
  - use a moving collector, work is linear to the live set
  - the live set in the young generation is a small percentage of the space
  - promote objects that live long enough to the older generation (`OldGen`)
- Only collect older generations as they fill up. A "Generational Filter" reduces rate of allocation into older generations.
- Tends to be (orders of magnitude) more efficient.
- Great way to keep up with high allocation rate.
- Practical necessity for keeping up with processor throughput.
- Requires a `remembered set`: a way to track references into the young generation from outside it. The remembered set is also part of the "roots" for young-generation collection.
- No need for 2x the live set: you can "spill over" objects from NewGen to OldGen.
- Usually want to keep surviving objects in NewGen for a while before promoting them to OldGen, so as to clean up some more references in case objects go dead right before promotion.
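One common way to implement the remembered set is a card table maintained by a write barrier: every reference store dirties the card covering the written field, and a young collection then scans only dirty cards for old-to-young references. A minimal sketch (the 512-byte card size is typical of HotSpot, but the class and method names here are illustrative):

```java
class CardTable {
    static final int CARD_SHIFT = 9;   // 512-byte cards
    final byte[] cards;                // one byte per card, 0 = clean

    CardTable(long heapBytes) {
        cards = new byte[(int) (heapBytes >> CARD_SHIFT) + 1];
    }

    // Write barrier: called on every reference store into the heap.
    // Cheap and unconditional: just dirty the card covering the written field.
    void onReferenceStore(long fieldAddress) {
        cards[(int) (fieldAddress >> CARD_SHIFT)] = 1;
    }

    // Young GC scans only dirty cards for old-to-young references.
    boolean isDirty(long address) {
        return cards[(int) (address >> CARD_SHIFT)] != 0;
    }

    public static void main(String[] args) {
        CardTable ct = new CardTable(1 << 20);  // 1 MiB toy heap
        ct.onReferenceStore(5000);              // an old object stores a young ref
        System.out.println(ct.isDirty(5000) + " " + ct.isDirty(100_000)); // prints "true false"
    }
}
```

The card is imprecise on purpose: the collector rescans the whole 512-byte card, trading scan work for a very cheap barrier on every store.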
- Young Generation (`NewGen`) usually uses a copying collector.
- Young Generation collection is usually monolithic, STW.
- Old Generation (`OldGen`) uses a Mark/Sweep/Compact collector.
- Old Generation collection may be STW, or concurrent, or mostly-concurrent, or incremental-STW, or mostly-incremental-STW.
Some other types also exist, which are not monolithic-STW collectors. They use different techniques for GC.
- Mark all reachable objects as live, but the object graph is mutating under us. This can cause a race problem, described below.
- Classic concurrent marking race: the mutator may move a reference that has not yet been seen by the marker into an object that has already been visited. If not intercepted or prevented in some way, this will corrupt the heap.
- Example solution: track mutations using multi-pass marking.
- track reference mutations during mark (example, in a card table)
- revisit all mutated references and track new mutations
- when set is "small enough", do a stop the world catch up (mostly concurrent)
Work grows with the mutation rate and may fail to finish.
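The race can be reproduced deterministically as a single-threaded simulation of the unlucky interleaving: between marker steps, the mutator moves the only reference to an object out of a not-yet-visited object and into an already-visited one, and the object is never marked (a toy model; names are illustrative):

```java
import java.util.*;

class MarkingRace {
    static class Obj {
        Obj ref;          // at most one outgoing reference, for simplicity
        boolean marked;
    }

    // Returns the final mark bits of {a, b, c} after the bad interleaving.
    static boolean[] simulate() {
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.ref = b; b.ref = c;                 // root -> a -> b -> c, all live

        // Marker step 1: visits a, records a's current reference (b).
        a.marked = true;
        Deque<Obj> pending = new ArrayDeque<>(List.of(a.ref));

        // Mutator interleaves: moves the only reference to c into a,
        // an object the marker already visited and will not revisit.
        a.ref = c;
        b.ref = null;

        // Marker step 2: drains its pending set; it sees b but never c.
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;
            o.marked = true;
            if (o.ref != null) pending.push(o.ref);
        }
        return new boolean[]{a.marked, b.marked, c.marked};
    }

    public static void main(String[] args) {
        // c is reachable (via a) yet unmarked: sweeping it would corrupt the heap.
        System.out.println(Arrays.toString(simulate())); // prints "[true, true, false]"
    }
}
```

A write barrier that records the mutation of `a.ref` (e.g. in a card table, as in the multi-pass scheme above) is exactly what lets the marker revisit `a` and find `c`.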
- Track cross-region remembered sets (which region points to which)
- To compact a single region, we only need to scan the regions that point into it in order to remap all potential references.
- Identify region sets that fit in limited time:
- each set of regions is a STW increment
- safe to run applications between (but not within) increments
Work can grow with the square of heap size: the number of regions pointing into a single region is generally linear to the heap size (the number of regions in the heap).
- All these approaches are just trying to delay the inevitable. Some form of copying/compaction is inevitable in practice. And compacting anything requires scanning/fixing all references to it.
- Delay tactics focus on getting "easy empty space" first. This is the focus of vast majority of GC tuning.
- Most objects die young (Generational Collection),
- So collect young objects only, as much as possible, hoping for a short STW.
- But eventually, some old dead objects must be reclaimed.
- Most old dead space can be reclaimed without moving it.
- For example, CMS tracks dead space in lists, and reuses it in place.
- But eventually space gets fragmented, and it needs to be moved.
- Much of the heap is not "popular".
- a non-popular region will only be pointed to from a small percentage of the heap
- so, compact non-popular regions in short STW passes
- But eventually popular objects & regions need to be compacted.
- Monolithic STW copying NewGen
- Monolithic STW mark/sweep/compact OldGen
- Default GC in Java 8. Replaced by G1 as the default in Java 9.
- Good throughput, but bad response times.
- Monolithic STW copying NewGen (ParNew)
- Mostly concurrent, non-compacting OldGen (which is what the CMS name refers to)
- Mostly concurrent marking
- Tracks mutations in a card table
- Revisits mutated cards
- STW pause to catch up on mutations
- Concurrent sweeping
- Does not compact: maintains a free list, does not move objects
- Falls back to a full collection (monolithic STW) for compaction etc.
- Monolithic STW copying NewGen
- Mostly concurrent OldGen marker
- mostly concurrent marking
- single STW pause to catch up on mutations
- tracks inter-region relationships in remembered sets
- Mostly-incremental STW compaction of OldGen
- Objective: avoid having a full collection as much as possible
- Compact sets of regions that can be scanned in limited time
- delay compaction of popular objects & regions
- Fallback to Full Collection (monolithic STW) used for compacting popular objects and regions
Use concurrent compaction (finally!)
- Azul C4 GC: production-ready collector in the Zing JVM since Java 7.
- Shenandoah: Experimental collector in OpenJDK 12
- ZGC: Experimental collector in OpenJDK 11, production ready in OpenJDK 15
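For reference, these collectors are selected with HotSpot command-line flags (`MyApp` is a placeholder main class; the experimental-unlock flag was required before Shenandoah and ZGC became production-ready):

```shell
java -XX:+UseParallelGC MyApp          # Parallel collector, default in Java 8
java -XX:+UseConcMarkSweepGC MyApp     # CMS (deprecated in 9, removed in 14)
java -XX:+UseG1GC MyApp                # G1, default since Java 9
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC MyApp   # OpenJDK 12+
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC MyApp            # OpenJDK 11-14
java -XX:+UseZGC MyApp                 # production-ready in OpenJDK 15+
```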
- Robust concurrent marking:
- in the presence of high mutation and allocation rates
- covers modern runtime semantics (e.g. weak refs, lock deflation, etc.)
- Compaction that is not monolithic STW
- stay responsive even when compacting terabyte Heaps
- must be robust, and not just delay STW compaction. Current incremental STW attempts fall short on robustness
- YoungGen that is not monolithic STW
- stay responsive when promoting multi-GBs data-spikes
- concurrent or incremental STW may both be OK.
It is a continuously concurrent compacting collector.
- Concurrent guaranteed single-pass marker
- oblivious to mutation rate
- concurrent reference (weak, soft, final) processing.
- Concurrent compactor
- Objects moved without stopping mutator
- References remapped without stopping mutator
- Can relocate entire generation (NewGen/OldGen) in every GC cycle
- Concurrent compacting NewGen
- Concurrent compacting OldGen
- No STW fallback (the option exists, but is not the default behaviour)