Skip to content

Instantly share code, notes, and snippets.

@TerrorJack
Last active January 16, 2020 14:09
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save TerrorJack/12fa4cbb3a460925c5e031dacfe803e9 to your computer and use it in GitHub Desktop.
Save TerrorJack/12fa4cbb3a460925c5e031dacfe803e9 to your computer and use it in GitHub Desktop.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>A tour into GHC’s garbage collection</title>
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/css/reveal.css">
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/css/theme/black.css" id="theme">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/css/print/pdf.css' : 'https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<div class="slides">
<section id="title-slide">
<h1 class="title">A tour into GHC’s garbage collection</h1>
</section>
<section><section id="a-hierarchy-of-allocators" class="title-slide slide level1"><h1>A hierarchy of allocators</h1></section><section id="the-os-specific-allocator" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/posix/OSMem.c">The OS-specific allocator</a></h2>
<div>
<ul>
<li class="fragment">Backed by OS-specific interfaces: <a href="http://man7.org/linux/man-pages/man2/mmap.2.html"><code>mmap</code></a> on POSIX, <a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc"><code>VirtualAlloc</code></a> on Win32</li>
<li class="fragment"><a href="https://gitlab.haskell.org/ghc/ghc/commit/0d1a8d09f452977aadef7897aa12a8d41c7a4af0">Two step allocator for 64-bit systems</a>
<ul>
<li class="fragment">Reserve a large continuous address space, typically <code>1TB</code></li>
<li class="fragment">Commit/decommit a part within that address space</li>
<li class="fragment">Easy to identify static/dynamic memory from a pointer value</li>
</ul></li>
<li class="fragment"><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/RtsUtils.c#L60"><code>malloc</code></a> is still used for runtime metadata</li>
</ul>
</div>
</section><section id="the-mblock-allocator" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/MBlock.c">The MBlock allocator</a></h2>
<div>
<ul>
<li class="fragment">Allocates raw MBlocks (aka megablocks)
<ul>
<li class="fragment"><code>1MB</code> sized/aligned</li>
<li class="fragment">May allocate mgroups when necessary</li>
</ul></li>
<li class="fragment">Free-list based allocation</li>
</ul>
</div>
</section><section id="the-block-allocator" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/BlockAlloc.c">The block allocator</a></h2>
<div>
<ul>
<li class="fragment">Allocates blocks/block groups
<ul>
<li class="fragment"><code>4KB</code> sized/aligned</li>
</ul></li>
<li class="fragment">Associated metadata: <a href="https://gitlab.haskell.org/ghc/ghc/blob/30be3bf13a6e72247ff561df1f291370dad79ef9/includes/rts/storage/Block.h#L87">block descriptors</a>
<ul>
<li class="fragment">Pointers to starting address &amp; free space address</li>
<li class="fragment">Link to other block descriptors</li>
<li class="fragment">Other metadata: block flags, generation, etc</li>
</ul></li>
<li class="fragment">Free-list based allocation</li>
<li class="fragment">Groups closures(heap objects) with similar properties</li>
</ul>
</div>
</section><section id="the-heap-allocator" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/Storage.c">The heap allocator</a></h2>
<div>
<ul>
<li class="fragment">Allocates individual closures</li>
<li class="fragment">Different pools (block groups):
<ul>
<li class="fragment">The “nursery”: where most Haskell closures are born</li>
<li class="fragment">The object pools: used by <code>allocate*</code> functions, for e.g. <code>ByteArray#</code> related stuff</li>
</ul></li>
</ul>
</div>
</section></section>
<section><section id="copying-gc" class="title-slide slide level1"><h1>Copying GC</h1></section><section id="bump-allocation-i" class="slide level2">
<h2>Bump allocation: I</h2>
<div>
<ul>
<li class="fragment">Bump allocation:
<ul>
<li class="fragment">Allocation is simply checking <code>HpLim</code> and bumping <code>Hp</code></li>
<li class="fragment"><code>Hp</code> and <code>HpLim</code> syncs with block descriptors pre/post execution</li>
</ul></li>
<li class="fragment">Upon heap check failures:
<ul>
<li class="fragment">Store the attempted allocation count in <code>HpAlloc</code></li>
<li class="fragment">If <code>CurrentNursery</code> links to an available block, move the nursery and carry on execution</li>
<li class="fragment">Otherwise, report <code>HeapOverflow</code> and return</li>
</ul></li>
</ul>
</div>
</section><section id="bump-allocation-ii" class="slide level2">
<h2>Bump allocation: II</h2>
<div>
<ul>
<li class="fragment">Bump allocation is fast
<ul>
<li class="fragment">Avoids the overhead of free lists</li>
<li class="fragment">Avoids the overhead of C calls</li>
<li class="fragment">Works well with multiple runtime threads, each has its own nursery</li>
</ul></li>
<li class="fragment">Upon <code>HeapOverflow</code>s
<ul>
<li class="fragment">The scheduler updates the nursery and re-enqueues the thread</li>
</ul></li>
<li class="fragment">Simplest GC is no-op GC</li>
</ul>
</div>
</section><section id="copying-gc-i" class="slide level2">
<h2>Copying GC: I</h2>
<div>
<ul>
<li class="fragment">Divides heap space into “fromspace” and “tospace”</li>
<li class="fragment">GC copies fromspace live objects to tospace, then flips the semispaces</li>
<li class="fragment">How to handle pointers between objects?
<ul>
<li class="fragment">Recursive copying: when an object is copied, also copy all directly reachable objects and rewrite pointers</li>
<li class="fragment">Cyclic references: detect already copied objects and skip copying logic</li>
<li class="fragment">Stack overflow: :/</li>
</ul></li>
</ul>
</div>
</section><section id="copying-gc-ii" class="slide level2">
<h2>Copying GC: II</h2>
<div>
<ul>
<li class="fragment">Evacuate: copies a fromspace object into the tospace</li>
<li class="fragment">Scavenge: rewrites pointers in a tospace object by evacuating the pointed objects</li>
<li class="fragment">Tri-color GC: white(original), grey(evacuated), black(evacuated &amp; scavenged)</li>
<li class="fragment">Start of copying gc: evacuate the “root” objects</li>
<li class="fragment">End of copying gc: no more objects are scavenged</li>
<li class="fragment">Scavenge order: FILO(stack, depth-first), FIFO(queue, breadth-first), etc</li>
</ul>
</div>
</section><section id="cheney-scanning-algorithm" class="slide level2">
<h2>Cheney scanning algorithm</h2>
<div>
<ul>
<li class="fragment">Uses the tospace itself as a FIFO queue, no separate work list</li>
<li class="fragment">A pair of tospace pointers: <code>scan</code> points to the first grey object, <code>end</code> points to the end of used tospace</li>
<li class="fragment">Scavenge loop:
<ul>
<li class="fragment">Pick the grey object pointed by <code>scan</code>, evacuate all pointed objects (which advances <code>end</code>)</li>
<li class="fragment">The object is now black, advance <code>scan</code></li>
<li class="fragment">When <code>scan</code> catches up with <code>end</code>: GC finishes</li>
</ul></li>
</ul>
</div>
</section><section id="copying-gc-as-in-ghc-rts" class="slide level2">
<h2>Copying GC as in GHC RTS</h2>
<div>
<ul>
<li class="fragment">Stop the world: all mutator threads must be suspended</li>
<li class="fragment">Different block groups for fromspace/tospace</li>
<li class="fragment">Evacuate: overwrites the closure in place with a tagged “forwarding pointer”</li>
<li class="fragment">Scavenge: scavenges in units of blocks, not individual closures</li>
</ul>
</div>
</section></section>
<section><section id="generational-gc" class="title-slide slide level1"><h1>Generational GC</h1></section><section id="motivation" class="slide level2">
<h2>Motivation</h2>
<div>
<ul>
<li class="fragment">Weak generational hypothesis: most objects die young</li>
<li class="fragment">Group the objects into generations
<ul>
<li class="fragment">Minor GC: only traverse the objects in younger generations</li>
<li class="fragment">Promotion: objects which survives several rounds of GC are promoted to older generations</li>
<li class="fragment">Age: rounds of GC which an object has survived</li>
<li class="fragment">Step: objects may go through several “step”s before promotion; <a href="https://gitlab.haskell.org/ghc/ghc/commit/214b3663d5d7598c13643f9221e43d5a7735b47f">removed</a> by current GHC RTS</li>
</ul></li>
</ul>
</div>
</section><section id="finding-roots-for-generational-gc" class="slide level2">
<h2>Finding roots for generational GC</h2>
<div>
<ul>
<li class="fragment">Traditional roots in a non-generational setting
<ul>
<li class="fragment">No actual evacuate/scavenge if it’s in an older generation</li>
</ul></li>
<li class="fragment">Inter-generational pointers
<ul>
<li class="fragment">Older-to-younger pointers: must be added to the root set</li>
</ul></li>
<li class="fragment">Remembered sets
<ul>
<li class="fragment">Each generation records a set of objects which points to younger generations</li>
</ul></li>
</ul>
</div>
</section><section id="performing-generational-gc" class="slide level2">
<h2>Performing Generational GC</h2>
<div>
<ul>
<li class="fragment">Mutation leads to older-to-younger pointers</li>
<li class="fragment">Write barrier: when writing a managed pointer, always check for inter-generational pointers and add to the set when needed</li>
<li class="fragment">The collector collects a specific generation and all younger generations
<ul>
<li class="fragment">Sets in older generations: needs to be scavenged</li>
<li class="fragment">Sets in collected generations: can be dropped</li>
</ul></li>
<li class="fragment">Closures may be moved across generations
<ul>
<li class="fragment">“Eager promotion”: useful for write-once mutable closures like thunks</li>
</ul></li>
</ul>
</div>
</section><section id="combining-different-gc-algorithms" class="slide level2">
<h2>Combining different GC algorithms</h2>
<div>
<ul>
<li class="fragment">We can switch GC algorithms when collecting different generations</li>
<li class="fragment">GHC RTS always uses copying GC for minor GC</li>
<li class="fragment"><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/Sweep.c">Mark/sweep</a> or <a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/Compact.c">mark/compact</a> for the oldest generation during major GC</li>
</ul>
</div>
</section></section>
<section><section id="parallel-gc" class="title-slide slide level1"><h1>Parallel GC</h1></section><section id="parallel-gc-in-ghc-rts" class="slide level2">
<h2>Parallel GC in GHC RTS</h2>
<div>
<ul>
<li class="fragment">Parallel GC: multiple runtime threads to perform GC</li>
<li class="fragment">Parallel evacuation: use a CAS instruction to “claim” the closure before actual evacuation</li>
<li class="fragment">Parallel scavenging
<ul>
<li class="fragment">The tospace is a “pending block set” locked by a mutex</li>
<li class="fragment">Each GC thread claims a pending block and scavenges it</li>
<li class="fragment">Scavenging may allocate fresh blocks being added to the set</li>
</ul></li>
</ul>
</div>
</section></section>
<section><section id="ghc-specific-details" class="title-slide slide level1"><h1>GHC-specific details</h1></section><section id="heap-layout" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap-objects">Heap layout</a></h2>
<div>
<ul>
<li class="fragment">Runtime reps of different objects
<ul>
<li class="fragment">Lifted: regular Haskell types, can be thunks</li>
<li class="fragment">Unlifted boxed: can’t be thunks</li>
<li class="fragment">Unboxed: e.g. <code>Int#</code>, <code>Addr#</code>, raw bits</li>
</ul></li>
<li class="fragment">Closures(boxed objects) have the same basic layout
<ul>
<li class="fragment">First word: info table pointer</li>
<li class="fragment">Payload: may contain pointers &amp; non-pointers</li>
<li class="fragment">RTS can quickly <a href="https://gitlab.haskell.org/ghc/ghc/blob/30be3bf13a6e72247ff561df1f291370dad79ef9/includes/rts/storage/Block.h#L184">calculate</a> the block descriptor address from a closure address and obtain runtime metadata from there</li>
</ul></li>
</ul>
</div>
</section><section id="info-tables" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/includes/rts/storage/InfoTables.h">Info tables</a></h2>
<div>
<ul>
<li class="fragment">Contains metadata about the closure:
<ul>
<li class="fragment">Closure’s type (thunk, function, etc)</li>
<li class="fragment">Layout of the closure’s payload, which parts are pointers</li>
<li class="fragment">Function arity, constructor tag, etc</li>
</ul></li>
<li class="fragment">Associated with “entry code”, which evaluates a thunk or enters a function</li>
</ul>
</div>
</section><section id="cafs" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/gc/CAFs">CAFs</a></h2>
<div>
<ul>
<li class="fragment">CAFs: thunks without free variables
<ul>
<li class="fragment">Expressions declared at the top level</li>
<li class="fragment"><code>let</code>/<code>where</code> bindings floated out by the optimizer</li>
</ul></li>
<li class="fragment">Different from dynamically allocated closures
<ul>
<li class="fragment">Dynamic closures can only be referenced via closure payload pointers</li>
<li class="fragment">CAFs have symbols which can appear in code</li>
</ul></li>
<li class="fragment">How does GC track closures referenced by code?
<ul>
<li class="fragment">Scanning the code sections: often hard, sometimes impossible</li>
</ul></li>
</ul>
</div>
</section><section id="garbage-collecting-cafs" class="slide level2">
<h2>Garbage collecting CAFs</h2>
<div>
<ul>
<li class="fragment">GHC emits SRTs, and links to them from info tables
<ul>
<li class="fragment">SRTs themselves are static closures, which point to other static closures referenced by a piece of code</li>
<li class="fragment">The GHC RTS reads a closure’s info table, and scavenges SRTs just like other closures</li>
</ul></li>
<li class="fragment">Corner cases: hard-coded CAF symbols in the C world, e.g. runtime, exports
<ul>
<li class="fragment">Allocate a <code>StablePtr</code>, which becomes a GC root</li>
</ul></li>
</ul>
</div>
</section><section id="yield-points" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/StgStartup.cmm#L104">Yield points</a></h2>
<div>
<ul>
<li class="fragment">Yield point: when a Haskell thread may be suspended for GC</li>
<li class="fragment">Many possible sources:
<ul>
<li class="fragment">Heap/stack checks</li>
<li class="fragment">Safe FFI imports</li>
<li class="fragment"><code>yield</code></li>
</ul></li>
<li class="fragment">Use <a href="https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/using-optimisation.html#ghc-flag--fomit-yields"><code>-fno-omit-yields</code></a> to ensure yield points exist even for tight no-allocation loops</li>
</ul>
</div>
</section><section id="selector-thunks" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/heap-objects#selector-thunks">Selector thunks</a></h2>
<div>
<ul>
<li class="fragment">“Selector thunk”: a thunk which selects a single field from a single data constructor, e.g. <code>fst (x,y)</code></li>
<li class="fragment">GHC RTS has special evacuation <a href="https://gitlab.haskell.org/ghc/ghc/blob/30be3bf13a6e72247ff561df1f291370dad79ef9/rts/sm/Evac.c#L1122">logic</a>:
<ul>
<li class="fragment">Peek the selectee, check if it’s evaluated</li>
<li class="fragment">If so, select that field directly (can be recursive!), rewriting <code>fst (x,y)</code> to <code>x</code></li>
<li class="fragment">If not, evacuate &amp; scavenge the selector thunk as usual</li>
</ul></li>
<li class="fragment"><a href="http://homepages.inf.ed.ac.uk/wadler/topics/garbage-collection.html">Proposed</a> by Philip Wadler, which fixes certain kinds of space leaks</li>
</ul>
</div>
</section><section id="weak-pointers" class="slide level2">
<h2><a href="https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage/gc/weak">Weak pointers</a></h2>
<div>
<ul>
<li class="fragment">Holds a weak reference to a key; key alive implies value alive</li>
<li class="fragment">Supports adding C &amp; Haskell finalizers</li>
<li class="fragment">Key/value/finalizer relationship holds even if weak pointer itself isn’t reachable</li>
<li class="fragment">Implementation details:
<ul>
<li class="fragment">The nursery &amp; generations maintain lists of weak pointers</li>
<li class="fragment">GC collects weak pointers with dead keys, runs finalizers in <a href="https://gitlab.haskell.org/ghc/ghc/blob/master/libraries/base/GHC/Weak.hs#L144">batch</a> in a separate thread</li>
</ul></li>
</ul>
</div>
</section></section>
<section><section id="lessons-for-haskell-developers" class="title-slide slide level1"><h1>Lessons for Haskell developers</h1></section><section id="adjusting-gc-flags" class="slide level2">
<h2>Adjusting GC flags</h2>
<div>
<ul>
<li class="fragment">Various RTS <a href="https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-options-to-control-the-garbage-collector">flags</a> affecting GC
<ul>
<li class="fragment"><code>-A</code>, <code>-H</code>: nursery &amp; initial heap size</li>
<li class="fragment"><code>-I</code>: disable “idle gc” or set interval</li>
<li class="fragment"><code>-qg</code>: disable parallel gc or enable for certain generations</li>
<li class="fragment"><code>-qn</code>: parallel gc worker thread count</li>
</ul></li>
<li class="fragment">General rules of thumb
<ul>
<li class="fragment">Workload: interactive vs batch</li>
<li class="fragment">Tradeoff: latency vs throughput</li>
</ul></li>
</ul>
</div>
</section><section id="querying-triggering-gc" class="slide level2">
<h2>Querying &amp; triggering GC</h2>
<div>
<ul>
<li class="fragment">Querying GC statistics
<ul>
<li class="fragment">Enable the <code>-T</code> RTS flag, and use <a href="https://hackage.haskell.org/package/base/docs/GHC-Stats.html"><code>GHC.Stats</code></a></li>
<li class="fragment">Higher-level frameworks like <a href="https://github.com/tibbe/ekg"><code>ekg</code></a></li>
</ul></li>
<li class="fragment">Manually triggering GC
<ul>
<li class="fragment">Use <a href="https://hackage.haskell.org/package/base/docs/System-Mem.html"><code>System.Mem</code></a></li>
</ul></li>
</ul>
</div>
</section><section id="keep-it-or-drop-it" class="slide level2">
<h2>Keep it or drop it</h2>
<div>
<ul>
<li class="fragment">Keeping a closure alive:
<ul>
<li class="fragment">Register it as a GC root and get the handle: stable pointer</li>
<li class="fragment">Ensure it’s alive at a given point: <code>touch#</code></li>
</ul></li>
<li class="fragment">Explicitly dropping a closure
<ul>
<li class="fragment">Only access it via a mutable reference</li>
<li class="fragment">Overwrite the reference and trigger major gc</li>
</ul></li>
</ul>
</div>
</section><section id="revisiting-weak-pointers" class="slide level2">
<h2>Revisiting weak pointers</h2>
<div>
<ul>
<li class="fragment">About Haskell finalizers: if run by the scheduler, exceptions are caught and discarded</li>
<li class="fragment">Pick the right key:
<ul>
<li class="fragment">Use <code>ThreadId#</code> instead of <code>ThreadId</code>, <code>MutVar#</code> instead of <code>IORef</code>, etc. The boxed coatings are fragile because of GHC optimization.</li>
</ul></li>
</ul>
</div>
</section></section>
<section><section id="references" class="title-slide slide level1"><h1>References</h1></section><section id="books-and-papers" class="slide level2">
<h2>Books and papers</h2>
<ul>
<li><a href="https://www.taylorfrancis.com/books/9781315388021">The garbage collection handbook: the art of automatic memory management</a></li>
<li><a href="http://simonmar.github.io/bib/papers/multiproc.pdf">Haskell on a shared-memory multiprocessor</a></li>
<li><a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2008/06/par-gc-ismm08.pdf">Parallel generational-copying garbage collection with a block-structured heap</a></li>
<li><a href="http://www.eecs.northwestern.edu/~clk800/rand-test-study/_rsfmh/rsfmh-2009-10-8-12-02-00.pdf">Runtime support for multicore Haskell</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/docs/storage-mgt/sm.tex">The GHC Storage Manager</a></li>
<li><a href="https://www.well-typed.com/blog/aux/files/nonmoving-gc/design.pdf">A Concurrent Garbage Collector for the Glasgow Haskell Compiler</a></li>
</ul>
</section><section id="wikis-blogs-slides-etc" class="slide level2">
<h2>Wikis, blogs, slides, etc</h2>
<ul>
<li><a href="https://gitlab.haskell.org/ghc/ghc/wikis/commentary/rts/storage">The GHC Commentary</a></li>
<li><a href="https://github.com/takenobu-hs/haskell-ghc-illustrated">GHC illustrated for hardware persons</a></li>
<li><a href="https://simonmar.github.io/posts/2018-06-22-New-SRTs.html">Rethinking Static Reference Tables in GHC</a></li>
<li><a href="https://www.well-typed.com/blog/2018/05/ghc-special-gc-objects/">Objects with special collection routines in GHC’s GC</a></li>
<li><a href="https://osa1.net/posts/2018-03-16-gc-optimizations.html">Three runtime optimizations done by GHC’s GC</a></li>
<li><a href="https://simonmar.github.io/posts/2015-07-28-optimising-garbage-collection-overhead-in-sigma.html">Optimising Garbage Collection Overhead in Sigma</a></li>
<li><a href="http://blog.ezyang.com/2014/05/ghc-and-mutable-arrays-a-dirty-little-secret/">GHC and mutable arrays: a DIRTY little secret</a></li>
<li><a href="http://blog.ezyang.com/2014/05/the-cost-of-weak-pointers-and-finalizers-in-ghc/">The cost of weak pointers and finalizers in GHC</a></li>
<li><a href="https://github.com/ezyang/jfp-ghc-rts/blob/master/howto-gc.txt">JFP article on the GHC RTS</a></li>
<li><a href="https://mainisusuallyafunction.blogspot.com/2011/10/thunks-and-lazy-blackholes-introduction.html">Thunks and lazy blackholes: an introduction to GHC at runtime</a></li>
</ul>
</section><section id="source-code" class="slide level2">
<h2>Source code</h2>
<ul>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/includes/rts/storage/Closures.h">Heap object layout</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/MBlock.c">MBlock allocator</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/BlockAlloc.c">Block allocator</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/Storage.c">Heap allocator</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/GC.c">Garbage collector entry</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/Evac.c">Evacuation</a></li>
<li><a href="https://gitlab.haskell.org/ghc/ghc/blob/master/rts/sm/Scav.c">Scavenging</a></li>
</ul>
</section><section id="q-a" class="slide level2">
<h2>Q &amp; A</h2>
</section></section>
</div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/lib/js/head.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
// Push each slide change to the browser history
history: true,
// Optional reveal.js plugins
dependencies: [
{ src: 'https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/plugin/zoom-js/zoom.js', async: true },
{ src: 'https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.8.0/plugin/notes/notes.js', async: true }
]
});
</script>
</body>
</html>
revealjs-url title
A tour into GHC's garbage collection

A hierarchy of allocators

::: incremental

  • Backed by OS-specific interfaces: mmap on POSIX, VirtualAlloc on Win32
  • Two step allocator for 64-bit systems
    • Reserve a large continuous address space, typically 1TB
    • Commit/decommit a part within that address space
    • Easy to identify static/dynamic memory from a pointer value
  • malloc is still used for runtime metadata

:::

::: incremental

  • Allocates raw MBlocks (aka megablocks)
    • 1MB sized/aligned
    • May allocate mgroups when necessary
  • Free-list based allocation

:::

::: incremental

  • Allocates blocks/block groups
    • 4KB sized/aligned
  • Associated metadata: block descriptors
    • Pointers to starting address & free space address
    • Link to other block descriptors
    • Other metadata: block flags, generation, etc
  • Free-list based allocation
  • Groups closures(heap objects) with similar properties

:::

::: incremental

  • Allocates individual closures
  • Different pools (block groups):
    • The "nursery": where most Haskell closures are born
    • The object pools: used by allocate* functions, for e.g. ByteArray# related stuff

:::

Copying GC

Bump allocation: I

::: incremental

  • Bump allocation:
    • Allocation is simply checking HpLim and bumping Hp
    • Hp and HpLim syncs with block descriptors pre/post execution
  • Upon heap check failures:
    • Store the attempted allocation count in HpAlloc
    • If CurrentNursery links to an available block, move the nursery and carry on execution
    • Otherwise, report HeapOverflow and return

:::

Bump allocation: II

::: incremental

  • Bump allocation is fast
    • Avoids the overhead of free lists
    • Avoids the overhead of C calls
    • Works well with multiple runtime threads, each has its own nursery
  • Upon HeapOverflows
    • The scheduler updates the nursery and re-enqueues the thread
  • Simplest GC is no-op GC

:::

Copying GC: I

::: incremental

  • Divides heap space into "fromspace" and "tospace"
  • GC copies fromspace live objects to tospace, then flips the semispaces
  • How to handle pointers between objects?
    • Recursive copying: when an object is copied, also copy all directly reachable objects and rewrite pointers
    • Cyclic references: detect already copied objects and skip copying logic
    • Stack overflow: :/

:::

Copying GC: II

::: incremental

  • Evacuate: copies a fromspace object into the tospace
  • Scavenge: rewrites pointers in a tospace object by evacuating the pointed objects
  • Tri-color GC: white(original), grey(evacuated), black(evacuated & scavenged)
  • Start of copying gc: evacuate the "root" objects
  • End of copying gc: no more objects are scavenged
  • Scavenge order: FILO(stack, depth-first), FIFO(queue, breadth-first), etc

:::

Cheney scanning algorithm

::: incremental

  • Uses the tospace itself as a FIFO queue, no separate work list
  • A pair of tospace pointers: scan points to the first grey object, end points to the end of used tospace
  • Scavenge loop:
    • Pick the grey object pointed by scan, evacuate all pointed objects (which advances end)
    • The object is now black, advance scan
    • When scan catches up with end: GC finishes

:::

Copying GC as in GHC RTS

::: incremental

  • Stop the world: all mutator threads must be suspended
  • Different block groups for fromspace/tospace
  • Evacuate: overwrites the closure in place with a tagged "forwarding pointer"
  • Scavenge: scavenges in units of blocks, not individual closures

:::

Generational GC

Motivation

::: incremental

  • Weak generational hypothesis: most objects die young
  • Group the objects into generations
    • Minor GC: only traverse the objects in younger generations
    • Promotion: objects which survives several rounds of GC are promoted to older generations
    • Age: rounds of GC which an object has survived
    • Step: objects may go through several "step"s before promotion; removed by current GHC RTS

:::

Finding roots for generational GC

::: incremental

  • Traditional roots in a non-generational setting
    • No actual evacuate/scavenge if it's in an older generation
  • Inter-generational pointers
    • Older-to-younger pointers: must be added to the root set
  • Remembered sets
    • Each generation records a set of objects which points to younger generations

:::

Performing Generational GC

::: incremental

  • Mutation leads to older-to-younger pointers
  • Write barrier: when writing a managed pointer, always check for inter-generational pointers and add to the set when needed
  • The collector collects a specific generation and all younger generations
    • Sets in older generations: needs to be scavenged
    • Sets in collected generations: can be dropped
  • Closures may be moved across generations
    • "Eager promotion": useful for write-once mutable closures like thunks

:::

Combining different GC algorithms

::: incremental

  • We can switch GC algorithms when collecting different generations
  • GHC RTS always uses copying GC for minor GC
  • Mark/sweep or mark/compact for the oldest generation during major GC

:::

Parallel GC

Parallel GC in GHC RTS

::: incremental

  • Parallel GC: multiple runtime threads to perform GC
  • Parallel evacuation: use a CAS instruction to "claim" the closure before actual evacuation
  • Parallel scavenging
    • The tospace is a "pending block set" locked by a mutex
    • Each GC thread claims a pending block and scavenges it
    • Scavenging may allocate fresh blocks being added to the set

:::

GHC-specific details

::: incremental

  • Runtime reps of different objects
    • Lifted: regular Haskell types, can be thunks
    • Unlifted boxed: can't be thunks
    • Unboxed: e.g. Int#, Addr#, raw bits
  • Closures(boxed objects) have the same basic layout
    • First word: info table pointer
    • Payload: may contain pointers & non-pointers
    • RTS can quickly calculate the block descriptor address from a closure address and obtain runtime metadata from there

:::

::: incremental

  • Contains metadata about the closure:
    • Closure's type (thunk, function, etc)
    • Layout of the closure's payload, which parts are pointers
    • Function arity, constructor tag, etc
  • Associated with "entry code", which evaluates a thunk or enters a function

:::

::: incremental

  • CAFs: thunks without free variables
    • Expressions declared at the top level
    • let/where bindings floated out by the optimizer
  • Different from dynamically allocated closures
    • Dynamic closures can only be referenced via closure payload pointers
    • CAFs have symbols which can appear in code
  • How does GC track closures referenced by code?
    • Scanning the code sections: often hard, sometimes impossible

:::

Garbage collecting CAFs

::: incremental

  • GHC emits SRTs, and links to them from info tables
    • SRTs themselves are static closures, which point to other static closures referenced by a piece of code
    • The GHC RTS reads a closure's info table, and scavenges SRTs just like other closures
  • Corner cases: hard-coded CAF symbols in the C world, e.g. runtime, exports
    • Allocate a StablePtr, which becomes a GC root

:::

::: incremental

  • Yield point: when a Haskell thread may be suspended for GC
  • Many possible sources:
    • Heap/stack checks
    • Safe FFI imports
    • yield
  • Use -fno-omit-yields to ensure yield points exist even for tight no-allocation loops

:::

::: incremental

  • "Selector thunk": a thunk which selects a single field from a single data constructor, e.g. fst (x,y)
  • GHC RTS has special evacuation logic:
    • Peek the selectee, check if it's evaluated
    • If so, select that field directly (can be recursive!), rewriting fst (x,y) to x
    • If not, evacuate & scavenge the selector thunk as usual
  • Proposed by Philip Wadler, which fixes certain kinds of space leaks

:::

::: incremental

  • Holds a weak reference to a key; key alive implies value alive
  • Supports adding C & Haskell finalizers
  • Key/value/finalizer relationship holds even if weak pointer itself isn't reachable
  • Implementation details:
    • The nursery & generations maintain lists of weak pointers
    • GC collects weak pointers with dead keys, runs finalizers in batch in a separate thread

:::

Lessons for Haskell developers

Adjusting GC flags

::: incremental

  • Various RTS flags affecting GC
    • -A, -H: nursery & initial heap size
    • -I: disable "idle gc" or set interval
    • -qg: disable parallel gc or enable for certain generations
    • -qn: parallel gc worker thread count
  • General rules of thumb
    • Workload: interactive vs batch
    • Tradeoff: latency vs throughput

:::

Querying & triggering GC

::: incremental

  • Querying GC statistics
    • Enable the -T RTS flag, and use GHC.Stats
    • Higher-level frameworks like ekg
  • Manually triggering GC

:::

Keep it or drop it

::: incremental

  • Keeping a closure alive:
    • Register it as a GC root and get the handle: stable pointer
    • Ensure it's alive at a given point: touch#
  • Explicitly dropping a closure
    • Only access it via a mutable reference
    • Overwrite the reference and trigger major gc

:::

Revisiting weak pointers

::: incremental

  • About Haskell finalizers: if run by the scheduler, exceptions are caught and discarded
  • Pick the right key:
    • Use ThreadId# instead of ThreadId, MutVar# instead of IORef, etc. The boxed coatings are fragile because of GHC optimization.

:::

References

Books and papers

Wikis, blogs, slides, etc

Source code

Q & A

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment