The effect of heapSizeFactor on CPU and memory usage

The D GC

The D GC is tuned by default to trade memory for performance. This can be seen clearly in the default heap size target of 2.0, i.e. the GC prefers to allocate more memory from the operating system until less than half of the heap is live: with 100 MB of live data, for example, the heap is allowed to grow to roughly 200 MB before a collection is triggered. But with long-running user-triggered processes, memory can be more at a premium than CPU, and larger heaps also mean slower collection runs. Can we tweak GC parameters to make D programs use less memory? More importantly, what is the effect of doing so?

Adjustable parameters

There are two important parameters: heapSizeFactor and maxPoolSize.

  • heapSizeFactor defines the target "used heap to live memory" ratio. It defaults to 2.0.
  • maxPoolSize defines the maximum pool size, which is the unit in which D allocates (and releases) memory from the operating system. Resident memory usage will therefore generally grow in units of maxPoolSize.

You can manually vary these parameters by passing --DRT-gcopt="heapSizeFactor:1.1 maxPoolSize:8" to any D program.
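If you'd rather not pass the flag on every invocation, the same options can, as far as I know, also be embedded in the binary via druntime's rt_options array. A minimal sketch, assuming the documented druntime GC configuration mechanism:

```d
// Minimal sketch (assuming druntime's rt_options mechanism): embed the GC
// options in the binary instead of passing --DRT-gcopt on the command line.
extern(C) __gshared string[] rt_options = [
    "gcopt=heapSizeFactor:1.1 maxPoolSize:8"
];

void main()
{
    // program as usual; druntime reads rt_options at startup
}
```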

As a reference program, I'll use my heap fragmentation/GC leak testcase from "Why does this simple program leak 500MB of RAM?".

Observations

[heapsizefactor1.png: RSS memory usage and program runtime vs. heapSizeFactor, linear scale]

So here's a diagram of RSS memory usage and program runtime as I adjust heapSizeFactor (on the X axis). We can clearly see three things:

  • the D GC's actual heap usage is extremely noisy (as expected for a system without per-thread pools), but becomes less so as collections get more frequent
  • you can get a significant improvement in memory usage for very little cost
  • something wild happens between heapSizeFactor=1.0 and heapSizeFactor=1.1.

Clearly, using a linear scale was a mistake. Let's try a different progression, defined by 1 + 1/1.1^x so that the values approach 1 asymptotically as x grows:

[heapsizefactor2.png: RSS memory usage and program runtime vs. heapSizeFactor on the exponential scale, with additional maxPoolSize runs]

I've added four runs with different maxPoolSize settings. Two additional things become clear:

  • the exponential scale was the right way to go
  • GC CPU usage rises more slowly than memory usage falls, indicating a significant potential benefit.

Interestingly, adjustments between 2 and 1.1 seem to have very little effect. Pretty much the only thing that matters is the number of zeroes after the decimal point, and maybe the final digit. For instance, if you're willing to accept a doubling of GC cost in exchange for halving RAM usage, you should tune your heapSizeFactor to 1.002.
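To make that scale concrete, here's a quick sketch of the mapping from x to heapSizeFactor (not the actual benchmark script, just the progression above printed out):

```d
// Quick sketch: which point x on the exponential axis corresponds to which
// heapSizeFactor, using the progression 1 + 1 / 1.1^x.
import std.stdio : writefln;

void main()
{
    foreach (x; 0 .. 66)
    {
        const factor = 1.0 + 1.0 / (1.1 ^^ x);
        writefln("x = %2d  ->  heapSizeFactor = %.4f", x, factor);
        // x = 0 gives 2.0000, x = 25 roughly 1.09, x = 65 roughly 1.002;
        // each extra zero after the decimal point costs about 24 more steps.
    }
}
```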

Annoyingly, there seems to be no benefit from maxPoolSize. The memory reduction you gain from smaller pools is almost exactly offset by the increased CPU use, so you could get the same reduction by just running the GC more often via heapSizeFactor. Still, good to know.

Note that this benchmark was performed with an extremely GC-hungry program. The performance impact and the benefit may vary with the type of process. Nonetheless, I'll be attempting, and advocating, to run all but the most CPU-hungry of our services with --DRT-gcopt=heapSizeFactor:1.002.

Speculation

Why do more aggressive GC runs reduce total memory used? I can't help but think it comes down to heap fragmentation. D's GC is non-moving, meaning that once an object is allocated, it has to stay at that address until it is freed. As a result, for programs that mix long-lived and short-lived allocations, such as "anything that parses with std.json" and "anything that uses threads at all", a pool that was only needed at peak memory usage may be kept alive by a small number of surviving allocations. In that case, more frequent GC runs allow the program to pack more actually-alive data into the pools it has already allocated, reducing the peak usage and thus the fragmentation. In the long run it averages out, but in the long run I restart the service because it uses too much memory anyway.
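To illustrate the pattern (a hypothetical sketch, not the original testcase from the linked post): a loop that produces mostly garbage but keeps a small fraction of allocations alive will, under infrequent collections, scatter its survivors across many pools.

```d
// Hypothetical sketch of the fragmentation pattern, not the original testcase:
// mostly short-lived garbage, with occasional survivors that can pin a pool.
import std.random : uniform;

void main()
{
    ubyte[][] survivors; // long-lived allocations

    foreach (i; 0 .. 1_000_000)
    {
        // Short-lived: becomes garbage almost immediately.
        auto scratch = new ubyte[](uniform(64, 4096));

        // Keep roughly one in a thousand alive; each survivor can keep an
        // otherwise-dead pool from being released to the OS.
        if (i % 1000 == 0)
            survivors ~= scratch;
    }

    // With the default heapSizeFactor of 2, collections are rare, so survivors
    // end up spread across many pools; with e.g. 1.002 they pack more densely.
}
```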

At any rate, without fundamental changes to the language, such as finding ways to make at least some allocations movable, there isn't much more to be done here. For now, the default heapSizeFactor of 2 may be good for benchmarks, but for long-running server processes, I suspect it makes the GC look worse than it is.
