Introduction

The glibc memory allocator (since glibc 2.10, which ships with most modern Linux distributions) includes an optimization meant to reduce lock contention in processes with many concurrent threads: instead of a single heap, it maintains multiple memory pools called arenas. Each additional arena reserves address space from the OS in large, same-size chunks (64MB on 64-bit systems), which are clearly visible when the process memory is analyzed with pmap. An arena is protected by a mutex, so only one thread at a time can allocate from it; a thread that finds its arena busy is moved to another arena, or triggers the creation of a new one. Individual malloc() calls then carve memory out of these arenas. By default the number of arenas is capped at 8 per CPU core, and this limit tends to be reached when the number of threads is high and/or threads are created and destroyed frequently. The amount of memory the application actually uses within these arenas can be quite small; however, if an application has a large number of threads and the machine has a large number of CPU cores, the total address space reserved this way can grow very large. For example, on a machine with 16 cores the worst case is 16 * 8 * 64MB = 8GB.
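The effect is easy to reproduce. The following minimal sketch (assuming Linux with glibc; the thread count, allocation size, and file name are arbitrary choices for the demo, not anything prescribed above) spawns many threads that each perform one small allocation, then sleeps so the process can be inspected with pmap:

```c
/*
 * arenas.c - demonstrate glibc malloc arena growth under many threads.
 * Build:   gcc -O2 -pthread arenas.c -o arenas
 * Inspect: pmap -x <pid>   (in another shell, while the program sleeps)
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_THREADS 64            /* enough threads to trigger extra arenas */
#define ALLOC_SIZE  (256 * 1024)  /* each thread only touches 256 KB */

static void *worker(void *arg)
{
    (void)arg;
    /* The first malloc() from a new thread attaches it to a non-main arena
     * (creating one if needed); that arena reserves a 64MB heap even though
     * we only touch a small fraction of it. */
    char *buf = malloc(ALLOC_SIZE);
    if (buf != NULL)
        memset(buf, 0xAB, ALLOC_SIZE);
    sleep(60);                    /* keep the arena alive for inspection */
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    printf("pid: %d (run `pmap -x %d` in another shell)\n",
           (int)getpid(), (int)getpid());

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}
```

On a multi-core machine, pmap typically shows a number of roughly 64MB anonymous reservations whose combined size far exceeds the few megabytes the threads actually touch. The arena limit mentioned above can be lowered with the MALLOC_ARENA_MAX environment variable (or the corresponding mallopt/tunable), trading address-space usage for more lock contention.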

