How Linux approximates memory metrics
threads-memory.c (included below) starts 100 threads, allocates 1
MB of memory in each, and then pauses. How much memory is it using?
Let's find out by running it:
$ gcc -pthread threads-memory.c -o threads-memory
$ ./threads-memory
starting threads... done
press enter to exit
While it's still running, check the RSS (resident set size) using
ps. On my
Linux system, the result is:
$ ps -o rss -C threads-memory
  RSS
81084
It's only using 81 MB! How could it possibly be using less than 100 MB?
Starting in Linux 2.6.34, the value reported by
ps is an approximation:
"For making accounting scalable, RSS related information are handled in asynchronous manner and the vaule [sic] may not be very precise. To see a precise snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table. It's slow but very precise."
Let's try to understand this change.
In Linux, threads are just processes that happen to share the same address
space (memory). The struct task_struct represents a process, and the struct
mm_struct represents an address space. mm_struct contains a counter
tracking the RSS. This is the value used by ps.
Having every thread access the same mm_struct every time memory is allocated
would be inefficient. The optimization adds a per-thread cache for the counter
in task_struct. Each cache is flushed to the associated mm_struct
every 64 page faults in a thread. Assuming a 4 KB page size, this means that
roughly 256 KB (64 * 4 KB) per thread may be unaccounted for. Probably not a
big deal, unless you're running a lot of threads! With 100 threads, that adds
up to as much as 25 MB (100 * 256 KB), which is how ps can report less than
100 MB.
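To make the batching concrete, here is a minimal single-threaded sketch of the idea. It is a simplified model, not the kernel's code: the names mm_sim, task_sim, flush_cache, account_fault and unaccounted_kb are made up for illustration, and it assumes every fault maps exactly one new page and that the flush check happens at the start of a fault.

```c
#include <assert.h>

#define FLUSH_THRESHOLD 64  /* faults between flushes, as described above */

/* Hypothetical stand-ins for the kernel structures discussed in the text. */
struct mm_sim   { long rss_pages; };                /* shared, like mm_struct  */
struct task_sim { long cached_pages; int events; }; /* per thread, like task_struct */

/* Fold the thread's cached page count into the shared counter. */
void flush_cache(struct task_sim *task, struct mm_sim *mm)
{
    mm->rss_pages += task->cached_pages;
    task->cached_pages = 0;
    task->events = 0;
}

/* Account for one page fault that maps one new page. */
void account_fault(struct task_sim *task, struct mm_sim *mm)
{
    assert(task && mm);
    /* Only touch the shared counter once enough faults have piled up. */
    if (task->events >= FLUSH_THRESHOLD)
        flush_cache(task, mm);
    task->cached_pages++;
    task->events++;
}

/* Memory this thread has touched that the shared counter hasn't seen yet. */
long unaccounted_kb(const struct task_sim *task, long page_kb)
{
    return task->cached_pages * page_kb;
}
```

In this model, after 64 calls to account_fault for one simulated thread, the shared rss_pages still reads 0 and unaccounted_kb(&task, 4) returns 256. Anything that reads only the shared counter misses those pages until the next flush.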
To get a precise RSS value, you can use the pmap command, which scans
the page table rather than relying on the RSS counter:
$ pmap -x $(pidof threads-memory) | grep -E "Address|total"
Address           Kbytes     RSS   Dirty Mode  Mapping
total kB         2960712  105424  103888
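If you'd rather compare the two numbers from your own code, the sketch below shows one way it could be done. It assumes the usual procfs formats: "Rss:" lines in kB in /proc/<pid>/smaps, and the resident page count as the second field of /proc/<pid>/statm, which comes from the counter and so can lag. The function names smaps_rss_kb and statm_rss_kb are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Sum every "Rss:" line of a smaps-format file. Returns kB, or -1 on error. */
long smaps_rss_kb(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char line[512];
    long total = 0, kb;
    while (fgets(line, sizeof line, f)) {
        /* Matches only lines that begin with "Rss:", e.g. "Rss:   12 kB". */
        if (sscanf(line, "Rss: %ld kB", &kb) == 1)
            total += kb;
    }
    fclose(f);
    return total;
}

/* Read the resident page count (second field) of a statm-format file.
 * Returns kB, or -1 on error. */
long statm_rss_kb(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    long size_pages, resident_pages;
    int n = fscanf(f, "%ld %ld", &size_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;
    /* statm counts pages; convert using the system page size. */
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}
```

Calling smaps_rss_kb("/proc/self/smaps") and statm_rss_kb("/proc/self/statm") from a heavily threaded process should show the same kind of gap as ps versus pmap. The smaps walk is the slow but precise path the kernel note describes.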