Understanding alignment, object size and objects per slab for special-purpose caches in the SLUB allocator.
Let's take a look at `filp`, the special-purpose cache for `struct file`, as an example.
Note: I'm using a 5.4 kernel as that's what I had on hand (newer kernels have the `struct slab` overlay and other changes).

Additional edit: I've simplified things here, focusing on `filp`. Additional adjustments to the size can happen in `calculate_sizes()`; typically this is just rounding the size up to a word boundary.
As a quick refresher, this is the API for creating a special-purpose cache:
```c
struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
                                     unsigned int align, slab_flags_t flags,
                                     void (*ctor)(void *));
```
Now let's take a look at how `filp` is created in `files_init()`:
```c
void __init files_init(void)
{
        filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
                        SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
        percpu_counter_init(&nr_files, 0, GFP_KERNEL);
}
```
Cool, we're interested in `size`, `align` and also `flags`. `size` here for me is `0xe8` and `align` is set to 0; okay, simple. We're also interested in `SLAB_HWCACHE_ALIGN`.

Besides the `align` arg, this flag is also used to determine our alignment:
```c
 * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
 *   cacheline. This can be beneficial if you're counting cycles as closely
 *   as davem.
```
Specifically, this happens during `calculate_alignment()` as we set up our `struct kmem_cache`:
```c
static unsigned int calculate_alignment(slab_flags_t flags,
                unsigned int align, unsigned int size)
{
        /*
         * If the user wants hardware cache aligned objects then follow that
         * suggestion if the object is sufficiently large.
         *
         * The hardware cache alignment cannot override the specified
         * alignment though. If that is greater then use it.
         */
        if (flags & SLAB_HWCACHE_ALIGN) {               /* [0] */
                unsigned int ralign;

                ralign = cache_line_size();             /* [1] */
                while (size <= ralign / 2)
                        ralign /= 2;
                align = max(align, ralign);
        }

        align = max(align, arch_slab_minalign());

        return ALIGN(align, sizeof(void *));
}
```
In our example, the args are `size = 0xe8` and `align = 0`. However, as `SLAB_HWCACHE_ALIGN` was set, we hit [0].

We can check `cache_line_size()` [1] via `/sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size`. For me it's 64.

So in this instance `align` is actually set to 0x40 (64). Note that if `align` was set in the arg, the larger of the two would be used.
As `size` is `0xe8`, with an `align` of 0x40 we should see an `object_size` of 0x100.

The proof is in the pudding though, so let's double-check this.
We can set a breakpoint in `__alloc_file()`, where allocations from the `filp` cache are made:
```c
static struct file *__alloc_file(int flags, const struct cred *cred)
{
        struct file *f;
        int error;

        f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL); /* <=== break here */
```
If we break on `kmem_cache_zalloc()` we can take its first arg, print it in gdb and examine the `filp` `struct kmem_cache`:
```
gef➤  p *(struct kmem_cache *)0xffff888100222c00
$2 = {
  size = 0x100,
  object_size = 0x100,
  ...
  oo = {
    x = 0x10
  },
  ...
  align = 0x40,
  ...
  name = 0xffffffff82602d1b "filp",
  ...
```
We can see `object_size` checks out with our hypothesis, as does `align`. `struct kmem_cache_order_objects oo` can tell us the number of objects per slab (it packs the slab order into the upper bits and the objects per slab into the lower 16 bits): `objects_per_slab = 0x10 & ((1 << 16) - 1) = 0x10`.
See the kernel funcs `oo_order()` and `oo_objects()` in `mm/slub.c` for ref.
We can also go a bit deeper and explore one of the slab's pages to check the `objects` field, a 15-bit counter holding the total number of objects in the slab. Note: 5.17+ has the `struct slab` overlay; otherwise use `struct page` like I am here:
```
gef➤  p *(struct page *)0xffffea00045bc700
$7 = {
  ...
  {
    ...
    {
      inuse = 0x8,
      objects = 0x10,
      frozen = 0x0
    }
  }
```
I got this by walking one of the `kmem_cache_node` lists, and we can see `objects` matches what we'd expect from a 0x1000 page containing objects of size 0x100.
Remember you can also just use `slabtop` or `/proc/slabinfo` for the same results as above:

```
# name            <objsize> <objperslab> <pagesperslab>
filp                    256           16             1
```