Kmem Cache Alignment

Understanding alignment, object size and objects per cache for special purpose caches in the SLUB allocator.

Overview

Let's take a look at filp, the special purpose cache for struct file, as an example.

Note: I'm using a 5.4 kernel as that's what I had on hand (newer kernels have changes like the struct slab overlay).

Additional edit: I've simplified things here, focusing on filp; additional alignment of the size can also happen in calculate_sizes(), which typically just rounds the size up to a word boundary.
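
Quick aside: the ALIGN() macro used in these kernel paths just rounds a value up to the next multiple of a power-of-two boundary. Here's a minimal userspace sketch of the same arithmetic; ALIGN_UP is my own name for it, not the kernel macro:

#include <stdio.h>

/* Same arithmetic as the kernel's ALIGN(): round x up to the next
 * multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
	/* The word-boundary rounding done in calculate_sizes():
	 * sizeof(struct file) = 0xe8 is already 8-byte aligned. */
	printf("ALIGN(0xe8, 8) = 0x%lx\n", ALIGN_UP(0xe8UL, 8));

	/* A size that isn't word aligned gets rounded up. */
	printf("ALIGN(0xe5, 8) = 0x%lx\n", ALIGN_UP(0xe5UL, 8));
	return 0;
}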

Cache Creation

As a quick refresher, this is the API for creating a special purpose cache:

struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
			unsigned int align, slab_flags_t flags,
			void (*ctor)(void *));

Now let's take a look at how filp is created in files_init():

void __init files_init(void)
{
	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
}

Cool, we're interested in size, align and also flags:

  • size here is sizeof(struct file), which is 0xe8 for me
  • align is set to 0, okay, simple
  • flags includes SLAB_HWCACHE_ALIGN, which is the interesting one

Besides the align arg, this flag is also used to determine our alignment:

%SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
cacheline. This can be beneficial if you're counting cycles as closely
as davem.

Specifically, this happens during calculate_alignment() as we set up our struct kmem_cache:

static unsigned int calculate_alignment(slab_flags_t flags,
		unsigned int align, unsigned int size)
{
	/*
	 * If the user wants hardware cache aligned objects then follow that
	 * suggestion if the object is sufficiently large.
	 *
	 * The hardware cache alignment cannot override the specified
	 * alignment though. If that is greater then use it.
	 */
	if (flags & SLAB_HWCACHE_ALIGN) {                                [0]
		unsigned int ralign;

		ralign = cache_line_size();                                    [1]
		while (size <= ralign / 2)
			ralign /= 2;
		align = max(align, ralign);
	}

	align = max(align, arch_slab_minalign());

	return ALIGN(align, sizeof(void *));
}

In our example, the args are size = 0xe8 and align = 0. However, as SLAB_HWCACHE_ALIGN was set, we hit [0].

We can check cache_line_size() [1] via /sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size. For me it's 64.
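
If you prefer to grab that programmatically, here's a small userspace sketch that reads the same sysfs file (path as above; the CPU/cache index may differ on your machine):

#include <stdio.h>

int main(void)
{
	/* Read the line size the note above checks via sysfs. */
	const char *path =
		"/sys/devices/system/cpu/cpu0/cache/index2/coherency_line_size";
	FILE *f = fopen(path, "r");
	unsigned int line_size = 0;

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%u", &line_size) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);

	printf("cache line size: %u bytes\n", line_size); /* 64 here */
	return 0;
}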

So in this instance align ends up as 0x40 (64). Note that if align had been passed as an argument, the larger of the two would be used.

As size is 0xe8 and align is 0x40, the size gets rounded up to the next multiple of 0x40, so we should see an object_size of 0x100.
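
To sanity check that arithmetic outside the kernel, here's a small sketch mirroring the calculate_alignment() loop and the size rounding, with the filp values plugged in (a 0x40 cache line assumed, as measured above; ALIGN_UP is again my own helper):

#include <stdio.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
	unsigned long size = 0xe8;   /* sizeof(struct file) on this kernel */
	unsigned long align = 0;     /* align arg passed by files_init()   */
	unsigned long ralign = 0x40; /* cache_line_size() measured above   */

	/* SLAB_HWCACHE_ALIGN path in calculate_alignment(): keep halving
	 * the cache line while the object still fits in half of it.
	 * 0xe8 > 0x20, so ralign stays at 0x40. */
	while (size <= ralign / 2)
		ralign /= 2;
	align = align > ralign ? align : ralign;

	/* calculate_sizes() then rounds the size up to that alignment. */
	printf("align = 0x%lx, aligned size = 0x%lx\n",
	       align, ALIGN_UP(size, align));
	return 0;
}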

Verifying

The proof is in the pudding though, so let's double-check this.

We can set a breakpoint in __alloc_file() where allocations are made from the filp cache:

static struct file *__alloc_file(int flags, const struct cred *cred)
{
	struct file *f;
	int error;

	f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);            <=== break here

If we break on kmem_cache_zalloc() we can print the first arg in gdb and examine the filp struct kmem_cache:

gef➤  p *(struct kmem_cache*)0xffff888100222c00
$2 = {
  size = 0x100,
  object_size = 0x100,
  ...
  oo = {
    x = 0x10
  },
  ...
  align = 0x40,
  ...
  name = 0xffffffff82602d1b "filp",
  ...

We can see object_size checks out with our hypothesis, as does align. The struct kmem_cache_order_objects oo field tells us the number of objects per slab (it packs the slab order into the upper bits and the objects per slab into the lower 16):

objects_per_slab = 0x10 & ((1 << 16) - 1) = 0x10

See kernel funcs oo_order() and oo_objects() in mm/slub.c for ref.
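
For reference, here's a tiny sketch of the same decoding oo_order() and oo_objects() perform on oo.x (OO_SHIFT is 16 in mm/slub.c):

#include <stdio.h>

#define OO_SHIFT 16
#define OO_MASK  ((1U << OO_SHIFT) - 1)

int main(void)
{
	unsigned int x = 0x10; /* oo.x from the gdb dump above */

	/* Mirrors oo_order()/oo_objects(): order in the upper bits,
	 * objects per slab in the lower 16. */
	printf("order   = %u\n", x >> OO_SHIFT); /* 0 -> order-0 (4KiB) slab */
	printf("objects = %u\n", x & OO_MASK);   /* 0x10 = 16 objects        */
	return 0;
}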

Alt Ways To Verify

We can also go a bit deeper and explore one of the slab's pages to check the objects field, which is a 15-bit counter defining the total number of objects in the slab.

Note: 5.17+ has the struct slab overlay; on older kernels use struct page like I am here:

gef➤  p *(struct page*)0xffffea00045bc700
$7 = {
...
      {
	...
        {
          inuse = 0x8,
          objects = 0x10,                         
          frozen = 0x0
        }
      }

I got this by exploring one of the cache's kmem_cache_node lists, and we can see objects matches what we'd expect from a 0x1000 page containing objects of size 0x100.
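
As a quick cross-check, the expected object count falls straight out of the slab size and the aligned object size; a sketch with the values above plugged in (order 0 assumed, per oo):

#include <stdio.h>

int main(void)
{
	unsigned long page_size   = 0x1000; /* 4KiB page            */
	unsigned int  slab_order  = 0;      /* from oo.x above      */
	unsigned long object_size = 0x100;  /* aligned size of filp */

	unsigned long slab_bytes = page_size << slab_order;

	/* 0x1000 / 0x100 = 0x10, matching both oo and page->objects. */
	printf("objects per slab = 0x%lx\n", slab_bytes / object_size);
	return 0;
}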

Remember you can also just use slabtop or /proc/slabinfo for the same results as above:

# name        <objsize> <objperslab> <pagesperslab>
filp          256       16            1  
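
If you'd rather pull that line programmatically than eyeball slabtop, here's a small sketch that greps the filp entry out of /proc/slabinfo (usually only readable by root):

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512];

	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* Match the line for the filp cache. */
		if (strncmp(line, "filp ", 5) == 0)
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}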