struct resv_map
--> for MAP_SHARED, the resv_map is preallocated when the inode is
    created (hugetlbfs_get_inode()) and kept in the inode mapping
    metadata
  --> inode_resv_map()
    --> resv_map = inode->i_mapping->private_data
--> for MAP_PRIVATE (including MAP_ANONYMOUS), vma->vm_private_data
    points to the map
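As a reference, the two lookup paths reduce to roughly the following
(a sketch modeled on mm/hugetlb.c around v5.2, not a verbatim excerpt;
helper and field names may differ in other kernel versions):

/* shared mappings: the map hangs off the inode's address_space */
static struct resv_map *inode_resv_map(struct inode *inode)
{
        return inode->i_mapping->private_data;
}

/* private mappings: the map is stashed in the VMA itself, with the
 * low bits of vm_private_data used for the HPAGE_RESV_* ownership
 * flags (HPAGE_RESV_OWNER, HPAGE_RESV_UNMAPPED) */
static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
{
        if (vma->vm_flags & VM_MAYSHARE)
                return inode_resv_map(file_inode(vma->vm_file));

        return (struct resv_map *)(get_vma_private_data(vma) &
                                   ~HPAGE_RESV_MASK);
}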
sys_mmap() with MAP_PRIVATE, MAP_ANONYMOUS, MAP_HUGETLB in flags
--> ksys_mmap_pgoff()
  --> hugetlb_file_setup()
    --> name = "anon_hugepage", acctflag = VM_NORESERVE
    --> creat_flags = HUGETLB_ANONHUGE_INODE
    --> the file is created on the internal vfs mount, so shmfs
        accounting rules do not apply ??
    --> hugetlbfs_get_inode() ??
      --> gets an inode from hugetlbfs_vfsmount[hstate_idx]->mnt_sb,
          i.e. the superblock of the internal hugetlbfs mount ??
    --> hugetlb_reserve_pages()
      --> from = 0, to = no. of hugepages we want
      --> vma = NULL, vm_flags = VM_NORESERVE
      --> change introduced by commit e68375c850b0
        --> why is this even required ??
        --> only validates if (from > to)
        --> nothing else is changed or validated
      --> just returns after finding VM_NORESERVE
    --> alloc_file_pseudo()
      --> creates the pseudo file called "anon_hugepage"
      --> associates this file with the hugetlbfs file operations,
          i.e. assigns file->f_op = hugetlbfs_file_operations
          (see the table below)
        --> fallocate() via hugetlbfs_fallocate()
        --> mmap() via hugetlbfs_file_mmap()
        --> get_unmapped_area() via hugetlb_get_unmapped_area(), and so on
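(For reference, the f_op wiring above is the hugetlbfs file operations
table, condensed here from fs/hugetlbfs/inode.c as of roughly v5.2;
the exact member list varies across kernel versions:)

const struct file_operations hugetlbfs_file_operations = {
        .read_iter              = hugetlbfs_read_iter,
        .mmap                   = hugetlbfs_file_mmap,
        .fsync                  = noop_fsync,
        .get_unmapped_area      = hugetlb_get_unmapped_area,
        .llseek                 = generic_file_llseek,
        .fallocate              = hugetlbfs_fallocate,
};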
  --> vm_mmap_pgoff()
    --> do_mmap_pgoff()
      --> do_mmap()
        --> get_unmapped_area()
          --> calls file->f_op->get_unmapped_area(), i.e.
              hugetlb_get_unmapped_area()
            --> hugetlb_get_unmapped_area() is arch-specific
                (powerpc shown here)
              --> slice_get_unmapped_area() for Hash MMU
              --> radix__hugetlb_get_unmapped_area() for Radix MMU
        --> mmap_region()
          --> sets up the vm_area_struct
          --> performs vma merges, if possible
          --> performs vma accounting
          --> file-backed mapping, so uses file->f_op
            --> call_mmap()
              --> calls file->f_op->mmap(), i.e. hugetlbfs_file_mmap(),
                  which was set up earlier by hugetlb_file_setup()
                  (the same call that created the pseudo file)
              --> hugetlbfs_file_mmap()
                --> sets vma->vm_flags
                  --> sets VM_HUGETLB, i.e. this mapping will use
                      hugetlb pages
                  --> sets VM_DONTEXPAND, i.e. this mapping cannot be
                      extended with mremap()
                --> sets vma->vm_ops
                --> performs some checks
                  --> alignment requirements
                  --> vma length overflow
                --> performs the actual reservation
                  --> hugetlb_reserve_pages()
                    --> inode = inode of the "anon_hugepage" pseudo-file
                    --> from = vma->vm_pgoff >> huge_page_order(h)
                    --> to = len >> huge_page_shift(h)
                    --> if not VM_MAYSHARE
                      --> create a new reservation map for the mapping
                        --> resv_map_alloc()
                      --> save the reservation map in the vma metadata
                        --> set_vma_resv_map()
                          --> vma->vm_private_data points to the map
                        --> set_vma_resv_flags()
                          --> sets HPAGE_RESV_OWNER, since MAP_PRIVATE
                    --> hugepage_subpool_get_pages()
                      --> checks whether the subpool has enough pages
                        --> subpool stats are maintained in
                            struct hugepage_subpool
                      --> if spool->max_hpages is set (i.e. not -1),
                          perform maximum size accounting
                        --> check that (spool->used_hpages + delta)
                            <= spool->max_hpages
                      --> if spool->min_hpages is set (i.e. not -1) and
                          the subpool still has reserved pages, perform
                          minimum size accounting
                        --> ** kernel documentation says:
                            At mount time, the number of huge pages
                            specified by min_size are reserved for use by
                            the filesystem. If there are not enough free
                            huge pages available, the mount will fail. As
                            huge pages are allocated to the filesystem
                            and freed, the reserve count is adjusted so
                            that the sum of allocated and reserved huge
                            pages is always at least min_size.
                        --> if delta <= spool->rsv_hpages
                          --> set spool->rsv_hpages -= delta
                          --> return code = 0
                        --> else (delta > spool->rsv_hpages)
                          --> return code = delta - spool->rsv_hpages,
                              computed before zeroing, i.e. the reserve
                              is exhausted and the caller needs this
                              many more pages
                          --> then set spool->rsv_hpages = 0
                    --> hugetlb_acct_memory()
                      --> if hugepage_subpool_get_pages() returned > 0,
                          i.e. the subpool reserve was exhausted, try to
                          get more surplus pages
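From userspace, the entire path above can be exercised with a plain
anonymous MAP_HUGETLB mapping. A minimal test program (it assumes at
least one free huge page of the default size; on the ppc64 setup used
here that is 16 MB):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LENGTH  (16UL * 1024 * 1024)  /* multiple of the default huge page size */

int main(void)
{
        /* MAP_HUGETLB sends this through hugetlb_file_setup() and,
         * later, hugetlbfs_file_mmap(), as traced above */
        void *p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");  /* e.g. no free/reservable huge pages */
                return 1;
        }

        /* touching the pages consumes the reservation taken by
         * hugetlb_reserve_pages() at mmap() time */
        memset(p, 0, LENGTH);

        munmap(p, LENGTH);
        return 0;
}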
-------------------------------------------------------------------------------
Subpool Reservations
-------------------------------------------------------------------------------
# echo 1 > /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages
# mount -t hugetlbfs \
        -o pagesize=16MB,size=64MB,min_size=16MB \
        none <path-to-mount-point>

pagesize --> determines the huge page size to use for this
             mount point
size     --> determines the maximum amount of memory that can be
             allocated from this subpool (mount point), rounded
             down to the nearest multiple of the corresponding
             huge page size
min_size --> determines the amount of memory reserved for this
             subpool (mount point) at mount time, rounded down
             to the nearest multiple of the corresponding huge
             page size; it sets spool->rsv_hpages and bumps the
             global 'resv_hugepages' counter, and cannot exceed
             what the persistent pool plus
             'nr_overcommit_hugepages' can supply (see the
             session below)
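Tying the size/min_size options to the hugepage_subpool_get_pages()
walk-through above, the accounting reduces to roughly this (a
simplified sketch modeled on mm/hugetlb.c ~v5.2; the real function
also takes spool->lock and unwinds global reservations on failure):

/* returns 0 if the mount-time reserve covers the request, a positive
 * count of pages the caller must still obtain globally, or -ENOMEM */
static long hugepage_subpool_get_pages_sketch(struct hugepage_subpool *spool,
                                              long delta)
{
        long ret = delta;

        /* maximum size accounting, only if size= was given (-1 == unset) */
        if (spool->max_hpages != -1) {
                if (spool->used_hpages + delta <= spool->max_hpages)
                        spool->used_hpages += delta;
                else
                        return -ENOMEM; /* would exceed size= */
        }

        /* minimum size accounting, only if min_size= was given and the
         * mount-time reserve still has pages left */
        if (spool->min_hpages != -1 && spool->rsv_hpages) {
                if (delta > spool->rsv_hpages) {
                        ret = delta - spool->rsv_hpages; /* reserve exhausted */
                        spool->rsv_hpages = 0;
                } else {
                        ret = 0;        /* fully covered by the reserve */
                        spool->rsv_hpages -= delta;
                }
        }

        return ret;
}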
Confusing Accounting
--------------------

What follows looks confusing at first: mounting with min_size=32MB
(two 16 MB pages) grows nr_hugepages from 1 to 2. The pool holds only
one persistent huge page, so the kernel allocates one surplus page
(permitted because nr_overcommit_hugepages=1) to back the mount-time
reservation; the counters revert once the filesystem is unmounted.
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages
1
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/free_hugepages
1
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/resv_hugepages
0
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/nr_overcommit_hugepages
1
                                                      +------- affects resv_hugepages,
                                                      |        worth 2 huge pages
                                                      v
$ sudo mount -t hugetlbfs -o mode=01777,pagesize=16MB,min_size=32MB,size=64MB none hugemount
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages
2
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/free_hugepages
2
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/resv_hugepages
2
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/nr_overcommit_hugepages
1
$ sudo umount hugemount
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages
1
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/free_hugepages
1
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/resv_hugepages
0
$ cat /sys/kernel/mm/hugepages/hugepages-16384kB/nr_overcommit_hugepages
1
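The mount-time jump in resv_hugepages comes from the subpool
constructor, which reserves min_size pages up front; a sketch modeled
on hugepage_new_subpool() in mm/hugetlb.c (~v5.2):

struct hugepage_subpool *hugepage_new_subpool(struct hstate *h,
                                              long max_hpages, long min_hpages)
{
        struct hugepage_subpool *spool = kzalloc(sizeof(*spool), GFP_KERNEL);

        if (!spool)
                return NULL;

        spin_lock_init(&spool->lock);
        spool->count = 1;
        spool->max_hpages = max_hpages; /* from size=, -1 if unset */
        spool->hstate = h;
        spool->min_hpages = min_hpages; /* from min_size=, -1 if unset */

        /* reserve min_size pages immediately; this is what bumps
         * resv_hugepages (and may allocate surplus pages) at mount time,
         * and what makes the mount fail if the pages cannot be found */
        if (min_hpages != -1 && hugetlb_acct_memory(h, min_hpages)) {
                kfree(spool);
                return NULL;
        }
        spool->rsv_hpages = min_hpages;

        return spool;
}

On umount, hugepage_put_subpool() undoes the remaining reservation via
hugetlb_acct_memory() with a negative delta, which is why the counters
return to their pre-mount values in the session above.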