Resolve slow startup of a KVM guest with 480 GiB RAM by preallocating huge pages

Issue

A guest with 480 GiB of RAM on a 512 GiB host takes several minutes to start because QEMU preallocates guest memory at only about 2 GiB/s.

Solution

The solution is to preallocate the necessary amount of RAM in the kernel's huge page pool ahead of time, and to make QEMU/KVM back the guest's memory with that pool.

  1. Check which huge page sizes the CPU supports:
# "pse" indicates 2 MiB (2048K) page support, "pdpe1gb" indicates 1 GiB page support
if grep -qw pse /proc/cpuinfo; then
    echo "2048K = OK"
else
    echo "2048K = NO"
fi
if grep -qw pdpe1gb /proc/cpuinfo; then
    echo "1G = OK"
else
    echo "1G = NO"
fi
  2. Add kernel arguments to preallocate the necessary number of huge pages at boot (in this case, 480 × 1 GiB): hugepagesz=1G default_hugepagesz=1G hugepages=480.

    • On Proxmox, add them to /etc/kernel/cmdline and run pve-efiboot-tool refresh.
    • On Ubuntu, add to GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub, then run update-grub.
    • Note: the pool can also be resized on the fly by writing to /proc/sys/vm/nr_hugepages, but on a running system this is likely to take a long time or to fail to allocate the requested number of pages because physical memory is already fragmented. The kernel argument takes effect at boot, before memory becomes fragmented, so the allocation completes within seconds.
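After rebooting, the pool can be verified from /proc/meminfo; with the arguments above it should report 480 pages of 1048576 kB each:

# Confirm the 1 GiB pool was preallocated at boot
grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo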
  3. Mount the HugeTLB filesystem:

    • On Proxmox, add hugetlbfs /dev/hugepages hugetlbfs mode=01770 0 0 to /etc/fstab.
    • On Ubuntu, this is not needed as it's already mounted (check with mount | grep hugetlbfs).
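To apply the Proxmox fstab entry without a reboot, an equivalent one-off mount (assuming the /dev/hugepages mount point used above) is:

# One-off mount matching the fstab entry above
mkdir -p /dev/hugepages
mount -t hugetlbfs -o mode=01770 hugetlbfs /dev/hugepages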
  4. Configure QEMU/KVM to use 1 GiB huge pages:

    • On Proxmox, set hugepages: 1024 in the VM config (/etc/pve/qemu-server/<vmid>.conf), or run qm set <vmid> --hugepages 1024.
    • On Ubuntu with libvirt, add <memoryBacking><hugepages/></memoryBacking> to the domain XML with virsh edit. A raw QEMU sketch is shown below.
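For a raw QEMU invocation, a minimal sketch (assuming the /dev/hugepages mount from step 3; the usual disk/network options are omitted):

# Back guest RAM with the preallocated 1 GiB pages:
# -mem-path points QEMU at the hugetlbfs mount, and -mem-prealloc
# touches all pages up front instead of faulting them in lazily.
qemu-system-x86_64 \
    -enable-kvm \
    -m 480G \
    -mem-path /dev/hugepages \
    -mem-prealloc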

  5. On NUMA systems, huge pages allocated by the kernel are distributed evenly across nodes. A VM bound to specific NUMA nodes must therefore be sized so that it does not exceed the per-node share of the pool.
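The per-node distribution can be inspected through sysfs:

# Show how many 1 GiB pages each NUMA node holds
for n in /sys/devices/system/node/node*; do
    echo "$n: $(cat "$n"/hugepages/hugepages-1048576kB/nr_hugepages)"
done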

