Naïve testing of FS performance to store VM images

Preface

Recently I noticed that among all my VMs, the Debian-based ones were taking an especially long time to upgrade with apt upgrade. Specifically, the time to unpack/configure each package seemed too long. This observation came some time after I had started using Btrfs for my /home (where all my VM images also live).

I had also heard in talks on the official #btrfs IRC channel that Btrfs is poor for storing VM images. So I decided to do some benchmarking covering Btrfs, classic Ext4 and the widely advertised OpenZFS. As a baseline, another measurement was made with the VM image written directly to the bare disk.

Conditions

Host

  • Machine: AMD FX 8320 CPU (8 cores), 16 GB DDR3 memory
  • OS: Ubuntu 20.04 running 5.11.0-37-generic kernel with ZFS module 2.0.2-1ubuntu5.1
  • Disk: Hitachi HDS721010CLA330 (1 TB, 512-byte physical sectors, 7200 rpm, SATA 2.6 link at 3.0 Gb/s)
  • IO scheduler: the default mq-deadline (I also tried bfq, but it didn't make any noticeable difference)

Hypervisor

  • Hypervisor: qemu-kvm used through virt-manager.

Guest

  • Machine: 4 threads, 4 GB RAM, virtio-blk disk and virtio network.
  • OS: 64-bit Devuan ascii installed on a 5 GiB single-partition image formatted with Ext4
  • IO scheduler: 'none', the only one available for virtio-blk disks.

Other conditions: during the tests, the host machine was not running any unrelated CPU-, memory- or IO-intensive tasks.

Methodology

Basically, I was measuring the time required to do apt dist-upgrade, first from the ascii release to the beowulf release, then to the chimaera release. That time did not include download time, as the packages were downloaded beforehand with apt dist-upgrade -d.

The time was measured with the VM image placed on different kinds of storage, specifically:

  • Plain disk
  • file on Ext4 on that disk
  • Nodatacow (+C) file on Btrfs on the same disk
  • Ordinary CoW file on Btrfs
  • Ordinary file on OpenZFS on the same disk
  • zvol block device on OpenZFS

Each FS was created with all-default options (generally no parameters to mkfs or zpool create), with the following exceptions (a rough command sketch follows this list):

  • Each FS was mounted with noatime (incl. OpenZFS)
  • For OpenZFS, mountpoint=none was set on the pool and mountpoint=/some/path on the dataset. Everything else was left at defaults, including OpenZFS's ashift, recordsize and volblocksize.
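
For reference, here is a minimal sketch of how such a setup can be created; the device name /dev/sdX1, the pool name tank, the mount paths and the zvol size are illustrative placeholders, not the exact commands used in the tests:

    # Ext4 with defaults, mounted with noatime
    mkfs.ext4 /dev/sdX1
    mount -o noatime /dev/sdX1 /mnt/ext4

    # Btrfs with defaults; +C (nodatacow) must be set on the directory
    # (or an empty file) before the image is written into it
    mkfs.btrfs /dev/sdX1
    mount -o noatime /dev/sdX1 /mnt/btrfs
    mkdir /mnt/btrfs/nocow && chattr +C /mnt/btrfs/nocow

    # OpenZFS: the pool itself is not mounted, the dataset is mounted at a
    # chosen path, and atime is disabled to match the noatime mounts above
    zpool create -O mountpoint=none tank /dev/sdX1
    zfs create -o mountpoint=/some/path -o atime=off tank/images
    # zvol variant: exposes a block device under /dev/zvol/tank/vmdisk
    zfs create -V 5G tank/vmdisk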

For each storage type, the test suite was run twice: first with the cache=writeback,aio=threads disk setup in qemu, then with cache=none,aio=native, to also check which caching mode is preferable.
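
In qemu command-line terms the two disk setups look like this; note that the VM was actually defined through virt-manager, so this is only an illustrative equivalent, with placeholder machine options and image path:

    # run 1: writeback host page cache, thread-based AIO
    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
        -drive file=/mnt/ext4/working.img,format=raw,if=virtio,cache=writeback,aio=threads

    # run 2: no host page cache (O_DIRECT), native Linux AIO
    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
        -drive file=/mnt/ext4/working.img,format=raw,if=virtio,cache=none,aio=native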

For each test, only a single time measurement was taken.

First I prepared a golden image with the installed guest OS like this:

  • install base system from devuan_ascii_2.1_amd64_dvd-1.iso
  • set up network and ssh access, check that /etc/apt/sources.list points to the ascii release
  • do apt update and apt upgrade
  • install the vim, subversion, git and build-essential packages
  • do apt clean
  • shutdown the VM

For each test in question, the golden image was written anew to the specific destination, after which the VM was run off that destination.

For the plain disk tests and for the OpenZFS zvol tests, this was done as dd if=golden.img of=/dev/device; for non-CoW files it was dd if=golden.img of=working.img conv=notrunc (so that the FS doesn't re-allocate the file's storage space while its contents are rewritten). For a CoW file, the FS was re-created from scratch for each run, and the image file was then simply copied. The FS was also re-created for the OpenZFS zvol tests.
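
Put together as a sketch (the device name, paths and the bs= block size are illustrative, not taken from the original runs):

    # plain disk or zvol: overwrite the block device in place
    dd if=golden.img of=/dev/sdX1 bs=1M
    dd if=golden.img of=/dev/zvol/tank/vmdisk bs=1M

    # non-CoW file (Ext4, Btrfs +C, OpenZFS file): rewrite the contents
    # without truncating, so the FS keeps the already-allocated blocks
    dd if=golden.img of=/mnt/ext4/working.img bs=1M conv=notrunc

    # CoW file on Btrfs: recreate the FS, then simply copy the image
    mkfs.btrfs -f /dev/sdX1 && mount -o noatime /dev/sdX1 /mnt/btrfs
    cp golden.img /mnt/btrfs/working.img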

The test itself included the following steps (a scripted sketch follows the list):

  1. boot the VM, change /etc/apt/sources.list to the beowulf release

  2. apt update, then apt dist-upgrade -d.

  3. First measurement: execute time bash -c 'DEBIAN_FRONTEND=noninteractive apt dist-upgrade -y --force-yes -o Dpkg::Options::="--force-confold" ; sync'

  4. after that, apt autoremove --purge, apt remove --purge linux-image-4.9.* (removing the older kernel), apt clean, reboot.

  5. Again change /etc/apt/sources.list to chimaera, then do the same as in (2)

  6. Second measurement with exactly the same command as in (3)

  7. Reboot to check whether the upgrade was successful.
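
As a rough sketch, one round of the in-guest steps could be scripted like this (the sed pattern is illustrative and assumes literal suite names in sources.list; the second round repeats it with beowulf replaced by chimaera):

    # inside the guest, as root
    sed -i 's/ascii/beowulf/g' /etc/apt/sources.list
    apt update
    apt dist-upgrade -d -y
    # the measured step
    time bash -c 'DEBIAN_FRONTEND=noninteractive apt dist-upgrade -y --force-yes -o Dpkg::Options::="--force-confold" ; sync'
    # cleanup between measurements
    apt autoremove --purge -y
    apt remove --purge -y 'linux-image-4.9.*'
    apt clean
    reboot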

The reason for the second measurement was to check (on the CoW filesystems) how badly the image file could fragment after the first measurement and how that would further affect IO speed.
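
As a side note (not part of the original methodology), fragmentation of the image file can be inspected from the host with filefrag; the path is illustrative:

    filefrag /mnt/btrfs/working.img      # prints the number of extents
    filefrag -v /mnt/btrfs/working.img   # lists the individual extents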

Results

The time is given in seconds, measured by the guest OS.

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Plain disk       | wrback,threads | 586                | 572                |
| Plain disk       | none,native    | 592                | 574                |
| Ext4             | wrback,threads | 647                | 669                |
| Ext4             | none,native    | 596                | 580                |
| Btrfs, +C        | wrback,threads | 1946               | 1854               |
| Btrfs, +C        | none,native    | 1954               | 1847               |
| Btrfs, CoW       | wrback,threads | 2232               | 2128               |
| Btrfs, CoW       | none,native    | 2270               | 2529               |
| OpenZFS file     | wrback,threads | 670                | 667                |
| OpenZFS file     | none,native    | 647                | 643                |
| OpenZFS zvol     | wrback,threads | 618                | 607                |
| OpenZFS zvol     | none,native    | 615                | 603                |

Additional results

Inspired by the comments and ideas from Jiachen Yang (aka farseerfc), I also did some more tests with non-default FS options.
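
Roughly, the non-default variants below correspond to the following commands (the device name is illustrative; the 15 GB partition case simply reuses the default commands on a smaller partition):

    # Btrfs with mixed data/metadata block groups
    mkfs.btrfs --mixed -f /dev/sdX1

    # Btrfs free-space cache v2 (a mount option rather than an mkfs one)
    mount -o noatime,space_cache=v2 /dev/sdX1 /mnt/btrfs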

mkfs.btrfs --mixed:

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Btrfs, +C        | none,native    | 1703               | 1485               |
| Btrfs, CoW       | none,native    | 3476               | 5643               |

Default FSes on a small 15 GB partition

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Btrfs, +C        | none,native    | 1761               | 1607               |
| Btrfs, CoW       | none,native    | 2177               | 2299               |
| OpenZFS zvol     | none,native    | 603                | 578                |

space_cache=v2

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Btrfs, +C        | none,native    | 1798               | 1612               |
| Btrfs, CoW       | none,native    | 2008               | 2048               |

Conclusion

First, I'd like to note that the measurements were dominated by apt's synchronous writes. If apt is run under the eatmydata tool, the measured times tend to be around ~200 s regardless of the FS.
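
For illustration, eatmydata simply wraps a command so that its fsync()/sync() calls become no-ops; the measured step would then look something like this sketch:

    # same measured command as before, with fsync()/sync() neutralized
    time eatmydata bash -c 'DEBIAN_FRONTEND=noninteractive apt dist-upgrade -y --force-yes -o Dpkg::Options::="--force-confold"'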

We can see that a VM image stored as an ordinary CoW file on Btrfs gives the worst performance, degrades rapidly over time, and should not be used in practice.

Another disappointment is the nodatacow (+C) performance, which turned out to be ~3x slower than its Ext4 or OpenZFS counterparts.

space_cache=v2 gives a noticeable performance boost; however, the performance is still not even remotely as good as one would expect.
