Naïve testing of FS performance to store VM images

Preface

Recently I noticed that among all my VMs, the Debian-based ones were taking an especially long time to upgrade with apt upgrade. Specifically, the time to unpack/configure each package seemed too long. This observation came some time after I had started using Btrfs for my /home (where all my VM images also live).

I had also heard in talks on the official #btrfs IRC channel that Btrfs is poor for storing VM images. So I decided to do some benchmarking covering Btrfs, classic Ext4 and the widely advertised OpenZFS. As a baseline, another measurement was made with the VM image written directly to the bare disk.

Conditions

Host

  • Machine: AMD FX 8320 CPU (8 cores), 16 GB DDR3 memory
  • OS: Ubuntu 20.04 running 5.11.0-37-generic kernel with ZFS module 2.0.2-1ubuntu5.1
  • Disk: Hitachi HDS721010CLA330 (1 TB, 512-byte physical sectors, 7200 rpm, SATA 2.6 link at 3.0 Gb/s)
  • IO scheduler: the default mq-deadline (I also tried bfq, but it didn't make any noticeable difference)

Hypervisor

  • Hypervisor: qemu-kvm used through virt-manager.

Guest

  • Machine: 4 threads, 4 GB RAM, virtio-blk disk and virtio network.
  • OS: 64-bit Devuan ascii installed on a 5 GiB single-partition image formatted with Ext4
  • IO scheduler: 'none', the only one available for virtio-blk disks.

Other conditions: during the tests, the host machine was not running any unrelated CPU-, memory- or IO-intensive tasks.

Methodology

Basically, I was measuring the time required to do apt dist-upgrade, first from the ascii release to the beowulf release, then to the chimaera release. That time did not include download time, as the packages were downloaded beforehand with apt dist-upgrade -d.

The time was measured with the VM image placed on different kinds of storage, specifically:

  • Plain disk
  • file on Ext4 on that disk
  • Nodatacow (+C) file on Btrfs on the same disk
  • Ordinary CoW file on Btrfs
  • Ordinary file on OpenZFS on the same disk
  • zvol block device on OpenZFS

Each FS was created with all-default options (generally no parameters to mkfs or zpool create), with the following exceptions (a rough command sketch follows this list):

  • Each FS was mounted with noatime (incl. OpenZFS)
  • For OpenZFS, mountpoint=none was set on the pool and mountpoint=/some/path on the dataset. Everything else was left at defaults, including OpenZFS's ashift, recordsize and volblocksize.
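
For reference, here is a minimal sketch of how such a setup can be created; the device name /dev/sdX1, the pool name tank, the mount paths and the zvol size are illustrative placeholders, not the exact commands used in the tests:

    # Ext4 with defaults, mounted with noatime
    mkfs.ext4 /dev/sdX1
    mount -o noatime /dev/sdX1 /mnt/ext4

    # Btrfs with defaults; +C (nodatacow) must be set on the directory
    # (or an empty file) before the image is written into it
    mkfs.btrfs /dev/sdX1
    mount -o noatime /dev/sdX1 /mnt/btrfs
    mkdir /mnt/btrfs/nocow && chattr +C /mnt/btrfs/nocow

    # OpenZFS: the pool itself is not mounted, the dataset is mounted at a
    # chosen path, and atime is disabled to match the noatime mounts above
    zpool create -O mountpoint=none tank /dev/sdX1
    zfs create -o mountpoint=/some/path -o atime=off tank/images
    # zvol variant: exposes a block device under /dev/zvol/tank/vmdisk
    zfs create -V 5G tank/vmdisk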

For each storage type, the test suite was run twice: first with the cache=writeback,aio=threads disk setup in qemu, then with cache=none,aio=native, to also check which caching mode is preferable.
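
In qemu command-line terms the two disk setups look like this; note that the VM was actually defined through virt-manager, so this is only an illustrative equivalent, with placeholder machine options and image path:

    # run 1: writeback host page cache, thread-based AIO
    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
        -drive file=/mnt/ext4/working.img,format=raw,if=virtio,cache=writeback,aio=threads

    # run 2: no host page cache (O_DIRECT), native Linux AIO
    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
        -drive file=/mnt/ext4/working.img,format=raw,if=virtio,cache=none,aio=native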

For each test, only a single time measurement was taken.

First I prepared a golden image with the installed guest OS like this:

  • install base system from devuan_ascii_2.1_amd64_dvd-1.iso
  • set up network and ssh access, check that /etc/apt/sources.list points to the ascii release
  • do apt update and apt upgrade
  • install the vim, subversion, git and build-essential packages
  • do apt clean
  • shutdown the VM

For each test in question, the golden image was written anew to the specific destination, after which the VM was run off that destination.

For the plain disk tests and for the OpenZFS zvol tests, this was done as dd if=golden.img of=/dev/device; for non-CoW files it was dd if=golden.img of=working.img conv=notrunc (so that the FS doesn't re-allocate the file's storage space while its contents are rewritten). For a CoW file, the FS was re-created from scratch for each run, and the image file was then simply copied. The FS was also re-created for the OpenZFS zvol tests.
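
Put together as a sketch (the device name, paths and the bs= block size are illustrative, not taken from the original runs):

    # plain disk or zvol: overwrite the block device in place
    dd if=golden.img of=/dev/sdX1 bs=1M
    dd if=golden.img of=/dev/zvol/tank/vmdisk bs=1M

    # non-CoW file (Ext4, Btrfs +C, OpenZFS file): rewrite the contents
    # without truncating, so the FS keeps the already-allocated blocks
    dd if=golden.img of=/mnt/ext4/working.img bs=1M conv=notrunc

    # CoW file on Btrfs: recreate the FS, then simply copy the image
    mkfs.btrfs -f /dev/sdX1 && mount -o noatime /dev/sdX1 /mnt/btrfs
    cp golden.img /mnt/btrfs/working.img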

The test itself included the following steps (a scripted sketch follows the list):

  1. boot the VM, change /etc/apt/sources.list to the beowulf release

  2. apt update, then apt dist-upgrade -d.

  3. First measurement: execute time bash -c 'DEBIAN_FRONTEND=noninteractive apt dist-upgrade -y --force-yes -o Dpkg::Options::="--force-confold" ; sync'

  4. after that, apt autoremove --purge, apt remove --purge linux-image-4.9.* (removing the older kernel), apt clean, reboot.

  5. Again change /etc/apt/sources.list to chimaera, then do the same as in (2)

  6. Second measurement with exactly the same command as in (3)

  7. Reboot to check whether the upgrade was successful.
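
As a rough sketch, one round of the in-guest steps could be scripted like this (the sed pattern is illustrative and assumes literal suite names in sources.list; the second round repeats it with beowulf replaced by chimaera):

    # inside the guest, as root
    sed -i 's/ascii/beowulf/g' /etc/apt/sources.list
    apt update
    apt dist-upgrade -d -y
    # the measured step
    time bash -c 'DEBIAN_FRONTEND=noninteractive apt dist-upgrade -y --force-yes -o Dpkg::Options::="--force-confold" ; sync'
    # cleanup between measurements
    apt autoremove --purge -y
    apt remove --purge -y 'linux-image-4.9.*'
    apt clean
    reboot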

The reason for the second measurement was to check (on the CoW filesystems) how badly the image file could fragment after the first measurement and how that would further affect IO speed.
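
As a side note (not part of the original methodology), fragmentation of the image file can be inspected from the host with filefrag; the path is illustrative:

    filefrag /mnt/btrfs/working.img      # prints the number of extents
    filefrag -v /mnt/btrfs/working.img   # lists the individual extents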

Results

The time is given in seconds, measured by the guest OS.

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Plain disk       | wrback,threads | 586                | 572                |
| Plain disk       | none,native    | 592                | 574                |
| Ext4             | wrback,threads | 647                | 669                |
| Ext4             | none,native    | 596                | 580                |
| Btrfs, +C        | wrback,threads | 1946               | 1854               |
| Btrfs, +C        | none,native    | 1954               | 1847               |
| Btrfs, CoW       | wrback,threads | 2232               | 2128               |
| Btrfs, CoW       | none,native    | 2270               | 2529               |
| OpenZFS file     | wrback,threads | 670                | 667                |
| OpenZFS file     | none,native    | 647                | 643                |
| OpenZFS zvol     | wrback,threads | 618                | 607                |
| OpenZFS zvol     | none,native    | 615                | 603                |

Additional results

Inspired by the comments and ideas from Jiachen Yang (aka farseerfc), I also did some more tests with non-default FS options.
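
Roughly, the non-default variants below correspond to the following commands (the device name is illustrative; the 15 GB partition case simply reuses the default commands on a smaller partition):

    # Btrfs with mixed data/metadata block groups
    mkfs.btrfs --mixed -f /dev/sdX1

    # Btrfs free-space cache v2 (a mount option rather than an mkfs one)
    mount -o noatime,space_cache=v2 /dev/sdX1 /mnt/btrfs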

mkfs.btrfs --mixed:

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Btrfs, +C        | none,native    | 1703               | 1485               |
| Btrfs, CoW       | none,native    | 3476               | 5643               |

Default FSes on a small 15 GB partition

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Btrfs, +C        | none,native    | 1761               | 1607               |
| Btrfs, CoW       | none,native    | 2177               | 2299               |
| OpenZFS zvol     | none,native    | 603                | 578                |

space_cache=v2

| Disk image is on | qemu cache/aio | 1st measurement, s | 2nd measurement, s |
|------------------|----------------|--------------------|--------------------|
| Btrfs, +C        | none,native    | 1798               | 1612               |
| Btrfs, CoW       | none,native    | 2008               | 2048               |

Conclusion

First, I'd like to note that the measurements were dominated by apt's synchronous writes. If apt is run under the eatmydata tool, the measured times tend to be around ~200 s regardless of the FS.
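
For illustration, eatmydata simply wraps a command so that its fsync()/sync() calls become no-ops; the measured step would then look something like this sketch:

    # same measured command as before, with fsync()/sync() neutralized
    time eatmydata bash -c 'DEBIAN_FRONTEND=noninteractive apt dist-upgrade -y --force-yes -o Dpkg::Options::="--force-confold"'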

We can see that a VM image stored as an ordinary CoW file on Btrfs gives the worst performance, degrades rapidly over time, and should not be used in practice.

Another disappointment is the nodatacow (+C) performance, which turned out to be ~3x slower than its Ext4 or OpenZFS counterparts.

space_cache=v2 gives a noticeable performance boost; however, the performance is still not even remotely as good as one would expect.
