@mgerdts
Last active August 27, 2018 18:50
MBO-5 notes

Snapshot Space Requirements

Snapshot space accounting is tricky. This document demonstrates that.

Space required to take a snapshot

If there is not enough available space to meet the snapshot's space requirements, zfs snapshot [-r] <dataset>@<snapname> will fail with ENOSPC. Simple, right? Not so much.

What is available space?

It is relatively easy to determine how much space is available. This shows that 990 MiB is available.

[root@buglets ~]# zfs list -o name,available zones/2925dec4-ba6d-cb5a-c41e-84c7e3c08d3e
NAME                                        AVAIL
zones/2925dec4-ba6d-cb5a-c41e-84c7e3c08d3e   990M

To be more precise, use the -p option.

[root@buglets ~]# zfs list -Hpo available zones/2925dec4-ba6d-cb5a-c41e-84c7e3c08d3e
1038022144
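A tool that wants to act on this number would parse the -Hp output. As a sketch (the byte values below are stand-ins copied from the example output, where a real script would read them from zfs as shown in the comments), the comparison such a tool would make looks like:

```shell
#!/bin/sh
# Sketch: decide whether a required amount of space fits in what is
# available. In real use the values would come from zfs, e.g.:
#   avail=$(zfs list -Hpo available <dataset>)
avail=1038022144                    # stand-in: the value printed above
required=$(( 50 * 1024 * 1024 ))    # stand-in: 50 MiB we intend to use

if [ "$avail" -ge "$required" ]; then
    echo "enough space"
else
    echo "not enough space"
fi
```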

The amount of space available to a zone may be less than the amount available in the pool.

[root@buglets ~]# zfs list -o name,available zones
NAME   AVAIL
zones  16.1G

Available space must not be conflated with free space. A pool may have lots of free space, but that space may not be available for use due to redundancy (mirrors, raidz), or it may be held back by reservation or refreservation properties.

[root@buglets ~]# zpool get free zones
NAME   PROPERTY  VALUE  SOURCE
zones  free      49.9G  -

However, when refquota is smaller than quota, be aware that available will never be larger than refquota. This under-reports the amount of space that is actually available to descendants.
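This interaction can be seen directly (a sketch with a hypothetical dataset zones/example; the property values are illustrative):

```shell
# Sketch (hypothetical dataset): with refquota < quota, 'available' on
# the dataset itself is capped by refquota, even though descendants can
# allocate up to the larger quota.
zfs set quota=220m zones/example
zfs set refquota=10m zones/example
zfs list -Hpo available zones/example   # reports no more than ~10m, not 220m
```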

How much space does a snapshot require?

For a non-recursive snapshot, the amount of space required is the amount of space referenced only by the dataset that is being snapshotted. Simple enough? Nope! This is really thorny.

XXX Maybe not so thorny. See written property. Still, if we are planning for consistent behavior, relying on this may not be the best plan.
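If the written property were relied on, the estimate would be a sketch like this (the byte values are stand-ins taken from the examples below; real code would read them from zfs as shown in the comments):

```shell
#!/bin/sh
# Sketch: estimate whether a snapshot will fit using the 'written'
# property (space written since the most recent snapshot). Real usage:
#   written=$(zfs get -Hpo value written zones/a/vol)
#   avail=$(zfs list -Hpo available zones/a/vol)
written=$(( 50 * 1024 * 1024 ))    # stand-in: ~50 MiB written since last snapshot
avail=$(( 105 * 1024 * 1024 ))     # stand-in: space available to the dataset

if [ "$written" -le "$avail" ]; then
    echo "snapshot should fit"
else
    echo "snapshot may fail with ENOSPC"
fi
```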

Let's simulate a bhyve zone's hierarchy, initially ignoring the fact that the boot disk should be a clone.

[root@buglets ~]# zfs create zones/a
[root@buglets ~]# zfs create -V 100m zones/a/vol
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME         VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a            -   105M    23K   none      none    none       none   105M     23K         0              0  15.9G
zones/a/vol     100M   105M    12K      -         -    none       105M   105M     12K         0           105M  16.0G

Initially zones/a/vol references 12 KiB, which is all metadata. When a snapshot is taken, the snapshot will refer to 12 KiB, but the snapshot uses no space (used is zero).

[root@buglets ~]# zfs snapshot zones/a/vol@snap1
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   105M    23K   none      none    none       none   105M     23K         0              0  15.9G
zones/a/vol           100M   105M    12K      -         -    none       105M   105M     12K         0           105M  16.0G
zones/a/vol@snap1        -      0    12K      -         -       -          -      0       -         -              -      -

Let's write 50 MiB of non-zero data to the volume and try the snapshot again.

[root@buglets ~]# zfs destroy zones/a/vol@snap1
[root@buglets ~]# openssl rand $(( 1024 * 1024 * 50 )) > /dev/zvol/rdsk/zones/a/vol
[root@buglets ~]# zfs snapshot zones/a/vol@snap1
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   156M    23K   none      none    none       none   156M     23K         0              0  15.9G
zones/a/vol           100M   156M  50.5M      -         -    none       105M   156M   50.5M         0           105M  16.0G
zones/a/vol@snap1        -      0  50.5M      -         -       -          -      0       -         -              -      -

Again, the snapshot refers to the same amount of data as the volume and uses no space of its own.

If we overwrite that data, the snapshot consumes a bunch of space.

[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   156M    23K   none      none    none       none   156M     23K         0              0  15.9G
zones/a/vol           100M   156M  50.5M      -         -    none       105M   156M   50.5M     50.5M          54.7M  15.9G
zones/a/vol@snap1        -  50.5M  50.5M      -         -       -          -  50.5M       -         -              -      -

Let's try that again with a quota. First we clean up what was done above.

[root@buglets ~]# zfs destroy zones/a/vol@snap1
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME         VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a            -   105M    23K   none      none    none       none   105M     23K         0              0  15.9G
zones/a/vol     100M   105M  50.5M      -         -    none       105M   105M   50.5M         0          54.7M  16.0G

A bhyve zone uses refquota to restrict the amount of data that can be written to the zone's dataset, ignoring its descendent snapshots and volumes. By setting refquota, we've reduced the amount of space that the filesystem can use.

[root@buglets ~]# zfs set refquota=50m zones/a
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME         VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a            -   105M    23K   none       50M    none       none   105M     23K         0              0  50.0M
zones/a/vol     100M   105M  50.5M      -         -    none       105M   105M   50.5M         0          54.7M  16.0G

Now, zones/a can only use 50 MiB, but zones/a/vol can still use up to 16 GiB. Let's cap the amount of space the zone's entire hierarchy can use by setting a quota on zones/a.

[root@buglets ~]# zfs set quota=150m zones/a
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME         VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a            -   105M    23K   150M       50M    none       none   105M     23K         0              0  44.7M
zones/a/vol     100M   105M  50.5M      -         -    none       105M   105M   50.5M         0          54.7M  99.5M

Now there's 44.7 MiB available and the volume references 50.5 MiB. That's not enough space for a snapshot.

[root@buglets ~]# zfs snapshot zones/a/vol@snap1
cannot create snapshot 'zones/a/vol@snap1': out of space

To test the assertion that this is caused by the 44.7 MiB available being less than the 50.5 MiB referenced, let's increase the quota. To simplify things, we'll get the refquota out of the way for a minute to make the available column meaningful.

[root@buglets ~]# zfs set refquota=none zones/a
[root@buglets ~]# zfs set quota=156m zones/a
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME         VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a            -   105M    23K   156M      none    none       none   105M     23K         0              0  50.7M
zones/a/vol     100M   105M  50.5M      -         -    none       105M   105M   50.5M         0          54.7M   105M

Now a snapshot can be created, with 224 KiB (0.2 MiB) to spare.

[root@buglets ~]# zfs snapshot zones/a/vol@snap1
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   156M    23K   156M      none    none       none   156M     23K         0              0   224K
zones/a/vol           100M   156M  50.5M      -         -    none       105M   156M   50.5M         0           105M   105M
zones/a/vol@snap1        -      0  50.5M      -         -       -          -      0       -         -              -      -

To maximize confusion, we can create another snapshot now too, even with only 224 KiB available. This works because both snapshots refer to the same blocks, so the second snapshot requires almost no new space.

[root@buglets ~]# zfs snapshot zones/a/vol@snap2
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   156M    23K   156M      none    none       none   156M     23K         0              0   224K
zones/a/vol           100M   156M  50.5M      -         -    none       105M   156M   50.5M         0           105M   105M
zones/a/vol@snap1        -      0  50.5M      -         -       -          -      0       -         -              -      -
zones/a/vol@snap2        -      0  50.5M      -         -       -          -      0       -         -              -      -

While it is really nice that there is room for this second snapshot, there is no way to predict from userland whether enough space exists.

Let's get rid of the second snapshot and try writing some data to the volume.

[root@buglets ~]# zfs destroy zones/a/vol@snap2
[root@buglets ~]# openssl rand $(( 1024 * 1024 * 50 )) > /dev/zvol/rdsk/zones/a/vol
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   156M    23K   156M      none    none       none   156M     23K         0              0   224K
zones/a/vol           100M   156M  50.5M      -         -    none       105M   156M   50.5M     50.5M          54.7M  55.0M
zones/a/vol@snap1        -  50.5M  50.5M      -         -       -          -  50.5M       -         -              -      -

Notice that this write caused usedrefreserv (used by refreservation) to decrease by the amount (subject to rounding) that usedsnap (used by snapshots) increased, keeping the value of used constant. In fact, we can do a full overwrite of the volume.

[root@buglets ~]# openssl rand $(( 1024 * 1024 * 100 )) > /dev/zvol/rdsk/zones/a/vol
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/a
NAME               VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/a                  -   156M    23K   156M      none    none       none   156M     23K         0              0   224K
zones/a/vol           100M   156M   101M      -         -    none       105M   156M    101M     50.5M          4.23M  4.45M
zones/a/vol@snap1        -  50.5M  50.5M      -         -       -          -  50.5M       -         -              -      -

Even though the volume was fully overwritten, there is still some space available. This is because the zfs refreservation=auto value accounts for raidz variants that are not as efficient at storing metadata. To ensure that volumes can be sent to other pools, it is best not to optimize this slop away.
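The slop can be quantified from the rounded numbers zfs printed above (a sketch; the values are the ones shown in the listings, not freshly queried):

```shell
#!/bin/sh
# Sketch: the ~5% refreservation "slop" seen above. A 100 MiB volume got
# an automatic refreservation of about 105 MiB to cover worst-case
# metadata overhead.
volsize=$(( 100 * 1024 * 1024 ))     # volsize from the listing
refreserv=$(( 105 * 1024 * 1024 ))   # refreserv from the listing (rounded)
slop=$(( (refreserv - volsize) * 100 / volsize ))
echo "slop: ${slop}%"
```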

A more practical example

Now let's look at a scenario with a boot disk, a data disk, and [ref]quota properties that are representative of what a typical guest would look like.

[root@buglets ~]# zfs create -o refquota=10m -o quota=220m zones/b
[root@buglets ~]# zfs create -V 100m zones/b/boot
[root@buglets ~]# zfs create -V 100m zones/b/data
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/b
NAME          VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/b             -   211M    23K   220M       10M    none       none   211M     23K         0              0  9.48M
zones/b/boot     100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M
zones/b/data     100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M

When we create a snapshot, we will want a recursive snapshot that captures all disks in a single transaction group. There may be some configuration data stored in /zones/b/config that should come along with this snapshot. Before we write to anything, we can create such a snapshot:

[root@buglets ~]# zfs snapshot -r zones/b@snap1
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/b
NAME                VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/b                   -   211M    23K   220M       10M    none       none   211M     23K         0              0  9.45M
zones/b@snap1             -      0    23K      -         -       -          -      0       -         -              -      -
zones/b/boot           100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M
zones/b/boot@snap1        -      0    12K      -         -       -          -      0       -         -              -      -
zones/b/data           100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M
zones/b/data@snap1        -      0    12K      -         -       -          -      0       -         -              -      -

However, writing just a little bit of data (5% of the total allocated to disks) leaves too little space to create a snapshot.

[root@buglets ~]# zfs destroy -r zones/b@snap1
[root@buglets ~]# openssl rand $(( 1024 * 1024 * 10 )) > /dev/zvol/rdsk/zones/b/data
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/b
NAME          VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/b             -   211M    23K   220M       10M    none       none   211M     23K         0              0  9.48M
zones/b/boot     100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M
zones/b/data     100M   105M  10.1M      -         -    none       105M   105M   10.1M         0          95.1M   105M
[root@buglets ~]# zfs snapshot -r zones/b@snap1
cannot create snapshot 'zones/b/data@snap1': out of space
no snapshots were created
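The failure follows from the arithmetic of the listing above (a sketch; the byte values are approximations of the rounded figures zfs printed):

```shell
#!/bin/sh
# Sketch of the arithmetic behind the ENOSPC above, using values that
# approximate the rounded zfs list output.
avail=$(( 9 * 1024 * 1024 + 491 * 1024 ))        # ~9.48 MiB left under the quota
referenced=$(( 10 * 1024 * 1024 + 102 * 1024 ))  # ~10.1 MiB referenced by zones/b/data

# The recursive snapshot needs at least as much space as each dataset
# references, so it cannot fit.
if [ "$referenced" -gt "$avail" ]; then
    echo "snapshot would fail: ENOSPC"
fi
```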

Takeaway

Taken to the extreme, if both disks were fully written, the package would need more than double the combined size of the two vdisks for a snapshot to succeed. Assuming quota is the means by which we ensure that customers don't exceed the space reserved by their allocation, they would need to use only half the space advertised by the package to guarantee that snapshots remain possible.
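As a sketch of that sizing rule (the volume sizes and the ~5% slop factor come from the examples above; a real package would plug in its own numbers):

```shell
#!/bin/sh
# Sketch: rough quota needed so a recursive snapshot can always be taken,
# assuming the worst case where every block of every volume has been
# overwritten once since the last snapshot.
boot=$(( 100 * 1024 * 1024 ))   # stand-in vdisk sizes
data=$(( 100 * 1024 * 1024 ))

# Each fully-written volume needs ~volsize for live data plus ~volsize
# for the blocks held by the snapshot, plus ~5% refreservation slop.
needed=$(( 2 * (boot + data) * 105 / 100 ))
echo "$needed"
```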

This is likely to drive the need for temporarily moving to a larger package.

Problem: no refreservation for non-zvol data

Since we don't reserve space equal to the refquota on the zone's dataset, it is possible to run out of space in the pool or within the zone's dataset prior to reaching that refquota.

[root@buglets ~]# zfs destroy -r zones/b
[root@buglets ~]# zfs create -o refquota=10m -o quota=220m zones/b
[root@buglets ~]# zfs create -V 100m zones/b/boot
[root@buglets ~]# zfs create -V 100m zones/b/data
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/b
NAME          VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/b             -   211M    23K   220M       10M    none       none   211M     23K         0              0  9.48M
zones/b/boot     100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M
zones/b/data     100M   105M    12K      -         -    none       105M   105M     12K         0           105M   115M
[root@buglets ~]# openssl rand $(( 1024 * 1024 * 9 )) > /dev/zvol/rdsk/zones/b/data
[root@buglets ~]# zfs snapshot -r zones/b@snap1
[root@buglets ~]# zfs list -r -t all -o name,volsize,used,refer,quota,refquota,reserv,refreserv,used,usedds,usedsnap,usedrefreserv,available zones/b
NAME                VOLSIZE   USED  REFER  QUOTA  REFQUOTA  RESERV  REFRESERV   USED  USEDDS  USEDSNAP  USEDREFRESERV  AVAIL
zones/b                   -   220M    23K   220M       10M    none       none   220M     23K         0              0   372K
zones/b@snap1             -      0    23K      -         -       -          -      0       -         -              -      -
zones/b/boot           100M   105M    12K      -         -    none       105M   105M     12K         0           105M   106M
zones/b/boot@snap1        -      0    12K      -         -       -          -      0       -         -              -      -
zones/b/data           100M   114M  9.10M      -         -    none       105M   114M   9.10M         0           105M   106M
zones/b/data@snap1        -      0  9.10M      -         -       -          -      0       -         -              -      -
[root@buglets ~]# openssl rand $(( 1024 * 1024 )) > /zones/b/file1
4274973380:error:02012031:system library:fflush:Disc quota exceeded:bss_file.c:434:fflush()
4274973380:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:436:
[root@buglets ~]# ls -ld /zones/b/file1
-rw-r--r--   1 root     root      393216 Jul 16 22:32 /zones/b/file1

This becomes more of an issue when snapshots are used, as snapshots will try to make use of this unreserved space.
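One possible mitigation (an assumption sketched here, not something the platform does today): give the zone's dataset a refreservation equal to its refquota, so that filesystem writes up to the refquota cannot fail for lack of pool space.

```shell
# Sketch of a possible mitigation: reserve space for the zone's
# non-zvol data equal to its refquota (10m matches the example above).
zfs set refreservation=10m zones/b
```

The cost of this is that the reserved space is charged against the pool even when the zone's filesystem is nearly empty.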
