@dch
Last active August 10, 2018 01:44
zfs notes

pool management

  • create a new pool by first creating an empty partition space using diskutil or Disk Utility
  • use cgdisk from homebrew if needed to reset the zpool partition id to a504 & then reboot

create a pool

zpool create -f \
  -o ashift=12 \
  -o failmode=continue \
  -O atime=off \
  -O compression=lz4 \
  -O checksum=sha256 \
  -R /zroot \
  zroot /dev/disk0s4
diskutil list disk0
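
Because -f skips zpool's safety checks, it can help to print the command and read it before running it. A minimal sketch, assuming a hypothetical helper called zpool_create_cmd that only echoes the command above with the pool name and device filled in (it never touches a disk):

```shell
# zpool_create_cmd is a hypothetical dry-run helper, not part of zfs.
# It prints the create command for review instead of executing it.
zpool_create_cmd() {
    pool=$1
    device=$2
    echo "zpool create -f -o ashift=12 -o failmode=continue" \
         "-O atime=off -O compression=lz4 -O checksum=sha256" \
         "-R /$pool $pool $device"
}

zpool_create_cmd zroot /dev/disk0s4
```

Once the printed line looks right, paste it into a root shell.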

NB: on FreeBSD there is no ashift support in zfs; it is handled at the lower GEOM layer. Check the device capabilities, then lock the lower bound before zpool creation:

diskinfo -v /dev/da0
sysctl vfs.zfs.min_auto_ashift=12
echo "vfs.zfs.min_auto_ashift=12" >> /etc/sysctl.conf
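
The ashift value is the base-2 logarithm of the sector size, so 512-byte sectors give 9 and 4 KiB sectors give 12. A minimal sketch of that arithmetic, assuming a hypothetical helper called ashift_for and a power-of-two sector size such as the one diskinfo -v reports:

```shell
# ashift_for is a hypothetical helper, not part of zfs.
# Prints log2 of a power-of-two sector size given in bytes.
ashift_for() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```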

adding a mirror to an existing pool

check current status

zpool status -v

root@akai / # zpool status -v
pool: tub
state: ONLINE
scrub: none requested
config:
NAME        STATE     READ WRITE CKSUM
tub         ONLINE       0     0     0
  disk0s2   ONLINE       0     0     0
errors: No known data errors

double-check device names for the intended mirror

root@akai / # diskutil list
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *512.1 GB   disk0
   1:                        EFI                         209.7 MB   disk0s1
   2:                        ZFS tub                     511.8 GB   disk0s2
/dev/disk1
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *121.3 GB   disk1
   1:                        EFI                         209.7 MB   disk1s1
   2:                  Apple_HFS akai                    120.5 GB   disk1s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk1s3
/dev/disk2
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *3.0 TB     disk2
   1:                        EFI                         209.7 MB   disk2s1
   2:          Apple_CoreStorage                         385.4 GB   disk2s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk2s3
   4:                        ZFS pond                    2.6 TB     disk2s4
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS continuity             *385.1 GB   disk3

attach a new disk to an existing pool

root@akai / # zpool attach  tub disk0s2 disk2s4

check status and wait

root@akai / # zpool status -v
pool: tub
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 5.80% done, 1h17m to go
config:
NAME         STATE     READ WRITE CKSUM
tub          ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    disk0s2  ONLINE       0     0     0
    disk2s4  ONLINE       0     0     0
errors: No known data errors
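
Rather than re-running zpool status by hand, the wait can be scripted. A minimal sketch, assuming the status text contains the phrase "resilver in progress" as in the output above; both function names are hypothetical:

```shell
# resilver_in_progress reads `zpool status` output on stdin and
# succeeds while a resilver is still running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# wait_for_resilver polls the named pool once a minute until the
# resilver line disappears from the status output.
wait_for_resilver() {
    while zpool status "$1" | resilver_in_progress; do
        sleep 60
    done
}
```

Usage would be wait_for_resilver tub, followed by a final zpool status -v to confirm there were no errors.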

Split the mirror

confirm resilver has completed

root@akai / # zpool status -v

  pool: tub
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jan 13 15:52:21 2013
config:
    NAME         STATE     READ WRITE CKSUM
    tub          ONLINE       0     0     0
      mirror     ONLINE       0     0     0
        disk0s2  ONLINE       0     0     0
        disk2s4  ONLINE       0     0     0
errors: No known data errors

flush pending writes to disk just in case

root@akai / # sync

take the additional mirror disk offline

root@akai / # zpool offline tub disk2s4

check status

root@akai / # zpool status -v
  pool: tub
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
 scrub: resilver completed with 0 errors on Sun Jan 13 15:52:21 2013
config:
    NAME         STATE     READ WRITE CKSUM
    tub          DEGRADED     0     0     0
      mirror     DEGRADED     0     0     0
        disk0s2  ONLINE       0     0     0
        disk2s4  OFFLINE      0     0     0
errors: No known data errors

remove the disk

root@akai / # zpool detach tub disk2s4

check status

root@akai / # zpool status -v
  pool: tub
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jan 13 15:52:21 2013
config:
    NAME        STATE     READ WRITE CKSUM
    tub         ONLINE       0     0     0
      disk0s2   ONLINE       0     0     0
errors: No known data errors

scrub the original volume

root@akai / # zpool scrub tub

Removable Media

ZFS works successfully using >= 32 GiB SDXC cards in Feb 2011 MacBook Pro, and likely similar models.

  • use Finder to eject disks
  • if required, use zfs unmount -f <pool> followed by zpool export <pool> to force the eject
  • if the physical read-only switch is enabled on the media, zpool import fails and reports insufficient replicas:
$ zpool import
  pool: builds
    id: 11121869171413038388
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

builds      UNAVAIL  insufficient replicas
  disk2s2   UNAVAIL  cannot open
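
When scripting around removable media, the state: line is the simplest thing to check before going any further. A minimal sketch, assuming the output layout shown above; pool_states is a hypothetical helper name:

```shell
# pool_states reads `zpool import` or `zpool status` output on stdin
# and prints the value of every "state:" line, one per pool.
pool_states() {
    awk '$1 == "state:" { print $2 }'
}
```

For example, zpool import | pool_states would print FAULTED for the card above, which is a hint to check the write-protect switch.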

I have had 3 kernel panics during busy testing, none during data writing but all later after ejection in Finder without a subsequent pool export.

The oracle suggests an alternate approach for removable media usage.

Missing functionality

If you've used zfs elsewhere, or are referring to the manpages, a few minor things are missing:

  • zfs sharing and exporting doesn't work (iscsi, smb, nfs, afp via apple sharing)

using mbuffer for speed

  • set up the receiving (listening) end first

mbuffer -I 192.168.1.1:10000 -q -s128k -m1G -P10 | zfs recv storage/foo

  • now set up the sending end:

zfs send foo@052209 | mbuffer -q -s128k -m1G -O 192.168.1.2:10000

Snapshots

# duplicate everything for recovery purposes
zfs send -RLve zsource@snapshot | zfs recv -Fduv zroot

Managing ZFS filesystems

ZFS has an internal namespace (hierarchy) for filesystems, using a simple / delimiter within a filesystem name. Properties such as compression, mountpoints, and many other settings can be inherited through this namespace, or set and reset recursively. Other useful actions such as recursive snapshots are possible. Aligning these to roughly the same mapping as your filesystem will likely keep you sane & reduce your frustration.
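
To make the inheritance chain concrete: a property that is not set on a dataset is looked up on each parent in turn. A minimal sketch that lists those parents, nearest first; dataset_ancestors is a hypothetical helper and tub/shared/photos is an illustrative name:

```shell
# dataset_ancestors prints every parent of a dataset name,
# walking up the / delimited namespace to the pool itself.
dataset_ancestors() {
    name=$1
    while [ "${name%/*}" != "$name" ]; do
        name=${name%/*}
        echo "$name"
    done
}

dataset_ancestors tub/shared/photos
# prints:
#   tub/shared
#   tub
```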

  • Reset the mountpoints under pool "tub", filesystem "shared" to inherit from the root:
zfs inherit -r mountpoint tub/shared
  • Take snapshots of all subsidiary filesystems in pool "tub" and append the same suffix as snapshot name:
zfs snapshot -r tub@20120910
  • a recursive, forced rollback to a snapshot destroys all snapshots more recent than the target (clones of those snapshots are only destroyed with -R):
sudo zfs rollback -rf <snapshot>
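
The date suffix used for the recursive snapshot can be generated rather than typed. A minimal sketch, assuming a YYYYMMDD suffix like the one above; snapshot_name is a hypothetical helper:

```shell
# snapshot_name prints <dataset>@<today as YYYYMMDD>.
snapshot_name() {
    echo "$1@$(date +%Y%m%d)"
}

# Usage (not run here): zfs snapshot -r "$(snapshot_name tub)"
```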

This zfs cheatsheet is worth printing out.

Creating OS X friendly datasets

The baseline

zfs create -o normalization=formD -o atime=off <name>

I set these on the base zfs dataset:

  • compression=lz4 because it's pretty fast even if you don't need it
  • checksum=sha256 because it helps if you decide to use dedupe later
  • atime=off because it saves writes and performs better

Binding Finder.app to your will

Finder and friends like spotlight want to abuse your ZFS filesystems. In particular:

  • use mdutil -i off <mountpoint> to stop Finder and Spotlight trying to index ZFS; indexing won't work there anyway.
  • stop metadata being created using cd <mountpoint> ; mkdir .fseventsd && touch .fseventsd/no_log on the root mountpoint.
  • add FILESYSTEMS="hfs ufs zfs" to the end of /etc/locate.rc to allow locate to index zfs filesystems.
mdutil -i off /zfs
cd /zfs
mkdir .fseventsd && touch .fseventsd/no_log
touch .Trashes .metadata_never_index .apdisk

Use locate instead for non-realtime searching of your z filesystems.
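
With more than one dataset, the same tweaks are easier to apply as a function taking the mountpoint. A minimal sketch of the commands above; quiet_finder is a hypothetical name, and mdutil is skipped where it does not exist:

```shell
# quiet_finder applies the Finder and Spotlight tweaks to one mountpoint.
quiet_finder() {
    mountpoint=$1
    # mdutil only exists on OS X; ignore failures so the rest still runs
    if command -v mdutil >/dev/null 2>&1; then
        mdutil -i off "$mountpoint" || true
    fi
    mkdir -p "$mountpoint/.fseventsd" &&
        touch "$mountpoint/.fseventsd/no_log" \
              "$mountpoint/.Trashes" \
              "$mountpoint/.metadata_never_index" \
              "$mountpoint/.apdisk"
}
```

Usage would be quiet_finder /zfs, run as a user who can write to the mountpoint.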

Dropbox

Is a necessary evil these days. In fact, I started using zfs after two issues - firstly a nasty case of bitrot on photos, and later on, dropbox shitting on all my work and trashing 1000s of files while I was travelling. zfs rollback FTW.

sudo zfs create \
  -o normalization=formD \
  -o casesensitivity=insensitive \
  -o com.apple.mimic_hfs=on \
  zroot/users/dch/Dropbox

And follow the metadata tweaks above. Generally, it kinda works ok but sometimes gets stuck.

a zpool ramdisk with lz4 compression

Uses zsh functions but should be easy to re-write for any other shell:

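zdisk() {
    # ram://N takes a size in 512-byte sectors: 20971520 sectors = 10 GiB
    zpool create -O compression=lz4 -fR /zram zram \
        $(hdiutil attach -nomount ram://20971520)
    sudo chown -R $USER /zram
    cd /zram
}
# exporting the pool leaves the ram device attached; hdiutil detach frees it
zdisk-destroy() { zpool export -f zram }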
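
The number in the ram:// URL is a count of 512-byte sectors, not bytes, so 20971520 is a 10 GiB disk. A minimal sketch of the conversion for other sizes; ram_sectors is a hypothetical helper:

```shell
# ram_sectors converts a size in GiB to 512-byte sectors.
# 1 GiB / 512 bytes = 2097152 sectors.
ram_sectors() {
    echo $(($1 * 2097152))
}

ram_sectors 10   # prints 20971520
```

The result can be substituted into the URL, for example hdiutil attach -nomount ram://$(ram_sectors 4) for a 4 GiB disk.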

If you get a message about filesystem busy or dataset busy or similar, this script will find and release any remaining zfs locks:

#!/usr/bin/env perl
use Modern::Perl;

foreach my $fs (@ARGV) {
    my ($refcount, @refs, %holds);
    # get a hash of all subsidiary datasets with references
    say "Looking up userrefs for $fs...";
    @refs = qx(zfs get -H -r userrefs $fs)
        or die "$!\n$^E\n";
    foreach my $line (@refs) {
        next unless $line =~ m/
        (\S+)
        \s+
        userrefs
        \s+
        (\d+)
        /igx;
        unless ($2) {
            say "clean: $1";
            next;
        }
        else {
            say "dirty: $1";
            $holds{$1} = $2;
        }
    }
    say "DONE\n";

    say "Releasing holds...";
    foreach my $dataset (keys %holds) {
    	# prune holds recursively
        my @tags = qx(zfs holds -H -r $dataset)
            or die "$!\n$^E\n";
        foreach my $line (@tags) {
            next unless $line =~ m/
            (\S+)   # dataset - could be different for a subsidiary
            \s+
            (\.send-\S+)   # only prune tags left by `zfs send`
            /igx;
            my ($snapshot, $tag) = ($1, $2);

            print "Releasing $snapshot from $tag...";
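            # zfs release prints nothing on success, so check the exit status
            system('zfs', 'release', '-r', $tag, $snapshot) == 0
                or die "zfs release failed: exit status $?\n";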
            say "ok"
        }
    }
    say "DONE\n";
}
exit 0;
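
To see what the script would release before running it, the same tag filter can be applied in a pipeline. A minimal sketch, assuming zfs holds -H prints the snapshot name and the tag as the first two whitespace-separated columns; send_holds is a hypothetical helper:

```shell
# send_holds reads `zfs holds -H` output on stdin and prints
# "snapshot tag" for every hold whose tag starts with .send-
send_holds() {
    awk '$2 ~ /^\.send-/ { print $1, $2 }'
}
```

For example, zfs holds -H -r tub@20120910 | send_holds lists only the holds left behind by zfs send and ignores any you placed yourself.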

Using a diskfile for a temporary mirror

$ zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 15h54m with 0 errors on Sun May 25 15:47:45 2014
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  disk0s4   ONLINE       0     0     0

errors: No known data errors

  pool: tub
 state: ONLINE
  scan: scrub repaired 0 in 0h36m with 0 errors on Wed May 28 22:05:40 2014
config:

	NAME        STATE     READ WRITE CKSUM
	tub         ONLINE       0     0     0
	  disk4s2   ONLINE       0     0     0

errors: No known data errors
mkfile -n 480g mirror
import list
zpool import -f -R /mirror -N tub
zpool attach tub <existing-device> `pwd -P`/mirror
$ zpool status

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 15h54m with 0 errors on Sun May 25 15:47:45 2014
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  disk0s4   ONLINE       0     0     0

errors: No known data errors

  pool: tub
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jun  2 13:09:33 2014
    453M scanned out of 349G at 22.7M/s, 4h22m to go
    450M resilvered, 0.13% done
config:

	NAME                                      STATE     READ WRITE CKSUM
	tub                                       ONLINE       0     0     0
	  mirror-0                                ONLINE       0     0     0
	    disk4s2                               ONLINE       0     0     0
	    /zfs/shared/backups/akai/tub          ONLINE       0     0     0  (resilvering)

errors: No known data errors

Importing a pool from a file device

# zpool import -d /Volumes/sprawl/

   pool: zpool
     id: 411387556460089843
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	zpool                    ONLINE
	  /Volumes/sprawl/zpool  ONLINE

# zpool import -R /zpool -N -d /Volumes/sprawl zpool

Splitting and re-mirroring a FreeBSD server

You're doing this because for some reason your hoster doesn't yet support all features of ZFS in their rescue disk and you foolishly ran zpool upgrade <pool> and forgot something in your rc.conf script that prevents the system from completing boot to ssh.

boot to rescue system

  • use rescue_wintermute and get back your mfsbsd shell
alias l='/bin/ls -aFGhl'
mkdir -m 0700 /root/.ssh
fetch -o /root/.ssh/authorized_keys http://people.apache.org/~dch/authorized_keys
chmod 0400 /root/.ssh/authorized_keys