pool management

  • create a new pool by first creating an empty partition space using diskutil or Disk Utility
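A minimal sketch of that, assuming the target disk is disk3 and the pool name is scratch; the placeholder partition is formatted JHFS+ only to keep diskutil happy, and zpool create -f then takes the slice over:

# single placeholder partition spanning the disk (slice 2, after the EFI slice)
diskutil partitionDisk /dev/disk3 GPTFormat JHFS+ scratch 100%
# build the pool directly on that slice, overwriting the placeholder filesystem
zpool create -f scratch /dev/disk3s2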

adding a mirror to an existing pool

check current status

zpool status -v

root@akai / # zpool status -v
  pool: tub
 state: ONLINE
 scrub: none requested
config:
    NAME        STATE     READ WRITE CKSUM
    tub         ONLINE       0     0     0
      disk0s2   ONLINE       0     0     0
errors: No known data errors

double-check device names for the intended mirror

root@akai / # diskutil list
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *512.1 GB   disk0
   1:                        EFI                         209.7 MB   disk0s1
   2:                        ZFS tub                     511.8 GB   disk0s2
/dev/disk1
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *121.3 GB   disk1
   1:                        EFI                         209.7 MB   disk1s1
   2:                  Apple_HFS akai                    120.5 GB   disk1s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk1s3
/dev/disk2
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *3.0 TB     disk2
   1:                        EFI                         209.7 MB   disk2s1
   2:          Apple_CoreStorage                         385.4 GB   disk2s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk2s3
   4:                        ZFS pond                    2.6 TB     disk2s4
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS continuity             *385.1 GB   disk3

attach a new disk to an existing pool

root@akai / # zpool attach tub disk0s2 disk2s4

check status and wait

root@akai / # zpool status -v
  pool: tub
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 5.80% done, 1h17m to go
config:
    NAME         STATE     READ WRITE CKSUM
    tub          ONLINE       0     0     0
      mirror     ONLINE       0     0     0
        disk0s2  ONLINE       0     0     0
        disk2s4  ONLINE       0     0     0
errors: No known data errors

Split the mirror

confirm resilver has completed

root@akai / # zpool status -v

  pool: tub
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jan 13 15:52:21 2013
config:
    NAME         STATE     READ WRITE CKSUM
    tub          ONLINE       0     0     0
      mirror     ONLINE       0     0     0
        disk0s2  ONLINE       0     0     0
        disk2s4  ONLINE       0     0     0
errors: No known data errors

flush RAM just in case

root@akai / # sync

take the additional mirror disk offline

root@akai / # zpool offline tub disk2s4

check status

root@akai / # zpool status -v
  pool: tub
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
 scrub: resilver completed with 0 errors on Sun Jan 13 15:52:21 2013
config:
    NAME         STATE     READ WRITE CKSUM
    tub          DEGRADED     0     0     0
      mirror     DEGRADED     0     0     0
        disk0s2  ONLINE       0     0     0
        disk2s4  OFFLINE      0     0     0
errors: No known data errors

remove the disk

root@akai / # zpool detach tub disk2s4

check status

root@akai / # zpool status -v
  pool: tub
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jan 13 15:52:21 2013
config:
    NAME        STATE     READ WRITE CKSUM
    tub         ONLINE       0     0     0
      disk0s2   ONLINE       0     0     0
errors: No known data errors

scrub the original volume

root@akai / # zpool scrub tub

Removable Media

ZFS works successfully with >= 32 GiB SDXC cards in a Feb 2011 MacBook Pro, and likely in similar models.

  • use Finder to eject disks
  • if required, force it with zfs unmount -f <pool> followed by zpool export <pool>
  • if the physical read-only switch is enabled on the media, zfs will fail to import it, reporting insufficient replicas:
$ zpool import
  pool: builds
    id: 11121869171413038388
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

builds      UNAVAIL  insufficient replicas
  disk2s2   UNAVAIL  cannot open

I have had 3 kernel panics during heavy testing, none during data writes; all occurred after ejecting the media in Finder without a subsequent pool export.

The oracle suggests an alternate approach for removable media usage.

Missing functionality

If you've used zfs elsewhere, or are referring to the manpages, a few critical things are missing:

  • recursive snapshot creation (zfs snapshot -r) works, but recursion is not available in zfs send/receive
  • zfs send/receive doesn't yet support pipes so zfs send <snap> | ssh user@host "zfs receive -d <snap>" doesn't work
  • zfs sharing and exporting doesn't work (iscsi, smb, nfs, afp via apple sharing)
  • recursive send/receive of snapshots is supported on other ZFS versions

zfs send in zfs-osx fork

using mbuffer for speed

  • set up the receiving (listening) end first

mbuffer -I 192.168.1.1:10000 -q -s128k -m1G -P10 | zfs recv storage/foo

  • now set up the sending end:

zfs send foo@052209 | mbuffer -q -s128k -m1G -O 192.168.1.2:10000

Snapshots

  • As noted, there is no support for streamed snapshots
  • Nor sending or receiving recursive snapshots
  • Finally, pipes/streams don't work as expected:
$ zfs send tub/vm.freebsd@20120910 | zfs receive orange/vm.freebsd

internal error: Bad file descriptor
cannot receive: invalid stream (failed to read first record)
[1]    8408 abort      zfs send tub/vm.freebsd@20120910 | 
       8409 exit 1     zfs receive orange/vm.freebsd

You can work around the missing stream functionality using fifos.

Snapshots on localhost

PIPE=/tmp/zpipe && mkfifo $PIPE
zfs send pool1/fs@snap > $PIPE &
zfs receive -d pool2/backups < $PIPE &

Snapshots between hosts

I couldn't get the canonical netcat example to work, so I used socat, which also allows compression and SSL. Make sure you have chmod/chown'd your mountpoints correctly, and granted zfs permissions to the appropriate non-root users, for this to work.
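If you still need to grant those zfs permissions, zfs allow can delegate just enough to a non-root user. A sketch, assuming a user named dch and the dataset names from the examples below (delegation support can vary between ZFS ports):

zfs allow dch send,snapshot,hold tub/test       # on the source host
zfs allow dch receive,create,mount pool/test    # on the target host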

This example uses socat to transfer the data between two hosts, listening on TCP port 1234. You can easily add SSL support by carefully following the client/server instructions provided.

On your source host:

mkfifo /tmp/zpipe
socat GOPEN:/tmp/zpipe TCP-LISTEN:1234,reuseaddr &
zfs send tub/test@initial > /tmp/zpipe

And on your target host:

ZHOST=source.host.com
mkfifo /tmp/zpipe
socat -u TCP:$ZHOST:1234,reuseaddr GOPEN:/tmp/zpipe &
zfs receive -F pool/test </tmp/zpipe

Sending an incremental snapshot

Using the same approach as above, with an updated zfs send command on your source host:

zfs send -i tub/test@initial tub/test@current  > /tmp/zpipe

The target host remains the same.

One-liners

Here's a one-liner to move data from a source pool into a backup filesystem on a different pool, on the same host:

# get set
PIPE=/tmp/zpipe && mkfifo $PIPE
SRC_POOL=inpool
DEST_POOL=outpool
DEST_ZFS=backup
REF_SNAP=20120801
NEW_SNAP=20120911
# send the initial snap
zfs send $SRC_POOL@$REF_SNAP > $PIPE & zfs receive -d $DEST_POOL/$DEST_ZFS < $PIPE &
# update to a newer one
zfs send -i @$REF_SNAP $SRC_POOL@$NEW_SNAP > $PIPE & zfs receive -F -d $DEST_POOL/$DEST_ZFS < $PIPE &

Encrypted transfer

socat also supports secured communications using openssl. Follow the instructions to set up keys and certificates, and confirm that you can use the SSL connections correctly. Note that the socat "server" will be the sending ZFS end, and the zfs receiver will be the socat client.
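If you don't have certificates yet, the setup is roughly the usual socat recipe (paths match the ones used below; repeat on the destination host with "destination" in place of "source", and copy each host's .crt to the other side for use as the cafile):

# key plus self-signed certificate for the source host, bundled into the .pem socat expects
openssl genrsa -out /etc/socat/source.key 2048
openssl req -new -x509 -key /etc/socat/source.key -days 3653 -out /etc/socat/source.crt
cat /etc/socat/source.key /etc/socat/source.crt > /etc/socat/source.pem
chmod 600 /etc/socat/source.key /etc/socat/source.pem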

On your source host:

mkfifo /tmp/zpipe
socat GOPEN:/tmp/zpipe openssl-listen:1234,reuseaddr,cert=/etc/socat/source.pem,cafile=/etc/socat/destination.crt &
zfs send tub/test@initial > /tmp/zpipe

And on your target host:

ZHOST=source.host.com
mkfifo /tmp/zpipe
socat -u openssl-connect:$ZHOST:1234,cert=/etc/socat/destination.pem,cafile=/etc/socat/source.crt GOPEN:/tmp/zpipe &
zfs receive -F pool/test </tmp/zpipe

Running it all from the server end

This works in principle, but not all of the shell plumbing completes successfully yet; it's still a WIP with socat and friends. I also seem to have mucked up the certificate setup below, YMMV :-(

Sending the initial snapshot:

# setup
ZSOURCE=akai.local
ZDESTINATION=dch@continuity.local
ZFILE=tub/shared/repos
ZINITIAL=$ZFILE@20121220
ZCURRENT=$ZFILE@20130114
mkfifo /tmp/zpipe
ssh $ZSOURCE "mkfifo /tmp/zpipe"
# initiate the sender
socat GOPEN:/tmp/zpipe openssl-listen:1234,reuseaddr,cert=/etc/socat/source.pem,cafile=/etc/socat/destination.crt &
zfs send $ZINITIAL > /tmp/zpipe &
# control the destination over ssh
ssh $ZDESTINATION "socat -u openssl-connect:$ZSOURCE:1234,cert=/etc/socat/destination.pem,cafile=/etc/socat/source.crt GOPEN:/tmp/zpipe" &
ssh $ZDESTINATION "zfs receive -F $ZINITIAL </tmp/zpipe"

Sending the current snapshot:

# initiate the server
socat GOPEN:/tmp/zpipe openssl-listen:1234,reuseaddr,cert=/etc/socat/source.pem,cafile=/etc/socat/destination.crt &
zfs send -i $ZINITIAL $ZCURRENT > /tmp/zpipe &
# control the client over ssh
ssh $ZDESTINATION "socat -u openssl-connect:$ZSOURCE:1234,cert=/etc/socat/destination.pem,cafile=/etc/socat/source.crt GOPEN:/tmp/zpipe" &
ssh $ZDESTINATION "zfs receive -F $ZFILE </tmp/zpipe"

Managing ZFS filesystems

ZFS has an internal namespace (hierarchy) for filesystems, using a simple / delimiter within a filesystem name. Properties such as compression, mountpoints, and many other settings can be inherited through this namespace, or set and reset recursively. Other useful actions such as recursive snapshots are possible. Aligning this namespace roughly with your directory layout will likely keep you sane and reduce your frustration.
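For example (tub/shared/scratch is a hypothetical child dataset):

zfs set compression=lz4 tub/shared                        # children inherit lz4
zfs set compression=off tub/shared/scratch                # local override on one child
zfs get -r -o name,value,source compression tub/shared    # the source column shows local vs inherited
zfs inherit -r compression tub/shared                     # clear the overrides and inherit again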

  • Reset the mountpoints under pool "tub", filesystem "shared" to inherit from the root:
zfs inherit -r mountpoint tub/shared
  • Take snapshots of all subsidiary filesystems in pool "tub", giving each the same snapshot-name suffix:
zfs snapshot -r tub@20120910
  • A recursive, forced rollback to a snapshot will destroy all intermediate snapshots and clones:
sudo zfs rollback -rf <snapshot>

This zfs cheatsheet is worth printing out.

Creating OS X friendly datasets

The baseline

zfs create -o normalization=formD -o atime=off <name>
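For instance, with a hypothetical dataset name:

zfs create -o normalization=formD -o atime=off tub/docs
zfs get normalization,atime tub/docs    # confirm both properties took effect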

I set these on the base zfs dataset:

  • compression=lz4 because it's pretty fast even if you don't need it
  • checksum=fletcher4 because it helps if you decide to use dedup later
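A sketch of how that looks, assuming the pool is called tub; descendant datasets inherit both unless you override them locally:

zfs set compression=lz4 tub
zfs set checksum=fletcher4 tub
zfs get -r compression,checksum tub    # descendants should show these as inherited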

Binding Finder.app to your will

Finder and friends like Spotlight want to abuse your ZFS filesystems. In particular:

  • use mdutil -i off <mountpoint> to stop Finder and Spotlight from trying to index ZFS; the indexing won't work anyway.
  • stop metadata being created by running cd <mountpoint>; mkdir .fseventsd && touch .fseventsd/no_log on the root mountpoint.
  • add FILESYSTEMS="hfs ufs zfs" to the end of /etc/locate.rc to allow locate to index zfs filesystems.
mdutil -i off /zfs
cd /zfs
mkdir .fseventsd && touch .fseventsd/no_log
touch .Trashes .metadata_never_index .apdisk

Use locate instead for non-realtime searching of your ZFS filesystems.
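For example, to apply the locate.rc change and rebuild the database (paths as on a stock OS X install; double-check them on yours):

echo 'FILESYSTEMS="hfs ufs zfs"' | sudo tee -a /etc/locate.rc
sudo /usr/libexec/locate.updatedb    # rebuild the locate database so zfs paths appear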

a zpool ramdisk with lz4 compression

Uses zsh functions but should be easy to re-write for any other shell:

zdisk() {
    # ram://N is N 512-byte sectors, so 20971520 sectors = a 10 GiB ram disk
    zpool create -O compression=lz4 -fR /zram zram \
        $(hdiutil attach -nomount ram://20971520)
    sudo chown -R $USER /zram
    cd /zram
}
zdisk-destroy() { zpool export -f zram }
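Usage, once the functions above are sourced into your shell:

zdisk            # builds the zram pool on a fresh ram disk and drops you into /zram
zdisk-destroy    # exports the pool; hdiutil detach the ram device when you want the memory back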

If you get a message about filesystem busy or dataset busy or similar, this script will find and release any remaining zfs holds (typically ones left behind by zfs send):

#!/usr/bin/env perl
use Modern::Perl;

foreach my $fs (@ARGV) {
    my ($refcount, @refs, %holds);
    # get a hash of all subsidiary datasets with references
    say "Looking up userrefs for $fs...";
    @refs = qx(zfs get -H -r userrefs $fs)
        or die "$~\n$^E\n";
    foreach my $line (@refs) {
        next unless $line =~ m/
        (\S+)
        \s+
        userrefs
        \s+
        (\d+)
        /igx;
        unless ($2) {
            say "clean: $1";
            next;
        }
        else {
            say "dirty: $1";
            $holds{$1} = $2;
        }
    }
    say "DONE\n";

    say "Releasing holds...";
    foreach my $dataset (keys %holds) {
    	# prune holds recursively
        my @tags = qx(zfs holds -H -r $dataset)
            or die "$!\n$^E\n";
        foreach my $line (@tags) {
            next unless $line =~ m/
            (\S+)   # dataset - could be different for a subsidiary
            \s+
            (\.send-\S+)   # only prune tags left by `zfs send`
            /igx;
            my ($snapshot, $tag) = ($1, $2);

            print "Releasing $snapshot from $tag...";
            qx(zfs release -r $tag $snapshot);
            # zfs release prints nothing on success, so test the exit status instead of the output
            die "$!\n$^E\n" if $?;
            say "ok";
        }
    }
    say "DONE\n";
}
exit 0;

Using a diskfile for a temporary mirror

$ zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 15h54m with 0 errors on Sun May 25 15:47:45 2014
config:

	NAME        STATE     READ WRITE CKSUM
	tank         ONLINE       0     0     0
	  disk0s4   ONLINE       0     0     0

errors: No known data errors

  pool: tub
 state: ONLINE
  scan: scrub repaired 0 in 0h36m with 0 errors on Wed May 28 22:05:40 2014
config:

	NAME        STATE     READ WRITE CKSUM
	tub         ONLINE       0     0     0
	  disk4s2   ONLINE       0     0     0

errors: No known data errors
mkfile -n 480g mirror
import list
zpool import -f -R /mirror -N tub
zpool attach tub <existing-device> `pwd -P`/mirror
$ zpool status

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 15h54m with 0 errors on Sun May 25 15:47:45 2014
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  disk0s4   ONLINE       0     0     0

errors: No known data errors

  pool: tub
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jun  2 13:09:33 2014
    453M scanned out of 349G at 22.7M/s, 4h22m to go
    450M resilvered, 0.13% done
config:

	NAME                                      STATE     READ WRITE CKSUM
	tub                                       ONLINE       0     0     0
	  mirror-0                                ONLINE       0     0     0
	    disk4s2                               ONLINE       0     0     0
	    /zfs/shared/backups/akai/tub          ONLINE       0     0     0  (resilvering)

errors: No known data errors
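Once the resilver finishes, the file-backed half can be split off just like the disk mirror earlier and the diskfile removed. A sketch, using the vdev name exactly as zpool status reports it above (the rm assumes you are still in the directory where mkfile created the file):

zpool detach tub /zfs/shared/backups/akai/tub
rm mirror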