Skip to content

Instantly share code, notes, and snippets.

@mgerdts
Last active January 3, 2019 17:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mgerdts/ddef77c8822b5616d8840bcab6477873 to your computer and use it in GitHub Desktop.
Save mgerdts/ddef77c8822b5616d8840bcab6477873 to your computer and use it in GitHub Desktop.
zfs-raw-recv-panic

The panic is seen on Linux. Screenshots of the panic: 1 2 3

I built ZoL on xubuntu 18.04 with this commit at the tip.

I suspect the clones difference between @7 and @8 is due to the interrupted receive.

root@xubuntu:~# zfs get -o property,value all rpool/crypto/vol0@7  > /tmp/7
root@xubuntu:~# zfs get -o property,value all rpool/crypto/vol0@8  > /tmp/8
root@xubuntu:~# sdiff  /tmp/7 /tmp/8
PROPERTY              VALUE					PROPERTY              VALUE
type                  snapshot					type                  snapshot
creation              Thu Dec  6 17:28 2018			creation              Thu Dec  6 17:28 2018
used                  144K				      |	used                  0B
referenced            40.6G					referenced            40.6G
compressratio         1.00x					compressratio         1.00x
volsize               40G					volsize               40G
createtxg             142241				      |	createtxg             142245
guid                  8896052298089227989		      |	guid                  3280422635076051013
primarycache          all					primarycache          all
secondarycache        all					secondarycache        all
defer_destroy         off					defer_destroy         off
userrefs              0						userrefs              0
objsetid              429				      |	objsetid              234
mlslabel              none					mlslabel              none
refcompressratio      1.00x					refcompressratio      1.00x
written               210M				      |	written               238M
clones                					      |	clones                rpool/crypto/vol0/%recv
logicalreferenced     40.3G					logicalreferenced     40.3G
context               none					context               none
fscontext             none					fscontext             none
defcontext            none					defcontext            none
rootcontext           none					rootcontext           none
encryption            aes-256-ccm				encryption            aes-256-ccm
encryptionroot        rpool/crypto/vol0				encryptionroot        rpool/crypto/vol0
keystatus             available					keystatus             available

After a reboot, the clones value on @8 was unset.

The streams are interesting. Something odd (but not entirely unique) about the stream that causes the panic is that it does not use raw send.

root@xubuntu:/atrest/tmp-old# for file in zfs.??; do zstreamdump < $file > $file.zstreamdump; done
root@xubuntu:/atrest/tmp-old# egrep -e 'features|toname' *.zst*
zfs.00.zstreamdump:	features = 1420000            <<< initial stream with raw send
zfs.00.zstreamdump:	toname = zones/crypto/vol0@0
zfs.02.zstreamdump:	features = 0                  <<< beginning of compound stream containing @1 and @2
zfs.02.zstreamdump:	toname = zones/crypto/vol0@2
zfs.02.zstreamdump:	features = 420000             <<< cooked send of @1
zfs.02.zstreamdump:	toname = zones/crypto/vol0@1
zfs.02.zstreamdump:	features = 420000             <<< cooked send of @2
zfs.02.zstreamdump:	toname = zones/crypto/vol0@2
zfs.06.zstreamdump:	features = 0                  <<< beginning of compound stream containing @3 .. @6
zfs.06.zstreamdump:	toname = zones/crypto/vol0@6
zfs.06.zstreamdump:	features = 1420000            <<< raw send of @3
zfs.06.zstreamdump:	toname = zones/crypto/vol0@3
zfs.06.zstreamdump:	features = 1420000            <<< raw send of @4
zfs.06.zstreamdump:	toname = zones/crypto/vol0@4
zfs.06.zstreamdump:	features = 1420000
zfs.06.zstreamdump:	toname = zones/crypto/vol0@5
zfs.06.zstreamdump:	features = 1420000
zfs.06.zstreamdump:	toname = zones/crypto/vol0@6
zfs.08.zstreamdump:	features = 0                  <<< beginning of compound stream containing @7 .. @8
zfs.08.zstreamdump:	toname = zones/crypto/vol0@8
zfs.08.zstreamdump:	features = 1420000            <<< raw send of @7
zfs.08.zstreamdump:	toname = zones/crypto/vol0@7
zfs.08.zstreamdump:	features = 1420000
zfs.08.zstreamdump:	toname = zones/crypto/vol0@8
zfs.13.zstreamdump:	features = 0                  <<< beginning of compound stream containing @9 .. @13
zfs.13.zstreamdump:	toname = zones/crypto/vol0@13
zfs.13.zstreamdump:	features = 420000             <<< cooked send of @9.  Panic while receiving this.
zfs.13.zstreamdump:	toname = zones/crypto/vol0@9
zfs.13.zstreamdump:	features = 420000
zfs.13.zstreamdump:	toname = zones/crypto/vol0@10
zfs.13.zstreamdump:	features = 420000
zfs.13.zstreamdump:	toname = zones/crypto/vol0@11
zfs.13.zstreamdump:	features = 420000
zfs.13.zstreamdump:	toname = zones/crypto/vol0@12
zfs.13.zstreamdump:	features = 420000
zfs.13.zstreamdump:	toname = zones/crypto/vol0@13

This does call into question the idea that it is a problem only with raw send - the stream being received is not a raw stream.

I went to the source system and generated a raw stream containing only @9.

# zfs send --raw -i zones/crypto/old-vol0@{8,9} > /var/tmp/mg-vol0.8-9.raw

That was able to be received just fine, but oddly I was not able to send a cooked stream.

root@xubuntu:/atrest/tmp-old# zfs send -I rpool/crypto/vol0@{7,8} > mg-vol0.8-9.cooked
warning: cannot send 'rpool/crypto/vol0@8': Input/output error

With strace we can see that it is indeed the send that is giving EIO.

ioctl(3, _IOC(0, 0x5a, 0x12, 0), 0x7ffe9bb16ef0) = 0		<<< ZFS_IOC_BASE + 18 == ZFS_IOC_OBJSET_STATS
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0		<<< ZFS_IOC_BASE + 21 == ZFS_IOC_SNAPSHOT_LIST_NEXT
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = 0
ioctl(3, _IOC(0, 0x5a, 0x15, 0), 0x7ffe9bb138a0) = -1 ESRCH (No such process)
ioctl(3, _IOC(0, 0x5a, 0x1c, 0), 0x7ffe9bb12eb0) = -1 EIO (Input/output error)	<<< ZFS_IOC_BASE + 28 == ZFS_IOC_SEND

Perhaps Linux hasn't seen this problem because it forces you do a raw send. The same send with --raw seems to work just fine. More likely, something just isn't working right. The order of commands that I ran after reboot were:

root@xubuntu:/atrest/tmp-old# zfs receive rpool/crypto/vol0 < mg-vol0.8-9.raw
cannot receive incremental stream: destination 'rpool/crypto/vol0' does not exist
root@xubuntu:/atrest/tmp-old# zfs send -i rpool/crypto/vol0@{8,9} > mg-vol0.8-9.cooked
warning: cannot send 'rpool/crypto/vol0@9': source key must be loaded
root@xubuntu:/atrest/tmp-old# /sbin/zfs load-key -r rpool/crypto/vol0
Enter passphrase for 'rpool/crypto/vol0':
1 / 1 key(s) successfully loaded
root@xubuntu:/atrest/tmp-old# zfs send -i rpool/crypto/vol0@{8,9} > mg-vol0.8-9.cooked
warning: cannot send 'rpool/crypto/vol0@9': Input/output error

I also get EIO trying to send any other full or incremental snapshot without --raw. I suspect the EIO comes from this in module/zfs/arc.c.

2273 /*
2274  * If this function is being called to decrypt an encrypted buffer or verify an
2275  * authenticated one, the key must be loaded and a mapping must be made
2276  * available in the keystore via spa_keystore_create_mapping() or one of its
2277  * callers.
2278  */
2279 int
2280 arc_untransform(arc_buf_t *buf, spa_t *spa, const zbookmark_phys_t *zb,
2281     boolean_t in_place)
2282 {
2283         int ret;
2284         arc_fill_flags_t flags = 0;
2285
2286         if (in_place)
2287                 flags |= ARC_FILL_IN_PLACE;
2288
2289         ret = arc_buf_fill(buf, spa, zb, flags);
2290         if (ret == ECKSUM) {
2291                 /*
2292                  * Convert authentication and decryption errors to EIO
2293                  * (and generate an ereport) before leaving the ARC.
2294                  */
2295                 ret = SET_ERROR(EIO);
2296                 spa_log_error(spa, zb);
2297                 zfs_ereport_post(FM_EREPORT_ZFS_AUTHENTICATION,
2298                     spa, NULL, zb, NULL, 0, 0);
2299         }
2300
2301         return (ret);
2302 }

The stack would be:

arc_untransform
dmu_send_impl
dmu_send_obj
zfs_ioc_send

I get the same results after reboot, zfs load-key, zfs send (without --raw). 🤷

In case it's a surprise to anyone - once the system panics/oopses (even if it doesn't halt, wtf!!!???), you really shouldn't expect it to work properly. I found that writes that happen after a panic may or may not be durable. Who knows if the durable writes are writes of the proper data. Shudder.

I went back to the source system and generated a cooked send that contained only @9. That stream looks like:

root@xubuntu:/atrest/tmp-old# zstreamdump < mg-vol0.8-9.cooked
BEGIN record
	hdrtype = 1
	features = 0
	magic = 2f5bacbac
	creation_time = 5c09b0f3
	type = 3
	flags = 0x4
	toguid = 5881abe8b4b7ae47
	fromguid = 2d86670194dbe045
	toname = zones/crypto/old-vol0@9
END checksum = 68eef5e53b424df/29dc89cee50806fe/9621258bcb04e6e8/fbf13773611af858
SUMMARY:
	Total DRR_BEGIN records = 1
	Total DRR_END records = 1
	Total DRR_OBJECT records = 2
	Total DRR_FREEOBJECTS records = 4
	Total DRR_WRITE records = 106880
	Total DRR_WRITE_BYREF records = 0
	Total DRR_WRITE_EMBEDDED records = 0
	Total DRR_FREE records = 4
	Total DRR_SPILL records = 0
	Total records = 106892
	Total write size = 875560960 (0x34300000)
	Total stream length = 908911264 (0x362ce2a0)

I was able to receive that stream once, then the next time it panicked. When I tried to reproduce, I couldn't. Sigh. I was able to reproduce with zfs.13.

I noticed oddities while testing this that deserve more attention.

panic doesn't

My initial panic actually led to a panic. The console was locked up and could no longer scroll. That's great, except this was on a laptop that lacks a serial port. Investigation led me to understand that console over USB serial ports is not a thing. I then configured kdump (install the package, take the default answers). Rebooted, then reproduced. Now panics did not lock up the console, seemingly because they were turned into oopses.

This allowed me to scroll on the console to get the full stack. But it also had the unsettling quality of allowing existing ssh sessions to contine, oblivious to the doom in other sessions. Writes to zfs file systems seemed successful but on reboot some (but not all) were missing. For example, I had shell history that recorded things that I did (e.g. zstreamdump > $file) but other evidence that I did those things was missing. Most unsettling was that a snapshot created during that period existed but did not contain the expected data. That is, I did:

cd /atrest/tmp-old
for file in zfs.??; do zstreamdump < $file > $file.zstreamdump
... minutes pass
zfs snapshot rpool/atrest@2

On reboot the *.zstreamdump files were missing. The snapshot existed but did not contain the files.

ZoL should not be turning panics into oopses. It could be that kdump with zfs root just went really poorly. But then Linux should fail in a safe manner, not one that leads to data loss.

zfs recv -F < rawstream does not roll back snapshots

Normally zfs recv -F < stream can be repeated. The -F effectively means "roll back to fromsnap before receiving." This happens with non-raw (cooked?) streams, but not with raw streams.

/dev/zvol can become stale

I had two encrypted zvols: rpool/crypto/mg1 and rpool/crypto/vol0. Neither key was loaded automatically after a reboot. I manually keyed mg1 (or maybe it was I created mg1). I then wrote to /dev/zvol/rpool/crypto/mg1 a few times. I did a bunch of snapshots, sends, and receives of that volume. I then loaded the key for vol0. Its device node never appeared under /dev/zvol. Renaming the volume also did not cause it to appear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment