Below are the steps I used to recover the system. This is NixOS running on ZFS on LUKS, where the LUKS key is encrypted with a GPG key on a YubiKey.
During a Docker image build my system locked up and I had to reset my desktop by holding the power button. After that, importing the ZFS root pool hung on startup.
Some steps are from memory.
- Create a bootable USB drive with gpg, etc.: example
- Boot from USB.
# Mount the boot partition somewhere
mount /dev/nvme0n1p1 /boot
mkdir work
cd work
cp /boot/EFI/nixos/7krk2ayvad1a9fq6xhn0v4mdb98ixdyr-initrd-linux-5.15.85-initrd.efi initrd.efi
cpio -i < initrd.efi
> cpio: kernel/x86/microcode/AuthenticAMD.bin: Cannot open: No such file or directory
> 100 blocks
dd if=initrd.efi of=rest.zst skip=100
nix-shell -p zstd
unzstd rest.zst
mkdir initramfs_filesystem
cpio -ivD initramfs_filesystem < rest
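The skip=100 above works because cpio reports the size of the first archive (the microcode) in 512-byte blocks, which is also dd's default block size. A minimal sketch of that arithmetic, assuming GNU cpio's "N blocks" summary line; the function name cpio_blocks_to_bytes is hypothetical:

```shell
# Hypothetical helper: convert cpio's "N blocks" summary into a byte offset.
# Assumes GNU cpio, which counts in 512-byte blocks (dd's default bs).
cpio_blocks_to_bytes() {
    blocks=$(printf '%s\n' "$1" | awk '/^[0-9]+ blocks?$/ { print $1 }')
    [ -n "$blocks" ] || return 1   # not a cpio summary line
    echo $((blocks * 512))
}

# The initrd above reported "100 blocks", so the compressed part starts at:
cpio_blocks_to_bytes "100 blocks"   # prints 51200
```

If your initrd reports a different block count, use that number for dd's skip= instead of 100.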
# find the encrypted LUKS passphrase
find -name cryptkey.gpg
./initramfs_filesystem/nix/store/52hqd9h29a7yqnywz1siv5g23gdybb3f-extra-utils/secrets/gpg-keys/dev/disk/by-uuid/f58d35dc-f624-4aec-b6ae-85ff28a565eb/cryptkey.gpg
cp ./initramfs_filesystem/nix/store/52hqd9h29a7yqnywz1siv5g23gdybb3f-extra-utils/secrets/gpg-keys/dev/disk/by-uuid/f58d35dc-f624-4aec-b6ae-85ff28a565eb/cryptkey.gpg ./
- Get public key on a system that has it:
gpg --export KEY_ID! > pub.key
If you don't know the key ID, running gpg --decrypt cryptkey.gpg will show which key the file is encrypted with.
Transfer pub.key to the rescue system (here via a USB drive) and import it:
gpg --import /run/media/nixos/CRAP/pub.key
gpg --card-status
gpg --decrypt cryptkey.gpg > key.bin
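If the key ID is unknown, it can be pulled out of the message gpg prints when it tries to decrypt. A rough sketch, assuming gpg's usual "encrypted with ... key, ID ..." wording on stderr (this may differ between versions and locales); the function name and the sample ID are made up for illustration:

```shell
# Hypothetical helper: extract key IDs from gpg's "encrypted with" messages.
# Reads gpg's diagnostic output on stdin and prints one ID per line.
gpg_recipient_ids() {
    sed -n 's/.*encrypted with .*key, ID \([0-9A-Fa-f]\{8,\}\).*/\1/p'
}

# Intended use (gpg writes these messages to stderr):
#   LC_ALL=C gpg --decrypt cryptkey.gpg 2>&1 >/dev/null | gpg_recipient_ids

# Demo with a made-up message line:
echo 'gpg: encrypted with 4096-bit RSA key, ID 0123456789ABCDEF, created 2020-01-01' \
    | gpg_recipient_ids
```

The printed ID is what goes in place of KEY_ID in the gpg --export command.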
# Find the right disk uuid.
sudo cryptsetup luksOpen /dev/disk/by-uuid/f58d35dc-f624-4aec-b6ae-85ff28a565eb cryptroot --key-file key.bin
rm key.bin
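In this setup the key file's path inside the initramfs already names the LUKS device, so the UUID can be read off the find output rather than guessed. A small sketch, assuming the .../by-uuid/UUID/cryptkey.gpg layout shown above; luks_uuid_from_key_path is a hypothetical name:

```shell
# Hypothetical helper: pull the device UUID out of the key file's path.
luks_uuid_from_key_path() {
    printf '%s\n' "$1" \
        | sed -n 's|.*/by-uuid/\([0-9A-Fa-f-]*\)/cryptkey\.gpg$|\1|p'
}

key_path=./initramfs_filesystem/nix/store/52hqd9h29a7yqnywz1siv5g23gdybb3f-extra-utils/secrets/gpg-keys/dev/disk/by-uuid/f58d35dc-f624-4aec-b6ae-85ff28a565eb/cryptkey.gpg
uuid=$(luks_uuid_from_key_path "$key_path")
echo "/dev/disk/by-uuid/$uuid"
```

The echoed path is the device argument to pass to cryptsetup luksOpen.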
At this point I was able to import the pool in read-only mode, but importing it read-write still hung:
zpool import -o readonly=on tank
# mount partitions, and back up files if desired
zpool export tank
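The read-only import, backup, and export can be strung together roughly as follows. This is a sketch rather than what I actually ran: the /mnt altroot, the rsync copy, and the destination path are assumptions, and it only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: print the command (DRY_RUN=1, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# backup_pool POOL DEST: import read-only under /mnt, copy, export again.
backup_pool() {
    pool=$1
    dest=$2
    run zpool import -o readonly=on -R /mnt "$pool" &&
    run rsync -a /mnt/ "$dest"/ &&
    run zpool export "$pool"
}

# /path/to/backup is a placeholder for wherever the backup should go.
backup_pool tank /path/to/backup
```

Datasets with legacy mountpoints are not mounted automatically by the import, so they would need to be mounted by hand before the copy.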
- Unclear why this works, but it did:
https://www.reddit.com/r/zfs/comments/fcacws/using_zdb_to_repair_errors/
echo 1 | sudo tee /sys/module/zfs/parameters/zfs_recover
sudo zdb -e -bcsvL tank # wait for hours ...
- Based on the comments, setting zfs_recover before importing should have the same effect, so running zdb is likely unnecessary.
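A sketch of that shorter route, which I have not verified myself: set the parameter, confirm it stuck, then retry the import. The function name enable_zfs_recover is hypothetical, and the optional path argument exists only so the helper can be tried against an ordinary file.

```shell
# Hypothetical helper: enable zfs_recover and confirm the value was stored.
# The optional argument overrides the parameter path (useful for testing).
enable_zfs_recover() {
    param=${1:-/sys/module/zfs/parameters/zfs_recover}
    if [ ! -w "$param" ]; then
        echo "cannot write $param (zfs module loaded? running as root?)" >&2
        return 1
    fi
    echo 1 > "$param"
    [ "$(cat "$param")" = "1" ]
}

# Intended use as root, before retrying the import:
#   enable_zfs_recover && zpool import -f tank
```

The parameter is reset when the module is reloaded or the machine reboots, so it only affects the rescue session.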
The import during boot might just work now. To check:
zpool import -f tank
zpool scrub tank # supposedly a good idea
zpool export tank
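Note that zpool scrub returns immediately and the scrub continues in the background, so it is worth checking the result before the export. A small sketch that interprets the one-line summary from zpool status -x; the helper name is hypothetical and the matched wording is what I would expect from current OpenZFS releases:

```shell
# Hypothetical helper: succeed if "zpool status -x" output reports health.
pool_is_healthy() {
    grep -Eq "(all pools are healthy|pool '[^']+' is healthy)"
}

# Intended use once the scrub has finished (on OpenZFS 2.0 or later,
# "zpool wait -t scrub tank" blocks until it is done):
#   zpool status -x tank | pool_is_healthy && echo "tank looks fine"

# Demo on a sample status line:
echo "pool 'tank' is healthy" | pool_is_healthy && echo healthy
```

Anything other than a healthy summary means the full zpool status output deserves a closer look before rebooting.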
Reboot back into the system.