@faustinoaq
Last active July 8, 2024 02:26
Recover ZFS boot disk in Ubuntu 22.04.4

Recovery

TL;DR: for some weird reason zfs was not being loaded in grub, maybe related to this bug?

ATTENTION: I reconstructed some of this from memory, so some things may be mistyped or wrong. Proceed with caution or leave a comment ⚠️

insmod zfs
zfs-bootfs (hd1,gpt3) # or whatever is your structure

for me it was:

(hd1,gpt1) # boot fat filesystem for efi
(hd1,gpt2) # swap
(hd1,gpt3) # zfs boot pool (bpool) with linux kernel (vmlinuz) and initial ramdisk (initrd)
(hd1,gpt4) # zfs root pool (rpool) with ubuntu

And remember to zpool import -f your pools if you tried to recover them using a live CD

Details

OK, I managed to fix my mess. Now the hard part is documenting it, so here we go...

I was doing a backup of my pools (rpool and bpool) in my ZFS root

I used ChatGPT to get the zfs send | receive commands (bad idea, I know 🤣). So far so good, but what ChatGPT didn't know about was unexpected bugs, and OMG I hit a big one 🐛

It turns out that GRUB 2.06 does not seem to be compatible with booting from a ZFS pool that also has snapshots

I followed chatgpt guidelines to get my backup of bpool and rpool and for that I had to do a snapshot.
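If you suspect the same snapshot problem, it may help to check whether the boot pool actually holds any snapshots. A minimal sketch: the helper name snapshots_in_pool is made up for this example, and it only filters text, so it expects the snapshot names on stdin.

```shell
# Hypothetical helper: keep only the snapshot names that belong to one pool.
# Feed it one snapshot name per line, e.g. from:
#   zfs list -H -t snapshot -o name
snapshots_in_pool() {
  pool=$1
  # A snapshot name looks like pool/dataset@snap or pool@snap.
  awk -v p="$pool" 'index($0, "@") > 0 && (index($0, p "/") == 1 || index($0, p "@") == 1)'
}
```

On a machine with ZFS installed, the usage would be something like zfs list -H -t snapshot -o name | snapshots_in_pool bpool. An empty result means there are no snapshots on that pool.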

Then I rebooted just in case, to check that my ChatGPT guide was OK. I was afraid something might have broken my host setup, since I had run grub-install inside a chroot to get a working backup.

And to my surprise, after the reboot I got something like this:

I forgot to take pictures, so I grabbed a similar one from the internet 😅

[image: similar GRUB boot error, taken from the internet]

Mine was very similar, just pointing to bpool and not finding the kernel:

Loading Linux 6.5.0-41-generic ...
error: file '/BOOT/ubuntu_0yfxim/vmlinuz-6.5.0-41-generic' not found
Loading initial ramdisk ...
error: you need to load the kernel first

After hours of troubleshooting I managed to boot:

Fixing it completely requires further investigation into why the zfs module is not being loaded in GRUB

grub> ls
grub> ls (hd1,gpt4)
unknown filesystem
grub> insmod zfs
grub> zfs-bootfs (hd1,gpt3) # or whatever is your boot
grub> set root=(hd1,gpt4) # if the kernel path below is not found, point root at the boot pool instead: set root=(hd1,gpt3)
grub> ls (hd1,gpt3)/
@/  /BOOT
grub> ls (hd1,gpt3)/BOOT
ubuntu_0ynawf
grub> linux /BOOT/ubuntu_0ynawf/@/vmlinuz root=ZFS=rpool/ROOT/ubuntu_0ynawf boot=zfs
grub> initrd /BOOT/ubuntu_0ynawf/@/initrd.img
grub> boot
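To keep a cheat sheet for next time, the session above can be turned into a small generator that prints the GRUB commands for a given boot partition and dataset ID. This is only a sketch under assumptions: the function name grub_rescue_cmds is made up, the pool is assumed to be called rpool with the Ubuntu BOOT/ROOT layout, and the device is spelled out in every path instead of relying on set root.

```shell
# Hypothetical helper: print the GRUB commands to type by hand.
#   $1 = boot pool partition as GRUB names it, e.g. "(hd1,gpt3)"
#   $2 = dataset ID (the suffix after ubuntu_), e.g. "0ynawf"
grub_rescue_cmds() {
  boot_part=$1
  id=$2
  cat <<EOF
insmod zfs
ls ${boot_part}/BOOT/ubuntu_${id}/@/
linux ${boot_part}/BOOT/ubuntu_${id}/@/vmlinuz root=ZFS=rpool/ROOT/ubuntu_${id} boot=zfs
initrd ${boot_part}/BOOT/ubuntu_${id}/@/initrd.img
boot
EOF
}
```

For example, grub_rescue_cmds '(hd1,gpt3)' 0ynawf prints the five lines to enter at the grub> prompt. Run the ls line first to confirm the kernel and initrd are really there.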

After booting I landed in emergency mode, because I had imported the pools from a live CD while trying (unsuccessfully) to fix GRUB

To fix it, force-import the pools and press CTRL+D to continue booting:

(initramfs) zpool import -f bpool
(initramfs) zpool import -f rpool
(initramfs) # zpool import -f <any other pool you may have>
(initramfs) # CTRL+D to continue boot
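The forced import is needed because the pools were last imported by the live system and never exported. Exporting them from the live CD before rebooting should avoid this step. A minimal sketch: export_pools is a made-up helper that only prints the commands unless you pass run as the first argument.

```shell
# Hypothetical helper: print (or run) "zpool export" for each pool.
#   $1 = "print" to only show the commands, "run" to execute them
#        (running needs root and the ZFS tools on the live system)
#   remaining arguments = pool names
export_pools() {
  mode=$1
  shift
  for pool in "$@"; do
    if [ "$mode" = run ]; then
      zpool export "$pool"
    else
      printf 'zpool export %s\n' "$pool"
    fi
  done
}
```

From the live CD, after leaving the chroot, export_pools print bpool rpool shows what would be run and export_pools run bpool rpool actually exports the pools.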

And done! For now my server is back online, and I have at least a guide on how to bring it back up in case my UPS battery goes bananas.

For the future I'll probably clean install ubuntu lts or debian.

Edit: I fixed it! ✨🥳🙌

Using a live CD I recreated the bpool following this OpenZFS guide

  1. chroot following this other OpenZFS guide
  2. Once you are inside the chroot, make a copy of boot: cp -rv /boot /home/user/boot
  3. Exit chroot and unmount pools
  4. Destroy and recreate the bpool
zpool destroy bpool
DISK=/dev/disk/by-id/scsi-SATA_disk1 # changes according to your disk model
zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -o cachefile=/etc/zfs/zpool.cache \
    -o compatibility=grub2 \
    -o feature@livelist=enabled \
    -o feature@zpool_checkpoint=enabled \
    -O devices=off \
    -O acltype=posixacl -O xattr=sa \
    -O compression=lz4 \
    -O normalization=formD \
    -O relatime=on \
    -O canmount=off -O mountpoint=/boot -R /mnt \
    bpool ${DISK}-part3
zfs create -o canmount=off -o mountpoint=none bpool/BOOT
zfs create -o mountpoint=/boot bpool/BOOT/ubuntu_0ynawf

Once it is recreated, follow step 1 again to get back into the chroot and copy the data back: cp -rv /home/user/boot/* /boot

And remember to create the folders /boot/efi and /boot/grub, and always use the same ZFS dataset ID (the suffix after ubuntu_); mine was 0ynawf
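If you are not sure what your ID is, it can be read from the root pool's dataset names. A small sketch, assuming the pool is called rpool and uses the Ubuntu rpool/ROOT/ubuntu_<id> layout; root_dataset_id is a made-up helper that only parses text from stdin.

```shell
# Hypothetical helper: print the <id> of the first rpool/ROOT/ubuntu_<id> dataset.
# Feed it one dataset name per line, e.g. from:
#   zfs list -H -o name -r rpool/ROOT
root_dataset_id() {
  # Child datasets (ubuntu_<id>/var ...) and snapshots are skipped.
  sed -n 's|^rpool/ROOT/ubuntu_\([^/@]*\)$|\1|p' | head -n 1
}
```

With the pools imported, zfs list -H -o name -r rpool/ROOT | root_dataset_id should print the suffix to reuse in bpool/BOOT/ubuntu_<id>.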

After this I can finally reboot without issues. If your boot drops into emergency mode, just zpool import -f your pools back in

Some references that helped me:
