@islander
Last active February 6, 2024 18:03
Recover a qcow2 image using fsck

Load network block device module:

# modprobe nbd max_part=8

Power off the machine (virsh destroy forces an immediate stop, like pulling the plug):

# virsh destroy virtual-machine

Connect disk image:

# qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/virtual-machine.qcow2

Check disk:

# fsck /dev/nbd0p1
fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
/dev/nbd0p1: recovering journal
/dev/nbd0p1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes
Inode 274 was part of the orphaned inode list.  FIXED.
Inode 132276 was part of the orphaned inode list.  FIXED.
Deleted inode 142248 has zero dtime.  Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -603674 -623174 +(689342--689343)
Fix<y>? yes
Free blocks count wrong for group #18 (15076, counted=15077).
Fix<y>? yes
Free blocks count wrong for group #19 (11674, counted=11675).
Fix<y>? yes
Free blocks count wrong (632938, counted=670871).
Fix<y>? yes
Inode bitmap differences:  -274 -132276 -142248
Fix<y>? yes
Free inodes count wrong for group #0 (52, counted=53).
Fix<y>? yes
Free inodes count wrong for group #16 (99, counted=100).
Fix<y>? yes
Free inodes count wrong for group #17 (519, counted=520).
Fix<y>? yes
Free inodes count wrong (204392, counted=204599).
Fix<y>? yes

/dev/nbd0p1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/nbd0p1: 101833/306432 files (0.2% non-contiguous), 553321/1224192 blocks

Disconnect device:

# qemu-nbd --disconnect /dev/nbd0
/dev/nbd0 disconnected

Start machine:

# virsh start virtual-machine
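
The steps above can be collected into one function. This is only a sketch under assumptions: the names run, fsck_qcow2 and DRY_RUN are hypothetical and not part of the original procedure, and the defaults /dev/nbd0 and p1 simply mirror the example. With DRY_RUN set, the commands are printed instead of executed, which makes it easy to review them before running anything as root.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
# (DRY_RUN is a hypothetical switch introduced for this sketch.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: fsck_qcow2 VM IMAGE [NBD_DEVICE] [PARTITION_SUFFIX]
fsck_qcow2() {
    vm=$1
    image=$2
    nbd=${3:-/dev/nbd0}
    part=${4:-p1}

    run modprobe nbd max_part=8 || return 1
    # virsh destroy fails if the domain is already off; carry on regardless.
    run virsh destroy "$vm" || true
    run qemu-nbd --connect="$nbd" "$image" || return 1

    # fsck's exit status is a bit mask (1 means errors were corrected),
    # so it is captured instead of being treated as a plain failure.
    status=0
    run fsck "$nbd$part" || status=$?

    # Always detach the image before the VM is started again.
    run qemu-nbd --disconnect "$nbd"

    # Start the VM only if fsck found no errors (0) or corrected them all (1).
    if [ "$status" -le 1 ]; then
        run virsh start "$vm"
    fi
    return "$status"
}
```

A dry run such as DRY_RUN=1 fsck_qcow2 virtual-machine /var/lib/libvirt/images/virtual-machine.qcow2 prints the six commands in order. Without DRY_RUN the function has to be run as root.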

@ryt15
ryt15 commented Dec 25, 2023

If the above steps still don't work and you need to save your files from the VM, you might be able to do so by mounting the guest partitions on the host. Below is an example where partition 5 is mounted on /mnt. It assumes that you have already loaded the Network Block Device (nbd) kernel module like this:
$ sudo modprobe nbd max_part=8

Backup files from VM guest to host

Make sure the VM is shut off:

$ virsh list --all
 Id   Name               State
-----------------------------------
 -    Kelpie             shut off
 -    Papillon           shut off
 -    Corgi              shut off

Connect the disk image (as above):

$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2

Check that the /mnt directory is free to use as mount point:

$ mount | grep /mnt

The command above should print nothing. If it does print something, either unmount /mnt or use another empty directory.
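
Note that mount | grep /mnt also matches entries such as /mnt/usb, so it can report a conflict that isn't one. A stricter check compares the mount point field exactly. The sketch below assumes the /proc/mounts format (mount point in the second column); the function name is_mount_point is hypothetical, and paths containing spaces are not handled because /proc/mounts escapes them.

```shell
# usage: is_mount_point DIR [MOUNTS_FILE]
# Succeeds only if DIR is listed as a mount point (second field) in
# MOUNTS_FILE, which defaults to /proc/mounts.
is_mount_point() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, is_mount_point /mnt && echo "/mnt is busy". On systems with util-linux, mountpoint -q /mnt performs a similar check.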

$ sudo mount /dev/nbd0p5 /mnt
$ ls /mnt
bin    dev   lib    libx32      mnt   root  snap      sys  var
boot   etc   lib32  lost+found  opt   run   srv       tmp
cdrom  home  lib64  media       proc  sbin  swapfile  usr

If you see a listing similar to the one above, you can access all files on this VM partition and copy them to a safe place.

Make space

While the partition is mounted, take the opportunity to check its free disk space with the df command:
$ df /mnt
If the Use% column shows 90% or more, remove unnecessary files and empty any trash folders.
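
The 90% rule can also be checked mechanically. The following is a minimal sketch that assumes the POSIX output format of df -P (capacity in the fifth column, mount point in the sixth); the function name over_threshold is hypothetical, and mount points containing spaces are not handled.

```shell
# usage: df -P [PATH...] | over_threshold [PERCENT]
# Prints "MOUNTPOINT USE%" for every file system at or above PERCENT
# (default 90). The first line of df output is the header and is skipped.
over_threshold() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, df -P /mnt | over_threshold 90 prints nothing while the partition is below 90%.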

When done, don't forget to unmount the partition as follows:

$ sudo umount /dev/nbd0p5
And disconnect:
$ sudo qemu-nbd --disconnect /dev/nbd0

Repair broken VM

It might be possible to repair your broken system, but it comes with a warning:
If you're not careful and mix up the devices on the host with the devices on the guest, you may destroy the host!
First, a quick review of what we're about to do:
We assume that the boot partition is corrupt, so we want to replace it with a fresh one from another virtual machine.

With the nbd device disconnected, run lsblk on the host. It should show something like this (loop devices omitted):

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 931,5G  0 disk 
└─sda1        8:1    0 931,5G  0 part /data
nvme0n1     259:0    0 931,5G  0 disk 
├─nvme0n1p1 259:1    0   512M  0 part /boot/efi
└─nvme0n1p2 259:2    0   931G  0 part /

Now write down the names of these devices so that you never involve them in any further operations. They belong to your host, which you don't want to alter!

Connect the guest device again:
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2

And run lsblk again. You should see some new devices like this:

nbd0         43:0    0    40G  0 disk
├─nbd0p1     43:1    0   512M  0 part
├─nbd0p2     43:2    0     1K  0 part
├─nbd0p3     43:3    0    15G  0 part
└─nbd0p5     43:5    0  24,5G  0 part
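
Instead of comparing the two lsblk listings by eye, the device names can be saved before and after connecting and then compared. This is a sketch only: the function name new_devices and the file names before.txt and after.txt are hypothetical.

```shell
# usage: new_devices BEFORE_FILE AFTER_FILE
# Prints every line of AFTER_FILE that does not appear in BEFORE_FILE,
# i.e. the devices that showed up after qemu-nbd --connect.
new_devices() {
    grep -vxFf "$1" "$2"
}

# Typical use (the first command is run while nbd0 is still disconnected):
#   lsblk -ln -o NAME > before.txt
#   sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2
#   lsblk -ln -o NAME > after.txt
#   new_devices before.txt after.txt
```

Everything listed in before.txt belongs to the host and must stay out of any later dd command.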

We can already assume that nbd0p1 is the troublemaker, since its 512M size matches a boot partition (compare /boot/efi on the host above).
You can confirm this with sudo parted -l. Look at the section under Disk /dev/nbd0, which should look something like this:

Disk /dev/nbd0: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system  Flags
 1      1049kB  538MB   537MB   primary   fat32        boot
 2      539MB   26.8GB  26.3GB  extended
 5      539MB   26.8GB  26.3GB  logical   ext4
 3      26.8GB  42.9GB  16.1GB  primary   ext4

I suggest you first make a binary backup of nbd0p1. In this example I copy it to a new directory called boots under ~/tmp:

$ cd ~/tmp
$ mkdir boots
$ cd boots
$ sudo dd if=/dev/nbd0p1 of=bad_guest_nbd0p1
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.07071 s, 501 MB/s

Disconnect the faulty VM:

$ sudo qemu-nbd --disconnect /dev/nbd0
/dev/nbd0 disconnected

Connect a fresh, working VM of the same type and version as the damaged one. If you don't have one, create it, test it, and shut it down. In this example I assume you have one at /var/lib/libvirt/images/ubuntu20.04.qcow2.
Connect it as follows:
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/ubuntu20.04.qcow2
Make a binary copy of its boot device:

$ sudo dd if=/dev/nbd0p1 of=ok_guest_nbd0p1
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.06321 s, 505 MB/s

Disconnect the fresh VM:
$ sudo qemu-nbd --disconnect /dev/nbd0

At this point, we have a backup of the bad boot partition in ~/tmp/boots/bad_guest_nbd0p1 and a working one in ~/tmp/boots/ok_guest_nbd0p1.
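
Both dd runs above report 536870912 bytes, and the overwrite that follows only makes sense when the two partitions are the same size. A small check, sketched here with the hypothetical function name same_size, can confirm that before anything is written.

```shell
# usage: same_size FILE_A FILE_B
# Succeeds if both regular files have the same size in bytes.
# Returns 2 if either file cannot be read.
same_size() {
    size_a=$(wc -c < "$1") || return 2
    size_b=$(wc -c < "$2") || return 2
    [ "$size_a" -eq "$size_b" ]
}
```

For example, same_size bad_guest_nbd0p1 ok_guest_nbd0p1 || echo "sizes differ, do not overwrite". The size of a connected partition itself can be read with sudo blockdev --getsize64 /dev/nbd0p1.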

Connect the broken VM:
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2
But don't mount it!

Overwrite the bad partition with the copy from the fresh VM. But first make sure you're not involving any partitions of your host machine!

$ sudo dd if=ok_guest_nbd0p1 of=/dev/nbd0p1
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 15,3951 s, 34,9 MB/s
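
The overwrite above is the one step that can damage the host if the target is mistyped. As a precaution, the target name can be checked against the nbd naming pattern first. This is a minimal sketch; the function name is_nbd_partition is hypothetical, and it only checks the name, not whether the device is actually connected to the intended image.

```shell
# usage: is_nbd_partition PATH
# Succeeds only for names of the form /dev/nbd<N>p<M>, for example
# /dev/nbd0p1. Host devices such as /dev/sda1 or /dev/nvme0n1p2 are rejected.
is_nbd_partition() {
    printf '%s\n' "$1" | grep -Eqx '/dev/nbd[0-9]+p[0-9]+'
}
```

It can be chained in front of the write, for example is_nbd_partition /dev/nbd0p1 && sudo dd if=ok_guest_nbd0p1 of=/dev/nbd0p1.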

And disconnect:

$ sudo qemu-nbd --disconnect /dev/nbd0
/dev/nbd0 disconnected

Try to boot the faulty VM. It may seem to hang, but wait at least 90 seconds. If it still hangs, try sending the F1 key or forcing a reset.

Once you have successfully booted it, open a terminal (in the guest) and run sudo mount -a.
If you see an error like this:
mount: /boot/efi: can't find UUID=AF74-0D3D
then first run sudo blkid | grep vfat:
/dev/vda1: UUID="7289-BC8F" TYPE="vfat" PARTUUID="b443b22b-01"
and then grep efi /etc/fstab:
UUID=AF74-0D3D /boot/efi vfat umask=0077 0 1
The overwritten partition now carries the UUID of the fresh VM's boot partition, so the old fstab entry may no longer match. If the UUID values differ, edit /etc/fstab and change the UUID of /boot/efi to the value shown by blkid.
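
The fstab edit can also be prepared without touching the file directly. The sketch below prints a corrected copy of the file to standard output so it can be reviewed first. The function name fstab_with_uuid is hypothetical; the UUID values used in the example are the ones shown above.

```shell
# usage: fstab_with_uuid MOUNT_POINT NEW_UUID [FSTAB]
# Prints FSTAB (default /etc/fstab) with the UUID of the entry for
# MOUNT_POINT replaced by NEW_UUID. The file itself is not modified.
fstab_with_uuid() {
    awk -v mp="$1" -v uuid="$2" '
        # Only touch active entries that are identified by UUID.
        $1 ~ /^UUID=/ && $2 == mp { sub(/UUID=[^ \t]+/, "UUID=" uuid) }
        { print }
    ' "${3:-/etc/fstab}"
}
```

For example, fstab_with_uuid /boot/efi 7289-BC8F > /tmp/fstab.new writes the corrected version to a scratch file. After comparing it with diff /etc/fstab /tmp/fstab.new, it can be copied into place with sudo cp /tmp/fstab.new /etc/fstab.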

It should be working OK now. Reboot your VM.

@agiUnderground
Thank you! That was really helpful!

However, in my case the damaged root partition was 1M in size and the recovery partition was about 100M, so I was unable to write it. I guess it's related to COW and how one disk is based on another.
