| Storage | Emergency | Redundant Boot: EFI | Redundant Boot: /boot | Redundant Boot: LVM | vmem LVM: LV | vmem LVM: LV | ZFS: Storage | ZFS: Cache | ZFS: Log | Unused |
|---|---|---|---|---|---|---|---|---|---|---|
| SSD #1 | | 500M | 500M | ✓ | | | | | | |
| SSD #2 | | 500M | 500M | ✓ | | | | | | |
| NVMe | | | | | swap | app | | | | |
| SSD / HDD | | | | | | | ✓ | | | |
| NVMe | | | | | | | | ✓ | | |
| NVMe | | | | | | | | | 10G PLP | |

| Storage | Emergency | Redundant Boot: EFI | Redundant Boot: /boot | Redundant Boot: LVM | vmem LVM: LV | vmem LVM: LV | ZFS: Storage | ZFS: Cache | ZFS: Log | Unused |
|---|---|---|---|---|---|---|---|---|---|---|
| NVMe #1 | | 500M | 500M | ✓ | swap | app | | ✓ | | |
| NVMe #2 | | 500M | 500M | ✓ | | | | | 10G PLP | |
| SSD / HDD | | | | | | | ✓ | | | |
This is an example of a machine configured with:

- EFI partition in mdadm RAID1
- Boot partition in mdadm RAID1
- Root partition in a thinly provisioned LVM volume
**Tip:** Use UUIDs in configuration files whenever possible.

```
blkid
/dev/md126: UUID="B895-E0BC" TYPE="vfat"
/dev/nvme0n1p1: UUID="02190f3f-386e-3c26-fe11-da342ee44207" UUID_SUB="e6abd74f-e861-4cca-de2b-b0371a7fe964" LABEL="fb3-a2:boot_efi" TYPE="linux_raid_member" PARTLABEL="Linux filesystem" PARTUUID="ba833d9e-2e05-45b3-bb6f-65a19b838594"
/dev/nvme0n1p2: UUID="0e424203-d116-836a-4311-45b0d1eac1b5" UUID_SUB="b4ed317f-38e6-c1ec-4873-a48833dba0fc" LABEL="fb3-a2:boot" TYPE="linux_raid_member" PARTLABEL="Linux filesystem" PARTUUID="3bc08e6b-641c-451b-8a2a-007e6e05e7e4"
/dev/md127: UUID="7584555f-a08a-411f-a663-043d18041d07" TYPE="xfs"
/dev/nvme1n1p2: UUID="0e424203-d116-836a-4311-45b0d1eac1b5" UUID_SUB="b256fed4-7a00-cc77-570e-233e44ba1369" LABEL="fb3-a2:boot" TYPE="linux_raid_member" PARTLABEL="Linux filesystem" PARTUUID="40ae1e9d-c486-45cc-8a75-e24849e242aa"
/dev/nvme1n1p1: UUID="02190f3f-386e-3c26-fe11-da342ee44207" UUID_SUB="00e1724a-f755-d3f1-4f3c-4d3a4bf016ff" LABEL="fb3-a2:boot_efi" TYPE="linux_raid_member" PARTLABEL="Linux filesystem" PARTUUID="aeb6561b-9571-4c19-b21b-9afac2c88e53"
/dev/mapper/system-root: UUID="f0ae0fdc-bec3-477c-8a79-dbdac5f0358d" TYPE="xfs"
```
`/etc/fstab`:

```
UUID=f0ae0fdc-bec3-477c-8a79-dbdac5f0358d /          xfs   defaults                    0 0
UUID=7584555f-a08a-411f-a663-043d18041d07 /boot      xfs   defaults                    0 0
UUID=B895-E0BC                            /boot/efi  vfat  umask=0077,shortname=winnt  0 2
```
`/etc/default/grub`:

```
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.lvm.lv=system/root rd.md.uuid=0e424203:d116836a:431145b0:d1eac1b5"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
GRUB_DEVICE_UUID=f0ae0fdc-bec3-477c-8a79-dbdac5f0358d
```
**Note:** The kernel parameter `rd.lvm.lv` does not support UUIDs; it takes the volume group and logical volume names.
**Note:** Add `GRUB_DEVICE_UUID` set to the root file system UUID.

The resulting `grub.cfg` contains:

```
search --no-floppy --fs-uuid --set=root --hint='mduuid/0e424203d116836a431145b0d1eac1b5' 7584555f-a08a-411f-a663-043d18041d07
...
search --no-floppy --fs-uuid --set=root 7584555f-a08a-411f-a663-043d18041d07
...
set kernelopts="root=/dev/mapper/system-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.lvm.lv=system/root rd.md.uuid=0e424203:d116836a:431145b0:d1eac1b5 "
...
```
**Warning:** Bug #64291: `grub2-probe` fails to get the `fs_uuid` of an LVM thin volume.

```
efibootmgr -v
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000,0020,0021,0005,0014,0015,0016,0017
Boot0000* Rocky Linux HD(1,GPT,ba833d9e-2e05-45b3-bb6f-65a19b838594,0x100,0x1f500)/File(\EFI\ROCKY\SHIMX64.EFI)
Boot0001* Rocky Linux HD(1,GPT,aeb6561b-9571-4c19-b21b-9afac2c88e53,0x100,0x1f500)/File(\EFI\ROCKY\SHIMX64.EFI)
Boot0005* UEFI: Built-in EFI Shell VenMedia(5023b95c-db26-429b-a648-bd47664c8012)..BO
Boot0014  UEFI: PXE IP4 P1 Intel(R) I210 Gigabit Network Connection PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/Pci(0x4,0x0)/Pci(0x0,0x0)/MAC(d05099de9976,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0015  UEFI: PXE IP6 P1 Intel(R) I210 Gigabit Network Connection PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/Pci(0x4,0x0)/Pci(0x0,0x0)/MAC(d05099de9976,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot0016  UEFI: PXE IP4 P0 Intel(R) I210 Gigabit Network Connection PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/Pci(0x5,0x0)/Pci(0x0,0x0)/MAC(d05099de9975,0)/IPv4(0.0.0.00.0.0.0,0,0)..BO
Boot0017  UEFI: PXE IP6 P0 Intel(R) I210 Gigabit Network Connection PciRoot(0x0)/Pci(0x1,0x2)/Pci(0x0,0x0)/Pci(0x5,0x0)/Pci(0x0,0x0)/MAC(d05099de9975,0)/IPv6([::]:<->[::]:,0,0)..BO
Boot0020* UEFI OS HD(1,GPT,ba833d9e-2e05-45b3-bb6f-65a19b838594,0x100,0x1f500)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0021* UEFI OS HD(1,GPT,aeb6561b-9571-4c19-b21b-9afac2c88e53,0x100,0x1f500)/File(\EFI\BOOT\BOOTX64.EFI)..BO
```
**Note:** This guide applies to UEFI systems with GPT-based partitioning using the grub2 bootloader.
**Tip:** Always use a 4096-byte block size for file systems.
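For example, a minimal sketch of creating an XFS file system with an explicit 4096-byte block size (the device path is an assumption; 4096 is already the default on most systems):

```bash
# -b size=4096 sets the file system block size explicitly
mkfs.xfs -b size=4096 /dev/mapper/system-root
```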
**Tip:** Format NVMe devices to a 4096-byte physical sector size whenever possible before use.
Storage devices have two common physical sector sizes: 512 bytes (512n) and Advanced Format 4096 bytes (4kn). There are even storage devices that expose a 512-byte logical sector on top of 4096-byte physical sectors, using the 512e emulation mode. A 512e storage device reports in `smartctl` as:

```
Sector Sizes:     512 bytes logical, 4096 bytes physical
```
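A quick way to check this yourself (assuming `smartmontools` is installed; the device path is an example):

```bash
# Print the device identity, filtered to the logical/physical sector sizes
smartctl -i /dev/sda | grep -i 'sector size'
```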
| Storage | 512n | 4kn | 512e |
|---|---|---|---|
| SATA HDD | obsolete | ✓ | ✓ |
| SATA SSD | ✓ | ? | ✓ |
| NVMe M.2 | Format LBA | Format LBA | ✗ |
**Warning:** Avoid mixing storage devices with different physical sector sizes in a redundant array; failures or data corruption may occur.

For example, adding physical volumes with mixed block sizes to an LVM volume group fails:
```
# query physical/logical block size and file system block size
blockdev -v --getss --getpbsz --getbsz /dev/nvme0n1
get logical block (sector) size: 512
get physical block (sector) size: 512
get blocksize: 4096

# query physical/logical block size and file system block size
blockdev -v --getss --getpbsz --getbsz /dev/nvme1n1
get logical block (sector) size: 4096
get physical block (sector) size: 4096
get blocksize: 4096

vgcreate vgroup0 /dev/nvme0n1p1 /dev/nvme1n1p1
  Devices have inconsistent logical block sizes (512 and 4096).
  See lvm.conf allow_mixed_block_sizes.
```
Unlike other storage devices, most newer NVMe devices let the user choose among physical sector sizes with `nvme-cli`:

```
# Install nvme-cli utility
dnf install -y nvme-cli

# Set device name
DEV=/dev/nvme0n1

# Query supported LBA formats
nvme id-ns -H $DEV
LBA Format  0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format  1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better

# Format to 4096
nvme format --lbaf=1 $DEV

# Re-query
nvme id-ns -H $DEV
LBA Format  0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format  1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better (in use)

# Query Intel Optane P4801x
DEV=/dev/nvme1n1
nvme id-ns -H $DEV
LBA Format  0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format  1 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format  2 : Metadata Size: 16 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format  3 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format  4 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format  5 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format  6 : Metadata Size: 128 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
```
**Warning:** Some Intel NVMe storage devices fail to format with `nvme-cli`; use the Intel MAS CLI tool instead.

```
# Download Intel MAS CLI tools
curl -LO https://downloadmirror.intel.com/763590/Intel_MAS_CLI_Tool_Linux_2.2.zip

# Unzip
unzip Intel_MAS_CLI_Tool_Linux_2.2.zip

# Install
dnf install intelmas-2.2.18-0.i386.rpm

# Show available NVMe devices
intelmas show -all -intelssd

# Format Intel NVMe to LBA Format 3
# It takes some time to finish
intelmas start -intelssd 0 -nvmeformat LBAFormat=3
WARNING! You have selected to format the drive!
Proceed with the format? (Y|N): y
Formatting...(This can take several minutes to complete)
- Intel Optane(TM) SSD DC P4801X Series PHKM926000T6100D -
Status : NVMeFormat successful.
```
| Stage | File System | Mount Point | Files |
|---|---|---|---|
| 1. Power On | machine | | |
| 2. UEFI firmware | machine, POST | | |
| 3. Grub2 boot loader | FAT32 | `/boot/efi` | EFI binaries (e.g. `shimx64.efi`), `grub.cfg` |
| 4. Linux kernel | [mdadm, lvm] + [ext4, xfs] | `/boot` | `vmlinuz`, `initrd` |
| 5. init, mount root file system | [mdadm, lvm] + [ext4, xfs] | `/` | |
A grub2-bootable Linux installation consists of a minimum of two partitions:

- EFI partition
- Root file system

```
sgdisk -p /dev/
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1050623   512.0 MiB   EF00
   2         1050624       196362239   93.1 GiB    8300
```
The underlying file systems of each partition:

```
lsblk -o NAME,FSTYPE,FSVER,MOUNTPOINT /dev/sda
NAME   FSTYPE FSVER MOUNTPOINT
sda
├─sda1 vfat   FAT32 /boot/efi
└─sda2 ext4   1.0   /
```
The `/boot/efi` partition is reported as `FAT32`:

```
file -s /dev/sda1
/dev/sda1: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 32, heads 64, hidden sectors 2048, sectors 1048576 (volumes > 32 MB), FAT (32 bit), sectors/FAT 1024, reserved 0x1, serial number 0xc49d266f, unlabeled
```
And the directory hierarchy for `/boot`:

```
tree /boot -L 1 --dirsfirst
/boot
├── efi
├── grub
├── config-5.10.0-21-amd64
├── initrd.img-5.10.0-21-amd64
├── System.map-5.10.0-21-amd64
└── vmlinuz-5.10.0-21-amd64
```
In the above example, `/dev/sda1` is:

- an EFI partition formatted as FAT32
- mounted to `/boot/efi`

This allows a machine booted in UEFI mode to look for EFI binaries in the EFI partition (type code: `EF00`), followed by `vmlinuz` and `initrd` in `/boot`.
**Note:** `/dev/sda2` is mounted to `/` and contains `/boot` (ext4); `/dev/sda1` is mounted to `/boot/efi` (FAT32).
| Partition Code | Name |
|---|---|
| `8200` | Linux swap |
| `8300` | Linux filesystem |
| `8E00` | Linux LVM |
| `BF01` | Solaris /usr & Mac ZFS |
| `BF07` | Solaris Reserved 1 |
| `EF00` | EFI system partition |
| `FD00` | Linux RAID |
A redundant installation requires at least two storage devices to survive an unexpected disk failure.

`LVM` is more flexible to manage than `mdadm`. Besides, `LVM` also supports RAID-type logical volumes.
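As a sketch, a RAID1-type logical volume can be created directly with LVM (the volume group, LV name, and size here are assumptions):

```bash
# Create a mirrored (raid1) LV with one extra copy of the data
lvcreate --type raid1 -m 1 -L 10G -n data system
```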
**Important:** The EFI partition must use the FAT32 file system.
| Partition | Mount point | mdadm | LVM | Remark |
|---|---|---|---|---|
| EFI | `/boot/efi` | ✓ | ✗ | UEFI can't access an LVM volume group. |
| boot | `/boot` | ✓ | ? | Can Grub2 access an LVM volume? |
| root | `/` | ✓ | ✓ | |
It seems impossible to use `mdadm` to host the EFI partition, as the `mdadm` partition type code is `FD00` (Linux RAID). However, according to sources 1 and 2, `mdadm` supports creating an array with `--metadata 1.0`, which puts the RAID metadata at the end of the device and exposes the native file system signature to UEFI.
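A minimal sketch of building such an array, assuming two hypothetical member partitions `/dev/sda1` and `/dev/sdb1`:

```bash
# Metadata 1.0 lives at the END of the members, so the UEFI firmware
# sees a plain FAT32 file system at the start of each partition
mdadm --create /dev/md/boot_efi --level=1 --raid-devices=2 \
      --metadata=1.0 /dev/sda1 /dev/sdb1

# Format the array as FAT32 for use as the EFI system partition
mkfs.vfat -F32 /dev/md/boot_efi
```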
Grub2 loads the Linux kernel and the init ramdisk image from `/boot` at a later stage. It is unknown whether grub2 can access an LVM volume to load files from the `/boot` directory.
The root file system partition is mounted by the `init` process during vmlinuz kernel boot. The kernel can easily be built with `mdadm`, `LVM`, or other modules.
This is a basic disk layout that can support the redundant usage:

| Partition | RAID | File System | Mount Point | Remarks |
|---|---|---|---|---|
| 1 | mdadm | FAT32 | `/boot/efi` | |
| 2 | mdadm | ext4 / xfs | `/boot` | Some distros' kickstart don't allow … |
| 3 | LVM | ext4 / xfs | `/` | Thinly provisioned |
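A sketch of creating this layout with `sgdisk` (the device name and sizes are assumptions matching the example below):

```bash
DEV=/dev/sda

# Wipe any existing partition table
sgdisk -Z $DEV

# 1: EFI (mdadm RAID1 member, FAT32 on top)
sgdisk -n 1:0:+600M -t 1:FD00 $DEV
# 2: /boot (mdadm RAID1 member)
sgdisk -n 2:0:+1G -t 2:FD00 $DEV
# 3: root (LVM physical volume, rest of the disk)
sgdisk -n 3:0:0 -t 3:8E00 $DEV
```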
Boot disk partition layout:

```
sgdisk -p /dev/sda
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1232895   601.0 MiB   FD00
   2         1232896         3332095   1.0 GiB     FD00
   3         3332096       234440703   110.2 GiB   8E00
```
The Linux RAID EFI partition is exposed as a FAT32 partition:

```
file -s /dev/sda1
/dev/sda1: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 4, sectors 1230720 (volumes > 32 MB), FAT (32 bit), sectors/FAT 1200, reserved 0x1, serial number 0x68eba4a1, unlabeled
```
`mdadm` RAID layout:

```
cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md126 : active raid1 sda2[2] sdb2[1]
      615360 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sda1[2] sdb1[1]
      1047552 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

mdadm --detail --scan
ARRAY /dev/md/boot metadata=1.2 name=localhost.localdomain:boot UUID=f0a6acbc:37355709:0f42d29d:9f2f7b8e
ARRAY /dev/md/boot_efi metadata=1.0 name=localhost.localdomain:boot_efi UUID=67f3946b:2a724f50:9f582167:e7cdf1f2
```
Root file system in LVM:

```
pvs
  PV           VG     Fmt  Attr PSize    PFree
  /dev/nvme1n1 swap   lvm2 a--  931.51g  <693.09g
  /dev/sda3    system lvm2 a--  <110.20g <107.87g
  /dev/sdb3    system lvm2 a--  <110.20g <107.87g

vgs
  VG     #PV #LV #SN Attr   VSize    VFree
  system   1   1   0 wz--n- <220.40g 215.73g

lvs
  LV           VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  firebird_tmp swap   -wi-ao---- 119.21g
  root         system rwi-aor--- <2.33g                                    100.00
```
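For a thinly provisioned root, the volume would be created along these lines (a sketch; the pool name and sizes are assumptions):

```bash
# Create a thin pool inside the "system" VG
lvcreate --type thin-pool -L 100G -n pool0 system

# Create a thin root LV on top of the pool and format it
lvcreate --type thin -V 50G --thinpool system/pool0 -n root
mkfs.xfs /dev/system/root
```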
Making changes to an LVM-based root file system, such as converting a logical volume from linear to mirror, may cause the system to fail to mount the root file system during boot.
When that happens it is time to rebuild the initramfs. After renaming the root file system logical volume:

```bash
# Optional: Update rd.lvm.lv in the current grub config file to reflect the new volume group name
vi /etc/default/grub
```
Rebuild the initramfs:

```bash
# Update grub.cfg to reflect the new changes
sudo grub2-mkconfig -o "$(readlink -e /etc/grub2.cfg)"

# Make initramfs: Set version variable using current version string
VER=$(uname -r)

# Optional: Make initramfs: make a backup
sudo cp /boot/initramfs-$VER.img /boot/initramfs-$VER.img.backup

# Make initramfs: redhat/centos/fedora/rockylinux
sudo dracut -f /boot/initramfs-$VER.img $VER
```
Useful scripts and commands to repair an mdadm device:

```bash
# Define a new disk
NEW=/dev/sde

# Zap the new disk
sgdisk -Z $NEW

# Optional: Clone the partition table from an existing raid member,
# or use gdisk to define the new partitions manually
sgdisk /dev/sdb -R $NEW

# Wipe every partition's file system signature
wipefs -a ${NEW}[1-9]

# Randomize disk and partition GUIDs to avoid conflicts with the
# existing raid device after the replicate operation
sgdisk -G $NEW

# Refresh partition tables
partprobe

# Optional: empty mdadm metadata of the new partitions
mdadm --zero-superblock ${NEW}[1-2]

# Optional: scan for available mdadm devices and mount
sudo mdadm --assemble --scan
mount /boot
mount /boot/efi

# Add the new device to the current raid devices; adjust the md #ID before executing
mdadm /dev/md127 --add ${NEW}1
mdadm /dev/md126 --add ${NEW}2
```
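After adding the members, the rebuild can be watched until the array returns to `[UU]`:

```bash
# Refreshes /proc/mdstat every 5 seconds; Ctrl-C to exit
watch -n 5 cat /proc/mdstat
```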
Useful scripts and commands to repair an LVM volume:

```bash
# Define the new LVM partition
NEW=/dev/sde3

# Initialize the physical volume
pvcreate ${NEW}

# Add the new physical volume to the system volume group
vgextend system ${NEW}

# Remove missing PVs in the system volume group
vgreduce --removemissing --force system

# Repair a logical volume in the volume group
lvconvert --repair system/root

# How to repair all LVs in the VG at once?
```
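One answer to that last question, as a sketch (it assumes every LV in the VG is of a type `lvconvert --repair` accepts, such as raid, mirror, or thin pool):

```bash
# Iterate over all logical volumes in the "system" VG and repair each
for lv in $(lvs --noheadings -o lv_name system); do
    lvconvert --repair "system/$lv"
done
```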
To recreate the EFI file system while preserving its content and UUID:

```bash
# Backup the EFI partition content
tar -zcf /tmp/boot-efi.tgz -C /boot/efi .

# Unmount the EFI partition
umount /boot/efi

# Import the UUID of the existing file system into the shell
. <(blkid -o export /dev/md126)
echo $UUID

# Recreate FAT32 with the same serial number (the UUID without the dash)
mkfs.vfat -F32 -S 4096 -i ${UUID/-} /dev/md126

# Remount and restore the content
mount /boot/efi
tar -zxvf /tmp/boot-efi.tgz -C /boot/efi
```
The same approach backs up `/boot`:

```bash
tar -zcf /tmp/boot.tgz -C /boot .
```
After setting up redundant boot, grub may print

```
error: ../../grub-core/disk/diskfilter.c:916:diskfilter writes are not supported
```

at the boot screen. A workaround is to stop using `grubenv`:

```bash
# Optional: Don't use grubenv, rename it
sudo mv /boot/grub2/grubenv /boot/grub2/grubenv.old

# Optional: Remove the grubenv
sudo rm /boot/grub2/grubenv
```
Here are some possible scenarios that cause Linux to fail to boot:

- Migrating storage devices
- Renaming the root file system in LVM using `lvrename` or cockpit

Most Linux booting issues can be solved by rebuilding `grub.cfg` and the initial RAM disk.
First, boot the system into a compatible Linux live ISO/CD and drop to a shell console.

Next, mount both the `boot` and `root` file systems:

```bash
# Optional: Activate boot devices stored in mdadm raid devices
mdadm --assemble --scan

# Optional: Activate the LVM volume for the root file system
pvs
vgchange -ay vg-new

# Mount the root filesystem. Example: LVM volume
mount /dev/mapper/vg-new /mnt

# Mount the boot filesystem. Example: mdadm volume
mount /dev/md126 /mnt/boot

# Mount the EFI device. Example: mdadm volume
mount /dev/md127 /mnt/boot/efi

# Bind system directories and change into the root filesystem
mount --bind /proc /mnt/proc
mount --bind /dev /mnt/dev
mount --bind /sys /mnt/sys
chroot /mnt
```
With the existing file systems mounted, we can start fixing the issue.
```bash
# Update rd.lvm.lv in the current grub config file to reflect the new volume group name
vi /etc/default/grub

# Optional: Disable probing other OS
cat << EOF | tee -a /etc/default/grub
GRUB_DISABLE_OS_PROBER=true
EOF

# Update grub.cfg to reflect the new changes
grub2-mkconfig -o "$(readlink -e /etc/grub2.cfg)"
```
```bash
# Make initramfs: Switch to /boot
cd /boot

# Make initramfs: Set version variable using current version string
VER=$(uname -r)

# Make initramfs: or set a static version string if the live system's version differs from the actual system version
VER=5.14.0-162.18.1.el9_1.x86_64

# Make initramfs: make a backup
cp initramfs-$VER.img initramfs-$VER.img.backup

# Make initramfs: redhat-based, build the initramfs
dracut -f /boot/initramfs-$VER.img $VER
```
The fix is complete. Finally, tidy up and reboot:

```bash
# Exit chroot
exit

# Unmount boot and root
umount /mnt/boot/efi /mnt/boot /mnt

# Reboot
reboot
```
**Note:** UEFI firmware should automatically detect all available EFI partitions on the storage devices and offer them for booting. In general, it is not necessary to update these boot entries.
```bash
# Show current boot entries
efibootmgr -v

# Example: Remove unused boot entries
BOOT=0006; efibootmgr -B -b $BOOT

# Add missing boot entries for the mdadm EFI partition
# Get OS variables ($NAME, $ID)
. /etc/os-release

# Add boot entry for the 1st device
DEV=/dev/sda
efibootmgr -c -d $DEV -p 2 -L "$NAME" -l "\EFI\\$ID\shimx64.efi"

# Add boot entry for the 2nd device
DEV=/dev/sdb
efibootmgr -c -d $DEV -p 2 -L "$NAME" -l "\EFI\\$ID\shimx64.efi"
```