@MaxXor
Last active October 12, 2024 02:45
Guide to setting up a LUKS-encrypted Btrfs RAID volume, including maintenance & recovery instructions

Encrypted Btrfs storage setup and maintenance guide

Initial setup with LUKS/dm-crypt

This example initial setup uses two devices, /dev/sdb and /dev/sdc, but can be applied to any number of devices by repeating the steps for each additional device.

Create keyfile:

dd bs=64 count=1 if=/dev/urandom of=/etc/cryptkey iflag=fullblock
chmod 600 /etc/cryptkey

Encrypt devices:

cryptsetup -v -c aes-xts-plain64 -h sha512 -s 512 luksFormat /dev/sdb /etc/cryptkey
cryptsetup -v -c aes-xts-plain64 -h sha512 -s 512 luksFormat /dev/sdc /etc/cryptkey
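
To sanity-check the result, the LUKS header of each device can be inspected; the cipher, hash and key size shown should match the options used above:

cryptsetup luksDump /dev/sdb
cryptsetup luksDump /dev/sdc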

Backup LUKS header:

cryptsetup luksHeaderBackup --header-backup-file ~/sdb.header.bak /dev/sdb
cryptsetup luksHeaderBackup --header-backup-file ~/sdc.header.bak /dev/sdc
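
If a header ever gets damaged, it can be restored from these backup files; double-check the target device first, since this overwrites the on-disk header:

cryptsetup luksHeaderRestore --header-backup-file ~/sdb.header.bak /dev/sdb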

Automatically unlock LUKS devices on boot by editing /etc/crypttab:

data1 UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /etc/cryptkey luks,noearly
data2 UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /etc/cryptkey luks,noearly
# Use 'blkid /dev/sdb' to get the UUID
# Add the 'discard' option for SSDs (e.g. 'luks,noearly,discard')
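
Later on, the crypttab entries can be exercised without a full reboot on a systemd-based distribution (this is an assumption about your init system); the unit names are derived from the first crypttab field (data1/data2), so adjust them if your mapping names differ:

systemctl daemon-reload
systemctl start systemd-cryptsetup@data1.service
systemctl start systemd-cryptsetup@data2.service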

Unlock the encrypted devices now to create the filesystem in the next step:

cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdb data1
cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdc data2

Create filesystem:

mkfs.btrfs -m raid1 -d raid1 /dev/mapper/data1 /dev/mapper/data2
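
A quick check that both devices ended up in the new filesystem:

btrfs filesystem show /dev/mapper/data1
# once mounted (next step), the RAID1 profiles can be verified with:
btrfs filesystem df /mnt/data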

Mount filesystem:

mount -t btrfs -o defaults,noatime,compress=zstd /dev/mapper/data1 /mnt/data

Automatically mount btrfs filesystem on boot by editing /etc/fstab:

UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/data btrfs defaults,noatime,compress=zstd 0 2
# Add option 'autodefrag' to enable automatic defragmentation: useful for files with lots of random writes, such as databases or virtual machine images
# Use 'blkid /dev/mapper/data1' to get the UUID of the Btrfs filesystem; it is common to all RAID devices
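
To verify the fstab entry without rebooting, unmount the filesystem and let mount re-read it from fstab; findmnt should then show it mounted with the expected options:

umount /mnt/data
mount -a
findmnt /mnt/data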

Maintenance

In a Btrfs RAID setup it is necessary to run a btrfs scrub regularly to detect corrupted blocks/flipped bits and repair them using a healthy copy from one of the mirrored disks.

In the example below, a systemd timer runs an automatic btrfs scrub job each month.

/etc/systemd/system/btrfs-scrub.timer:

[Unit]
Description=Monthly scrub btrfs filesystem, verify block checksums
Documentation=man:btrfs-scrub

[Timer]
# first Saturday of each month
OnCalendar=Sat *-*-1..7 3:00:00
RandomizedDelaySec=10min

[Install]
WantedBy=timers.target

/etc/systemd/system/btrfs-scrub.service:

[Unit]
Description=Scrub btrfs filesystem, verify block checksums
Documentation=man:btrfs-scrub

[Service]
Type=simple
ExecStart=/bin/btrfs scrub start -Bd /mnt/data
KillSignal=SIGINT
IOSchedulingClass=idle
CPUSchedulingPolicy=idle
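
After creating both unit files, reload systemd and enable the timer; scrub results can be inspected at any time:

systemctl daemon-reload
systemctl enable --now btrfs-scrub.timer
systemctl list-timers btrfs-scrub.timer
btrfs scrub status /mnt/data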

Recovery from device failure

Example with one failed device:

  • /dev/mapper/data1 working device
  • /dev/mapper/data2 failed device
  • /dev/mapper/data3 new device
  • /mnt/data mountpoint

In case of a failing/failed device, mount the filesystem in degraded mode using one of the working devices:

mount -t btrfs -o defaults,noatime,compress=zstd,degraded /dev/mapper/data1 /mnt/data

Find the device ID of the missing disk by executing btrfs device usage /mnt/data:

# btrfs device usage /mnt/data

/dev/mapper/data1, ID: 1
   Device size:             7.28TiB
   Device slack:              0.00B
   Data,RAID1:              5.46TiB
   Metadata,RAID1:          7.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.81TiB

missing, ID: 2
   Device size:             7.28TiB
   Device slack:              0.00B
   Data,RAID1:              5.46TiB
   Metadata,RAID1:          7.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.81TiB

NOTE: Encrypt the new device before using it in the btrfs raid by following the steps above.

Start the replace operation in the background by adding the new device to the btrfs raid using the device ID of the missing disk:

btrfs replace start 2 /dev/mapper/data3 /mnt/data

(Optional) Check the replace progress:

btrfs replace status /mnt/data

Once the replace operation has finished, the fstab entry can be left unmodified, since the Btrfs UUID is shared by all RAID devices:

UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/data btrfs defaults,noatime,compress=zstd 0 2
# Use 'blkid /dev/mapper/data1' to get the UUID of the Btrfs filesystem; it is common to all RAID devices
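
Two optional follow-ups after a replace: if the new disk is larger than the old one, grow the filesystem on the replaced device (ID 2 in the example above), and reset the per-device error counters so future errors stand out:

btrfs filesystem resize 2:max /mnt/data
btrfs device stats --reset /mnt/data
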
@learnitall

Super helpful, thank you!

@george-andrei

Amazing, thanks a lot!

@iio7

iio7 commented Oct 23, 2022

Where does GRUB fit into this? I see no grub-install command.

@MaxXor
Author

MaxXor commented Oct 24, 2022

Where does GRUB fit into this? I see no grub-install command.

Hello @iio7, this is not a guide that uses Btrfs on the system partition, so there's no need for a bootloader like GRUB. Btrfs here is used as a data partition on separate disks, e.g. in a NAS. Furthermore, I wouldn't recommend using Btrfs configured as RAID1 as the system partition: when one disk fails, your system won't boot anymore; unlike ZFS, Btrfs unfortunately does not boot automatically in degraded mode.

@iio7

iio7 commented Oct 30, 2022

Hello @iio7, this is not a guide that uses Btrfs on the system partition, so there's no need for a bootloader like GRUB. Btrfs here is used as a data partition on separate disks, e.g. in a NAS. Furthermore, I wouldn't recommend using Btrfs configured as RAID1 as the system partition: when one disk fails, your system won't boot anymore; unlike ZFS, Btrfs unfortunately does not boot automatically in degraded mode.

Thank you very much for your reply, I didn't know about Btrfs not booting in a degraded mode!

@MaxXor
Author

MaxXor commented Oct 30, 2022

Thank you very much for your reply, I didn't know about Btrfs not booting in a degraded mode!

Well, it's not automatic or default for any distro I know of. So if you go into the recovery process and replace one of the Raid boot disks, the system won't boot anymore. Here is the description of the bug and a possible workaround: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1229456

@iio7

iio7 commented Oct 30, 2022

Well, it's not automatic or default for any distro I know of. So if you go into the recovery process and replace one of the Raid boot disks, the system won't boot anymore. Here is the description of the bug and a possible workaround: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1229456

I found that kernel argument after looking into the issue. Thanks.

@NekoiNemo

NekoiNemo commented Dec 3, 2022

https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices

Seems like there's an easier way to replace a missing/failed device with btrfs device replace, which, according to anecdotal evidence on forum/reddit threads, might be a bit more expedient than the add+delete approach.

And another point: I would still recommend mounting by UUID in the fstab rather than by the mapper name, for the sake of stability (say, one of those drives fails, you replace it and give the replacement the next integer increment in the name... but the failed drive was the one whose name you used in the fstab). You can get the Btrfs UUID (it's shared across all of the partitions in the array) by running blkid after any of the crypt devices is unlocked (you don't even need to unlock all of them):

# blkid | sort
/dev/mapper/crypt_data_001: UUID="a9df909a-5fe8-47ae-be09-7c443d96e558" UUID_SUB="d7da35bd-1c61-4748-8696-70988077172a" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/mapper/crypt_data_002: UUID="a9df909a-5fe8-47ae-be09-7c443d96e558" UUID_SUB="9a93e5f7-464b-4142-8656-818270925eeb" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/mapper/crypt_data_003: UUID="a9df909a-5fe8-47ae-be09-7c443d96e558" UUID_SUB="b9898e5c-7999-4c22-951c-5edf4ffaa5cb" BLOCK_SIZE="4096" TYPE="btrfs"
/dev/sda: UUID="f98349d1-f8a7-4383-bc23-2e6a80315c6b" TYPE="crypto_LUKS"
/dev/sdb: UUID="dea5e4a7-20aa-456f-807a-81c62a0e5c0c" TYPE="crypto_LUKS"
/dev/sdc: UUID="3c60817c-3285-4cd6-8bee-3de7799221cd" TYPE="crypto_LUKS"

so your fstab should look like

UUID=a9df909a-5fe8-47ae-be09-7c443d96e558 /mnt/data btrfs defaults,noatime,compress=zstd 0 2

@MaxXor
Author

MaxXor commented Dec 4, 2022

@NekoiNemo Thank you. I updated the section Recovery from device failure with the btrfs replace command and edited the fstab examples according to your suggestions. It definitely makes sense to use the common btrfs partition id.

@FrankelJb

FrankelJb commented Dec 30, 2022

Resolved the issue. The /etc/crypttab file had the incorrect SELinux policy.

Thanks @MaxXor this is an amazing guide.

I'm trying to move my btrfs disks to a new OS, starting fresh after a couple of years. I've retraced these steps and running cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdb data1 works and I can then run mount -a and everything is in /mnt/data. However, when I try and reboot, it seems that the volumes don't get decrypted and then fstab fails. Any idea how I can test/debug crypttab?

@MaxXor
Author

MaxXor commented Dec 30, 2022

@FrankelJb Thanks for your feedback! :)

Can you show your /etc/crypttab content?

@phalaaxx

Thanks for the guide. I just don't understand the reason to have encrypted storage when your encryption key is located on the same system; anyone gaining physical or remote access to it will also have access to the encryption key and to the data stored in your btrfs filesystem. So what good is it configured like this?

@NekoiNemo

NekoiNemo commented Jul 12, 2023

It depends on what you want to protect from. If someone has remote ROOT access to your machine to read the encryption key (which is hopefully stored as a 400/600 root-owned file)... you have other problems to worry about. Meanwhile, this method allows you to enter the passphrase by hand once at boot to unlock the system disk, and then use the encryption keys stored on it to unlock all of the rest of your disks.

@george-andrei

george-andrei commented Jul 12, 2023

Thanks for the guide. I just don't understand the reason to have encrypted storage when your encryption key is located on the same system; anyone gaining physical or remote access to it will also have access to the encryption key and to the data stored in your btrfs filesystem. So what good is it configured like this?

The assumption is that your OS is already encrypted in a different partition. The encryption of the OS partition is not explained in this guide and is probably out of scope.

If you still want to encrypt your data and are unable for any reason to encrypt your OS partition, maybe you can set passphrases instead of a keyfile. By doing this, you will have to manually enter the passphrase every time you reboot the system. (Please take this with a grain of salt; I am no expert, and this might not be as secure.)

@MaxXor
Author

MaxXor commented Jul 12, 2023

@phalaaxx I assume that you are already using an encrypted system partition. That part is covered in all installers of common distributions such as Ubuntu, Debian, etc., so I won't go into detail here. It's more about securely adding additional storage, which when powered off can't be accessed by any means. Remote root access to your machine is indeed a worst-case scenario and this guide is not meant to protect your data from such attacks.

@phalaaxx

@phalaaxx I assume that you are already using an encrypted system partition. That part is covered in all installers of common distributions such as Ubuntu, Debian, etc., so I won't go into detail here. It's more about securely adding additional storage, which when powered off can't be accessed by any means. Remote root access to your machine is indeed a worst-case scenario and this guide is not meant to protect your data from such attacks.

That makes sense, thank you.
(Also thanks to @george-andrei and @NekoiNemo)

@Luckythakurbit

Hi

@lemushyman

I have used the guide to set up two drives in BTRFS / RAID 1, and need to move them to a fresh install of Debian whilst preserving all of the data stored on them. I've gotten as far as tracing the steps to cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdb data1 and cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdc data2 so that the drives decrypt, but I'm concerned the following mkfs.btrfs -m raid1 -d raid1 command is going to wipe data off the drives and make everything I've stored so far unreadable. Does anyone know how I should proceed so that the drives can simply mount as they did on my previous installation, without reformatting them? I've backed up the LUKS headers for each drive and /etc/cryptkey as well.

@NekoiNemo

If you want to retain the data - why not just plug the drives in and do the "automatically unlock drives" and "automatically mount drives" steps? Since you will be retaining your existing encryption and btrfs filesystem - there's no need to alter it.

Just do the cryptsetup open and mount to test that everything works on the new system, then do only the "automatically on boot" steps
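
A condensed sketch of that test, reusing the device names and keyfile path from the guide above (adjust to your actual devices):

cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdb data1
cryptsetup open --key-file=/etc/cryptkey --type luks /dev/sdc data2
mount -t btrfs -o defaults,noatime,compress=zstd /dev/mapper/data1 /mnt/data
# if this works, add the crypttab and fstab entries from the guide and reboot to confirm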

@lemushyman

I was having issues because I tried mounting only one at a time rather than both as intended. Problem solved.

@TheClockTwister

TheClockTwister commented Feb 3, 2024

Great cheat sheet! Thanks!

Does anyone have information on how to deal with defragmentation? The author of btrfs has a maintenance script collection which also includes the btrfs defrag script.

Now, because cryptsetup will normally use AES-XTS (which is a block-cipher mode), I would assume that the positions of reads/writes inside the encrypted volume match reads/writes on the physical disk. Therefore, defragmenting the data blocks inside the encrypted volume will also defragment the encrypted blocks on the physical disk, thus improving performance.

Can anyone please confirm or deny this with some evidence or expert knowledge...?
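
For reference, a manual recursive defragmentation of the mounted filesystem would look roughly like this (options per btrfs-filesystem(8)); whether that also reduces fragmentation on the underlying physical disk is exactly the open question above:

# recompresses with zstd while defragmenting; note that defragmentation
# can break reflinks and duplicate data shared with snapshots
btrfs filesystem defragment -r -v -czstd /mnt/data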

@schuerg

schuerg commented Apr 11, 2024

@MaxXor Thank you for this documentation!

I am also considering a Btrfs raid10 setup (consisting of four HDDs) together with LUKS encryption.

Has anyone experience regarding performance? Read/write throughput, latency and IOPS?

Because with raid10 (mirroring and striping) on top of separate LUKS-encrypted block devices, the CPU has to encrypt/decrypt multiple times.


It would be interesting to benchmark a Btrfs raid10 setup with and without LUKS encryption: https://openbenchmarking.org/suite/pts/disk

@maxnatt

maxnatt commented Jun 10, 2024

I had some WARNING: Multiple block group profiles detected, see 'man btrfs(5)' warnings after device replacement -- rebalance is required, see https://forums.gentoo.org/viewtopic-t-1140141-start-0-postdays-0-postorder-asc-highlight-btrfs%2Breplace.html
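
The rebalance referred to there is a conversion balance; something along these lines converts any leftover block groups back to RAID1 (the 'soft' filter skips block groups already in the target profile):

btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/data
btrfs balance status /mnt/data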

@rkorn86

rkorn86 commented Jul 19, 2024

A faulty disk may be recognized at the device-mapper/LUKS layer:

# dmsetup table
data3: 0 23437737984 error

But it cannot be removed from the device mapper, because it is still in use by BTRFS. So btrfs filesystem show will never show that a device is missing:

# btrfs filesystem show /data
Label: 'data'  uuid: b912b3b0-3436-4960-adf0-cb82e8ff2448
        Total devices 4 FS bytes used 4.87TiB
        devid    1 size 10.91TiB used 2.90TiB path /dev/mapper/data1
        devid    2 size 10.91TiB used 2.91TiB path /dev/mapper/data2
        devid    3 size 10.91TiB used 2.90TiB path /dev/mapper/data3
        devid    4 size 10.91TiB used 2.91TiB path /dev/mapper/data4

So it is a deadlock.

LUKS cannot close the faulty DM:

# cryptsetup close data3 
device-mapper: remove ioctl on data3  failed: Device or resource busy
# dmsetup remove -f   data3
device-mapper: remove ioctl on data3  failed: Device or resource busy
Command failed.

BTRFS is not notified by DM about the error in the underlying device; it still shows the device as an active RAID member while generating kernel log messages and btrfs I/O errors.

How do you deal with a situation where a hard disk can go offline unexpectedly?
