Multi-device bcachefs mounts in fstab cause system hang on boot.
Additionally, systemd integration for mounting degraded needs work so that it
"just works" for users, using the same mount interface that they expect from
`mount`.
During boot, the system should attempt mounting the filesystem degraded, and once a mount succeeds it should add subsequent disks that are part of a filesystem to the already mounted filesystem.
There are two options. Option #1 is preferred because it has a better UI and requires no changes to upstream systemd.
Override systemd's auto-generated .mount file by creating a masked .mount unit to prevent the auto-generated one from blocking boot.
- no upstream systemd code required
- transparent to users
- kinda hacky to leave behind a useless unit file
- requires a systemd generator
A new generator[1], `fstab-generator-bcachefs`, will parse /etc/fstab via
`getmntent()` and create a unit[2] per filesystem and a unit per block
device. The filesystem unit must not have an ordering dependency on the device
units, but should start when any of the device units starts. Both the
filesystem and device mounts are triggered by the udev rule via
`SYSTEMD_WANTS`[4].
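The generator step can be sketched as follows. This is a minimal illustration, not the planned implementation: it splits fstab lines on whitespace instead of calling `getmntent()` (so escapes such as `\040` are not decoded), and the names `BcachefsEntry`, `parse_bcachefs_entries`, `escape_path` and `unit_names` are hypothetical. It assumes multi-device entries use the colon-separated `dev1:dev2` form, that the mount path is escaped in the style of `systemd-escape --path` (unit names cannot contain `/`), and that the per-device unit uses the last path component of the device node as `$name`.

```rust
/// One bcachefs line from /etc/fstab (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct BcachefsEntry {
    /// Member devices, split from the colon-separated first field.
    pub devices: Vec<String>,
    /// Mount point (second field).
    pub mount_point: String,
    /// Mount options (fourth field), passed through unmodified.
    pub options: String,
}

/// Keep only fstab entries whose filesystem type is "bcachefs".
/// Comments, blank lines and malformed lines are skipped.
pub fn parse_bcachefs_entries(fstab: &str) -> Vec<BcachefsEntry> {
    fstab
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| {
            let f: Vec<&str> = l.split_whitespace().collect();
            if f.len() < 3 || f[2] != "bcachefs" {
                return None;
            }
            Some(BcachefsEntry {
                devices: f[0].split(':').map(String::from).collect(),
                mount_point: f[1].to_string(),
                options: f.get(3).copied().unwrap_or("defaults").to_string(),
            })
        })
        .collect()
}

/// Simplified path escaping in the style of `systemd-escape --path`:
/// '/' becomes '-', and bytes outside [A-Za-z0-9:_.] become \xXX.
/// Unlike systemd, this does not collapse repeated slashes.
pub fn escape_path(path: &str) -> String {
    let trimmed = path.trim_matches('/');
    if trimmed.is_empty() {
        return "-".to_string();
    }
    let mut out = String::new();
    for (i, b) in trimmed.bytes().enumerate() {
        match b {
            b'/' => out.push('-'),
            // A leading dot is escaped so the name never starts with '.'.
            b'.' if i == 0 => out.push_str("\\x2e"),
            b'a'..=b'z' | b'A'..=b'Z' | b'0'..=b'9' | b':' | b'_' | b'.' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("\\x{:02x}", b)),
        }
    }
    out
}

/// Names of the filesystem unit and of one unit per member device.
pub fn unit_names(entry: &BcachefsEntry) -> (String, Vec<String>) {
    let fs_unit = format!(
        "mount-bcachefs-{}.service",
        escape_path(&entry.mount_point)
    );
    let dev_units = entry
        .devices
        .iter()
        // Assumption: $name in the udev rule is the last path component.
        .map(|d| d.rsplit('/').next().unwrap_or(d.as_str()))
        .map(|name| format!("bcachefs_device_add_{}.service", name))
        .collect();
    (fs_unit, dev_units)
}
```

A real generator would write the resulting unit files into the directory systemd passes to it as an argument; that part is left out here.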
The udev rule will check whether the filesystem is mounted. If it is not yet
mounted, the rule will attempt to mount it through the filesystem mount unit
via `SYSTEMD_WANTS`. If the filesystem is already mounted, the rule will
request adding the device via the ioctl.
example udev rule:
SUBSYSTEM!="block", GOTO="bcachefs_end"
ACTION=="remove", GOTO="bcachefs_end"
ENV{ID_FS_TYPE}!="bcachefs", GOTO="bcachefs_end"
ENV{SYSTEMD_READY}=="0", GOTO="bcachefs_end"
# 1) get the mount point of the filesystem from blkid
# generator can populate udev db with key/value pairs [5]
# generator can alternatively populate a file with environment variables
# generator will populate BLKID -> mount point lookup values
# 2) from a udev rule, is the filesystem that it is a member of currently mounted?
# check if fs is mounted using /proc/mounts?
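# sets BCACHEFS_MOUNT_PATH
IMPORT{file}="/run/bcachefs/mount-map"
# sets BCACHEFS_MOUNTED to 1 if the mount path is a mount point, otherwise 0
IMPORT{program}="/bin/sh -c 'mountpoint -q $env{BCACHEFS_MOUNT_PATH} && echo BCACHEFS_MOUNTED=1 || echo BCACHEFS_MOUNTED=0'"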
# fstab-generator-bcachefs created the service file
# `mount-bcachefs-$env{BCACHEFS_MOUNT_PATH}.service` during early boot from the
# contents of /etc/fstab.
#
# Mounting directly from udev rules isn't allowed. Start a systemd service
# which will attempt to mount the filesystem. When split-brain detection is
# complete, mounting with -o degraded will be added to the unit file by default.
ENV{BCACHEFS_MOUNTED}=="0", ENV{SYSTEMD_WANTS}+="mount-bcachefs-$env{BCACHEFS_MOUNT_PATH}.service"
# fstab-generator-bcachefs created the service file
# `bcachefs_device_add_$name.service` during early boot from the contents of
# /etc/fstab.
#
# The mountpoint already exists, so add the newly online device to the
# filesystem via `bcachefs device add`. The following line will only occur when
# mounting degraded previously succeeded.
#
# This shouldn't block (it is just an ioctl), so it probably doesn't need to be a service.
ENV{BCACHEFS_MOUNTED}=="1", ENV{SYSTEMD_WANTS}+="bcachefs_device_add_$name.service"
LABEL="bcachefs_end"
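The "is the filesystem already mounted?" check in step 2 of the rule could also live in a small helper program rather than an inline shell command. The following is only a sketch of that check: the function names are hypothetical, it matches on the mount point and filesystem type fields of text formatted like /proc/mounts, and it does not decode escapes such as `\040` in paths.

```rust
use std::fs;
use std::io;

/// Return true if `mount_point` is listed as a bcachefs mount in text
/// formatted like /proc/mounts (source, target, fstype, options, ...).
pub fn is_bcachefs_mounted(proc_mounts: &str, mount_point: &str) -> bool {
    proc_mounts.lines().any(|line| {
        let mut fields = line.split_whitespace();
        let _source = fields.next();
        let target = fields.next();
        let fstype = fields.next();
        target == Some(mount_point) && fstype == Some("bcachefs")
    })
}

/// Run the same check against the live /proc/mounts of this system.
pub fn is_bcachefs_mounted_now(mount_point: &str) -> io::Result<bool> {
    let mounts = fs::read_to_string("/proc/mounts")?;
    Ok(is_bcachefs_mounted(&mounts, mount_point))
}
```

A helper built on this could print `BCACHEFS_MOUNTED=1` or `BCACHEFS_MOUNTED=0` so that the rule can import the result as a property.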
[1] in Rust, using `getmntent()`/`setmntent()`, modeled after systemd's fstab-generator
[2] a .mount would be preferred for implicit systemd dependencies and expected behavior, but a .mount must have a dependency on the .device named by the What= in its definition. This means that one device must be selected, and that device will unconditionally block boot, defeating the purpose of redundancy. An alternative would be a .service that includes the default dependencies described in systemd.mount, so that it blocks other units according to the expected dependency rules.
[3] slices might be convenient for grouping disks
[4] man:systemd.device(5)
[5] https://www.freedesktop.org/software/systemd/man/latest/udevadm.html#-p2