whitslack/early_userspace_without_initramfs.md

Early Userspace without Initramfs

If you've built your own kernel with all necessary storage-controller and file-system drivers built in, then you may have no need of an early userspace environment. However, if you want to do anything non-trivial with your root file system (LVM, LUKS, etc.), then you need an early userspace to set up and mount it. The traditional mechanism for this is initramfs, but building and maintaining an initramfs image is awkward and tiresome. Initramfs is a sledgehammer when, nine times out of ten, all you need is a screwdriver. This guide details a method of booting into an early userspace environment located in an ordinary file system on a physical disk partition, where an init script in this environment in turn sets up and mounts the real root file system and pivots into it.

Setting Up the Basic Environment

In order to employ this method of booting your system, you will need a traditional (non-LVM) disk partition containing a file system that your kernel can mount without needing to load any modules. This guide will henceforth refer to this partition as the boot device.
Important: This guide assumes that your boot device is /dev/sda1 and your root device is /dev/sda3. Be sure to make all appropriate substitutions in the steps throughout this guide, lest you obliterate something you shouldn't.


Format and mount the boot device.
# mkfs.ext4 -L Boot -O ^has_journal /dev/sda1

# mkdir -p /boot

# mount -o noatime /dev/sda1 /boot


Create the basic file-system hierarchy and populate /etc/fstab.
# mkdir -p /boot/{dev,etc,mnt,proc,run,sys,tmp,var}

# ln -s /run /tmp /boot/var/

# cat > /boot/etc/fstab <<EOF
/dev/pts	/dev/pts	devpts	noexec,nosuid	0 0
/proc	/proc	proc	nodev,noexec,nosuid	0 0
/run	/run	tmpfs	nodev,nosuid	0 0
/sys	/sys	sysfs	nodev,noexec,nosuid	0 0
/tmp	/tmp	tmpfs	nodev,nosuid	0 0
EOF
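
Note: The init scripts created later in this guide call mount with only a mount point (for example, mount /proc), which makes mount look up the remaining fields in this fstab. As a rough illustration, the sketch below prints the explicit command that each entry stands for. The function name fstab_to_mount is made up for this example and is not part of any package.

```shell
#!/bin/sh
# Illustration only: translate fstab entries (read from stdin) into the
# explicit mount commands they are shorthand for. Nothing is mounted.
fstab_to_mount() {
	# Fields: 1=device, 2=mount point, 3=file-system type, 4=options
	awk '!/^#/ && NF >= 4 { printf "mount -t %s -o %s %s %s\n", $3, $4, $1, $2 }'
}

# Example: the /proc entry from the fstab above.
printf '/proc\t/proc\tproc\tnodev,noexec,nosuid\t0 0\n' | fstab_to_mount
```

For the pseudo file systems used here, the device field is only a label, which is why it simply repeats the mount point.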


Emerge a very minimal system.
Important: Change amd64 below to your actual CPU type, if necessary.
# mkdir -p /boot/etc/portage/profile

# ln -s /usr/portage/profiles/prefix/linux-standalone/amd64 /boot/etc/portage/make.profile

# emerge --info | grep '^ACCEPT_KEYWORDS=' >> /boot/etc/portage/profile/make.defaults

# echo 'FEATURES="nodoc noinfo noman"' >> /boot/etc/portage/profile/make.defaults

# cat > /boot/etc/portage/profile/packages <<EOF
-*app-arch/bzip2
-*app-arch/gzip
-*app-arch/tar
-*app-arch/xz-utils
-*app-shells/bash:0
-*net-misc/rsync
-*net-misc/wget
-*sys-apps/coreutils
-*sys-apps/diffutils
-*sys-apps/file
-*>=sys-apps/findutils-4.4
-*sys-apps/gawk
-*sys-apps/grep
-*sys-apps/less
-*sys-apps/man-pages
-*sys-apps/net-tools
-*sys-apps/sed
-*sys-apps/which
-*sys-devel/binutils
-*sys-devel/gcc
-*sys-devel/gnuconfig
-*sys-devel/make
-*>=sys-devel/patch-2.6.1
-*sys-process/procps
-*sys-process/psmisc
-*virtual/editor
-*virtual/man
-*virtual/os-headers
-*virtual/package-manager
-*virtual/pager
-*virtual/service-manager
-*virtual/ssh

*sys-libs/glibc
EOF

# cat >> /boot/etc/portage/profile/package.use << EOF
sys-apps/busybox -static
sys-apps/util-linux -cramfs
EOF

# emerge --root=/boot --config-root=/boot @system


Create the init scripts that will boot your system. We begin with a basic setup here and will add goodies in later sections of this guide.
Important: Change /dev/sda3 below to your actual root device.
# cat > /boot/init.sh <<'EOF'
#!/bin/busybox sh
set -e

for each in /init.d/* ; do
	. "${each}"
done
EOF

# chmod 0700 /boot/init.sh

# mkdir /boot/init.d

# cat > /boot/init.d/00-mounts <<EOF
mkdir /dev/pts /dev/shm
mount /dev/pts
mount /proc
mount /run
mount /sys
mount /tmp
EOF

# cat > /boot/init.d/40-printk << EOF
echo 1 > /proc/sys/kernel/printk
EOF

# cat > /boot/init.d/50-mountroot <<EOF
mount --ro /dev/sda3 /mnt
EOF

# cat > /boot/init.d/69-printk << EOF
echo 7 > /proc/sys/kernel/printk
EOF

# cat > /boot/init.d/99-pivotroot <<EOF
umount /tmp /sys /run /proc /dev/pts
mount --move /dev /mnt/dev
cd /mnt
pivot_root . boot
exec chroot . /sbin/init < dev/console > dev/console 2>&1
EOF
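
Note: init.sh sources the files in /init.d rather than executing them, so they run in lexical order of their names and all share one shell. A variable set by a low-numbered script is therefore still visible in a high-numbered one, which the later sections of this guide rely on. The sketch below mimics the loop against a throwaway directory; the three file names and their contents are invented for the demonstration.

```shell
#!/bin/sh
# Demonstration only: reproduce init.sh's loop in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/init.d"

# Hypothetical scripts; each records that it ran.
echo 'started=yes; order="$order 10"' > "$demo/init.d/10-start"
echo 'order="$order 50"' > "$demo/init.d/50-middle"
echo 'order="$order 89"; seen_by_stop=$started' > "$demo/init.d/89-stop"

order=
for each in "$demo"/init.d/* ; do
	. "$each"    # sourced, so variables persist across scripts
done

echo "order:$order"
echo "89-stop saw started=$seen_by_stop"
rm -r "$demo"
```

This is also why the numeric prefixes matter: a script that must undo something has to sort after the script that set it up.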


Install your kernel.
Important: This guide assumes that you have set CONFIG_DEVTMPFS_MOUNT=y in your kernel configuration. If you have not, you must set it and recompile your kernel, or you will have problems.
# make -C /usr/src/linux install

# ln -sr /boot/vmlinuz{-*,}


Install a bootloader. Extlinux is simple and works well.
# emerge -n sys-boot/syslinux

# mkdir /boot/extlinux

# extlinux --install /boot/extlinux

# cat /usr/share/syslinux/mbr.bin > /dev/sda

# cat > /boot/extlinux/extlinux.conf <<EOF
DEFAULT linux

LABEL linux
	KERNEL /vmlinuz
	APPEND root=/dev/sda1 rootwait init=/init.sh
EOF


At this point, you may wish to reboot your system to your new boot device, to test that your new early userspace environment is working. This may require marking the boot partition as "active" (using fdisk or similar) and/or reconfiguring your BIOS settings to change your default boot device. These steps are outside the scope of this guide.
If all goes well, you should not observe any difference versus your traditional boot. However, you now have an environment capable of running commands before the root file system is mounted, meaning you can do fun things like full-disk encryption.

Interactive Rescue Environment

It may not be immediately obvious, but you now have almost everything you need for an interactive rescue environment, which you can optionally boot into to do emergency maintenance tasks such as running fsck on your root file system. You just need to assemble a few additional pieces.


Symlink /sbin/init to BusyBox so there's a real init for the kernel to start.
# ln -s ../bin/busybox /boot/sbin/init


Create an inittab.
# cat > /boot/etc/inittab <<EOF
::sysinit:/bin/busybox mkdir /dev/pts /dev/shm
::sysinit:/bin/busybox mount -a

::respawn:-/bin/busybox sh

::shutdown:/bin/busybox killall5
::shutdown:/bin/busybox umount -a -r
EOF


Add an option to the bootloader configuration for booting into the rescue environment.
# cat >> /boot/extlinux/extlinux.conf <<EOF
LABEL rescue
	KERNEL /vmlinuz
	APPEND root=/dev/sda1 rootwait
EOF

Notice that the only difference between this new rescue label and the default linux label is the lack of init=/init.sh in the kernel command line. The kernel executes /sbin/init by default.


You may wish to install additional utilities for diagnosing problems with your root file system.
Note: The packages shown here are just examples; you could install packages specific to the file systems you use.
# emerge --root=/boot --config-root=/boot sys-fs/e2fsprogs sys-fs/xfsprogs


To enter your new rescue environment when booting, hold down the Shift or Alt key (or engage Caps Lock or Scroll Lock) before the kernel loads, and a boot: prompt will appear. Type rescue and press Enter.

Networking Support with DHCP

You can add networking support to your early userspace environment fairly easily. This is useful if you need to mount network shares or you wish to allow remote control of the environment over SSH.
Important: Change eth0 in the scripts below to your actual network device name. Note that there is no udev in the early userspace environment, so the network device name will be whatever the kernel assigns, not the persistent name that udev assigns later in the boot process.


Symlink /etc/resolv.conf to /run/resolv.conf, as /etc may be read-only during boot.
# ln -s /run/resolv.conf /boot/etc/


Add an init script to bring up your network device and run BusyBox's DHCP client.
# cat > /boot/init.d/10-network <<'EOF'
ip link set up dev eth0

udhcpc -f -i eth0 &
pid_udhcpc=$!
EOF

Note: If you need to send a host name and/or client ID, perhaps to cause your DHCP server to return a fixed IP address mapping, you can add to the udhcpc command line (before the ampersand) -x hostname:<your-hostname> and/or -x 0x3d:<your-client-ID> (with no colons in the client ID, just hex digits, and no angle brackets).
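
As a minimal sketch of the note above, the helper below assembles such a udhcpc command line and strips any colons from the client ID. The function name udhcpc_cmdline, the host name myhost, and the client ID value are placeholders chosen for this example; it only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print a udhcpc command line that sends a host name
# and a client ID (DHCP option 0x3d). The client ID may be given with
# colons; they are removed because udhcpc expects plain hex digits.
udhcpc_cmdline() {
	iface=$1
	name=$2
	client_id=$(printf '%s' "$3" | tr -d ':')
	printf 'udhcpc -f -i %s -x hostname:%s -x 0x3d:%s\n' "$iface" "$name" "$client_id"
}

# Placeholder values for illustration only.
udhcpc_cmdline eth0 myhost 01:00:11:22:33:44:55
```

The printed line can be pasted into 10-network in place of the plain udhcpc invocation, keeping the trailing ampersand.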


Add an init script to stop the DHCP client and deconfigure the network interface, so that your later boot scripts can start with a clean slate.
# cat > /boot/init.d/89-network <<'EOF'
kill "${pid_udhcpc}"
wait "${pid_udhcpc}" || :

ip -4 addr flush dev eth0
ip link set down dev eth0
EOF
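
Note: One pitfall when creating scripts like these with cat and a here-document: if the delimiter is left unquoted, the shell that writes the file expands $! and ${pid_udhcpc} on the spot, and the script ends up containing the expanded (usually empty) values instead of the variable references. Quoting the delimiter keeps the text literal. The sketch below shows the difference; the value 1234 is an arbitrary stand-in.

```shell
#!/bin/sh
# Illustration only: compare unquoted and quoted here-document delimiters.
pid_udhcpc=1234   # arbitrary value that happens to exist in the writing shell

unquoted=$(cat <<EOF
kill "${pid_udhcpc}"
EOF
)

quoted=$(cat <<'EOF'
kill "${pid_udhcpc}"
EOF
)

echo "unquoted delimiter wrote: $unquoted"
echo "quoted delimiter wrote:   $quoted"
```

Only the quoted form produces a script that reads the variable when the init script actually runs.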


Remote Control over SSH

It is possible to run an SSH server in the early userspace environment. This is useful if you need to enter a passphrase to unlock an encrypted storage device but may not always have physical access to the console.


Emerge the Dropbear SSH server.
# echo 'net-misc/dropbear -shadow -zlib' >> /boot/etc/portage/package.use

# emerge --root=/boot --config-root=/boot net-misc/dropbear


Install your host keys, converting them to Dropbear's format.
# mkdir /boot/etc/dropbear

# /boot/usr/bin/dropbearconvert openssh dropbear /etc/ssh/ssh_host_dsa_key /boot/etc/dropbear/dropbear_dss_host_key

# /boot/usr/bin/dropbearconvert openssh dropbear /etc/ssh/ssh_host_rsa_key /boot/etc/dropbear/dropbear_rsa_host_key

# /boot/usr/bin/dropbearconvert openssh dropbear /etc/ssh/ssh_host_ecdsa_key /boot/etc/dropbear/dropbear_ecdsa_host_key


Add init scripts to start and stop the Dropbear server.
# cat > /boot/init.d/11-dropbear <<'EOF'
dropbear -F -P '' -I 60 &
pid_dropbear=$!
EOF

# cat > /boot/init.d/88-dropbear <<'EOF'
kill "${pid_dropbear}"
wait "${pid_dropbear}" || :
EOF


Copy your authorized_keys file.
# mkdir -p /boot/root/.ssh

# cp -a ~/.ssh/authorized_keys /boot/root/.ssh/


Install the default user and group manifests.
# cp -a /usr/share/baselayout/{passwd,group} /boot/etc/


Change the root user's shell to /bin/sh, since Bash is not installed.
# ln -s busybox /boot/bin/sh

# chsh --root /boot --shell /bin/sh root


Add an init script to pause the boot process at a prompt, to allow for remote access.
# cat > /boot/init.d/49-pause <<EOF
read -r -p 'Press Enter to continue boot...'
EOF


Full-Disk Encryption with LUKS

The impetus for all of this, of course, is to allow for complex root file system mounts, which cannot be achieved simply with kernel command-line arguments. The following section of this guide details how to convert an existing root partition in place (i.e., preserving the existing file system and its contents) to an encrypted partition and how to set up the early userspace environment to prompt for the passphrase to mount the root file system contained in this partition.


Before you begin, verify that your disk has free space available to shift the start of your root partition by at least 1032 sectors toward the beginning of the disk.
# sfdisk -lq /dev/sda
Device     Boot    Start        End    Sectors  Size Id Type
/dev/sda1  *        2048    1048575    1046528  511M 83 Linux
/dev/sda2        1048576   16777215   15728640  7.5G 82 Linux swap / Solaris
/dev/sda3       16777216 2147483647 2130706432 1016G 83 Linux

Shown above is an example of a typical partition layout, with a small boot partition first, followed by a swap partition, followed by the large root partition. In this case, the swap partition can be deleted and created anew with a slightly smaller size, to make room for expanding the root partition into the vacated space.
Important: If your partition layout lacks sufficient free space to relocate your root partition by at least 1032 sectors closer to the beginning of your disk, then do not continue with this guide!


Emerge cryptsetup.
# cat >> /boot/etc/portage/package.use <<EOF
sys-fs/cryptsetup -gcrypt kernel
sys-fs/lvm2 -thin device-mapper-only
EOF

# echo 'sys-apps/baselayout-2.2' >> /boot/etc/portage/profile/package.provided

# emerge --root=/boot --config-root=/boot sys-fs/cryptsetup


Determine the number of sectors needed for the LUKS header.
# dd if=/dev/null of=/tmp/tmp.img bs=1M seek=64

# LOOPDEV=$(losetup -f --show /tmp/tmp.img)

# /boot/sbin/cryptsetup luksFormat -q --align-payload 1 "${LOOPDEV}"
Enter passphrase: [press Enter here]

# /boot/sbin/cryptsetup luksDump "${LOOPDEV}" | grep '^Payload offset:'
Payload offset: 2056

# losetup -d "${LOOPDEV}"

# rm /tmp/tmp.img

Note: If you do not have enough space to grow your root partition by the number of sectors reported as the "Payload offset", then repeat this step, but add --cipher aes-cbc-essiv:sha256 --key-size 128 to the luksFormat command. These parameters should result in the smallest possible LUKS header. If you still do not have enough space, then you must not continue with this guide!
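
The two figures that appear in this guide, 2056 and 1032 sectors, can be reproduced with a little arithmetic. The sketch below assumes the LUKS1 on-disk layout (an 8-sector header area, 8 key slots, 4000 anti-forensic stripes per slot, key slots aligned to 8 sectors) together with --align-payload 1. Treat it as a cross-check only, and always use the value that luksDump actually reports.

```shell
#!/bin/sh
# Estimate the LUKS1 payload offset in 512-byte sectors for a given key
# size in bytes. Assumptions: LUKS1 layout, --align-payload 1.
luks1_payload_offset() {
	key_bytes=$1
	# Key material per slot: key size times 4000 stripes, in sectors (rounded up)
	slot=$(( (key_bytes * 4000 + 511) / 512 ))
	# Each key slot is aligned to 8 sectors (4096 bytes)
	slot=$(( (slot + 7) / 8 * 8 ))
	# 8 sectors for the header area, then 8 key slots
	echo $(( 8 + 8 * slot ))
}

luks1_payload_offset 32   # 256-bit key
luks1_payload_offset 16   # 128-bit key (--key-size 128)
```

Under these assumptions a 256-bit key gives 2056 sectors and a 128-bit key gives 1032 sectors, which is where the minimum quoted at the start of this section comes from.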


Before proceeding, make a full backup of your file system to an external disk. Even if you perform all of the following steps perfectly, a power glitch or a kernel panic during the encryption process will trash your file system irreparably. You have been warned!


Rewrite the 50-mountroot init script.
# cat > /boot/init.d/50-mountroot <<EOF
until cryptsetup luksOpen /dev/sda3 root ; do : ; done
mount --ro /dev/mapper/root /mnt
EOF


If you created 49-pause earlier, you should delete it now, as it is no longer useful.
# rm -f /boot/init.d/49-pause


Reboot into your shiny new interactive rescue environment. You cannot perform the remaining steps while your root file system is mounted.


Use sfdisk to extend your root partition toward the beginning of the disk by exactly the number of sectors reported earlier by luksDump as the "Payload offset". Also, change its type to e8, which is conventionally used for LUKS partitions (sfdisk has no name for this type and reports it as "unknown").
Important: The numbers shown below are examples only. You will need to use the actual numbers reported by sfdisk for your disk, decreasing the size of the swap partition, decreasing the start of the root partition, and increasing the size of the root partition, all by the exact number of sectors reported earlier as the "Payload offset".
If you have ANY DOUBTS about what you are doing, STOP NOW!
# sfdisk /dev/sda

Welcome to sfdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/sda: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xf04ad805

Old situation:

Device     Boot    Start        End    Sectors  Size Id Type
/dev/sda1  *        2048    1048575    1046528  511M 83 Linux
/dev/sda2        1048576   16777215   15728640  7.5G 82 Linux swap / Solaris
/dev/sda3       16777216 2147483647 2130706432 1016G 83 Linux

Write down the "Old situation" in case you need to go back to it.
Type 'help' to get more information.

>>> 2048,1046528,83,*
Created a new DOS disklabel with disk identifier 0xa63ad8c1.
Created a new partition 1 of type 'Linux' and of size 511 MiB.
/dev/sda1 :         2048      1048575 (511M) Linux
/dev/sda2: 1048576,15726584,82
Created a new partition 2 of type 'Linux swap / Solaris' and of size 7.5 GiB.
/dev/sda2 :      1048576     16775159 (7.5G) Linux swap / Solaris
/dev/sda3: 16775160,2130708488,e8
Created a new partition 3 of type 'Unknown' and of size 1016 GiB.
/dev/sda3 :     16775160   2147483647 (1016G) unknown
/dev/sda4: 0,0
Ignoring partition.
All partitions used.

New situation:

Device     Boot    Start        End    Sectors  Size Id Type
/dev/sda1  *        2048    1048575    1046528  511M 83 Linux
/dev/sda2        1048576   16775159   15726584  7.5G 82 Linux swap / Solaris
/dev/sda3       16775160 2147483647 2130708488 1016G e8 unknown

Verify that the ending sector of your root partition is the same in the "New situation" as in the "Old situation" and that its size has increased by the "Payload offset" amount. Also verify that its type is now e8.
Do you want to write this to disk? [Y]es/[N]o: y

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
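
The numbers typed into sfdisk above follow from the "Old situation" and the "Payload offset" by simple arithmetic. The sketch below recomputes them for the example layout, so that you can substitute your own values and double-check them before writing anything to disk. The variable names are arbitrary, and the script only prints numbers.

```shell
#!/bin/sh
# Recompute the example sfdisk input from the old layout and the offset.
offset=2056             # "Payload offset" reported by luksDump

swap_start=1048576      # old /dev/sda2 start
swap_size=15728640      # old /dev/sda2 size
root_start=16777216     # old /dev/sda3 start
root_size=2130706432    # old /dev/sda3 size

new_swap_size=$(( swap_size - offset ))
new_root_start=$(( root_start - offset ))
new_root_size=$(( root_size + offset ))

# The root partition must still end on the same sector as before.
old_root_end=$(( root_start + root_size - 1 ))
new_root_end=$(( new_root_start + new_root_size - 1 ))

echo "/dev/sda2: $swap_start,$new_swap_size,82"
echo "/dev/sda3: $new_root_start,$new_root_size,e8"
echo "root end: old=$old_root_end new=$new_root_end"
echo "loop offset in bytes: $(( offset * 512 ))"
```

The last line is the byte offset at which the existing file system starts inside the enlarged partition.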


If you shrank a swap partition, you must run mkswap to reinitialize its header with the new size.
# mkswap /dev/sda2


Set up a loop device pointing at your file system, which is now at a positive offset into the partition.
# losetup -f --show --offset $((2056*512)) /dev/sda3
/dev/loop0


Verify that the loop device is pointing at your file system.
# blkid /dev/loop0
/dev/loop0: UUID="6f5401f8-12df-4e17-9935-5478f161d51a" TYPE="ext4"

If you do not see a TYPE=, then you've made a mistake somewhere.


Format the LUKS partition. Use the same parameters to luksFormat as you used earlier when you determined the "Payload offset".
Important: If you do not use the same parameters to luksFormat as you used earlier, you may accidentally overwrite the beginning of your file system, which would be Very Bad.
# cryptsetup luksFormat --align-payload 1 /dev/sda3
WARNING!
========
This will overwrite data on /dev/sda3 irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase: [type a strong passphrase here]
Verify passphrase: [repeat the same passphrase here]


Open the LUKS partition.
# cryptsetup luksOpen /dev/sda3 root
Enter passphrase for /dev/sda3: [type your passphrase here]


Encrypt your file system in place.
# dd if=/dev/loop0 of=/dev/mapper/root bs=512

Go have a nap. This will take several hours. I hope you have stable power.


Verify that the mapped device contains your file system.
# blkid /dev/mapper/root
/dev/mapper/root: UUID="6f5401f8-12df-4e17-9935-5478f161d51a" TYPE="ext4"

The UUID and TYPE should be the same as reported by blkid earlier.
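
If you would rather not compare the two blkid outputs by eye, the comparison can be scripted. This is a minimal sketch that assumes the util-linux version of blkid (BusyBox's blkid does not take these options); the function name fs_id is made up for this example. Note that the "before" value has to be captured from the loop device prior to the dd step, because the loop device reads ciphertext afterwards.

```shell
#!/bin/sh
# Print only the UUID and TYPE values that blkid reports for a device.
# Assumes util-linux blkid, which supports -s (select tag) and -o value.
fs_id() {
	blkid -s UUID -s TYPE -o value "$1"
}

# Usage sketch (as root, in the rescue environment):
#   before=$(fs_id /dev/loop0)          # run this BEFORE the dd step
#   ... luksFormat, luksOpen, dd ...
#   after=$(fs_id /dev/mapper/root)
#   [ -n "$before" ] && [ "$before" = "$after" ] && echo "file system intact"
```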


Reboot and cross your fingers.
# reboot


Remote Unlocking of Encrypted Root

So now your system is encrypted and prompts you for the passphrase during boot, but what happens if the power flickers while you're away and without physical access to the console? You'd like to be able to SSH in and enter the passphrase to get your system booted up again. Well, you can.


Emerge screen.
# emerge --root=/boot --config-root=/boot app-misc/screen


Rewrite the 50-mountroot init script.
# cat > /boot/init.d/50-mountroot <<EOF
openvt -sw screen busybox sh -c 'until cryptsetup luksOpen /dev/sda3 root ; do : ; done' || :
chvt 1
deallocvt
mount --ro /dev/mapper/root /mnt
EOF


Change the root user's shell to /usr/bin/screen.
# chsh --root /boot --shell /usr/bin/screen root


Now reboot. When you see the passphrase prompt, try SSH'ing in from another computer. You will see the same passphrase prompt. Enter the passphrase on either machine to continue the boot process.