- Load new kernel + initrd from files:
- Call `reboot` with `LINUX_REBOOT_CMD_KEXEC`:

Step 2 means that `kexec` is only available if `reboot()` is available as well.
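
The two steps above can be sketched in Go using only the standard library. This is a minimal sketch, not the actual implementation: the helper names `kexecFileLoad` and `kexec` are hypothetical, and the raw syscall number is hard-coded for linux/amd64 as an assumption (`golang.org/x/sys/unix` ships a wrapper that avoids this).

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// sysKexecFileLoad is the kexec_file_load syscall number.
// Assumption: linux/amd64 only; other architectures use a different number.
const sysKexecFileLoad = 320

// kexecFileLoad (hypothetical helper) stages a kernel and initrd that are
// passed as open file descriptors. Step 1 of the two-step kexec.
func kexecFileLoad(kernelFd, initrdFd int, cmdline string) error {
	// The kernel expects a NUL-terminated command line and a length
	// that includes the terminating NUL.
	p, err := syscall.BytePtrFromString(cmdline)
	if err != nil {
		return err
	}
	_, _, errno := syscall.Syscall6(sysKexecFileLoad,
		uintptr(kernelFd), uintptr(initrdFd),
		uintptr(len(cmdline)+1), uintptr(unsafe.Pointer(p)),
		0, 0) // flags = 0
	if errno != 0 {
		return errno
	}
	return nil
}

// kexec (hypothetical helper) performs both steps: load, then reboot into
// the loaded kernel. On success the reboot call does not return.
func kexec(kernelPath, initrdPath, cmdline string) error {
	kernel, err := os.Open(kernelPath)
	if err != nil {
		return fmt.Errorf("open kernel: %w", err)
	}
	defer kernel.Close()

	initrd, err := os.Open(initrdPath)
	if err != nil {
		return fmt.Errorf("open initrd: %w", err)
	}
	defer initrd.Close()

	// Step 1: load new kernel + initrd from files.
	if err := kexecFileLoad(int(kernel.Fd()), int(initrd.Fd()), cmdline); err != nil {
		return fmt.Errorf("kexec_file_load: %w", err)
	}
	// Step 2: reboot with LINUX_REBOOT_CMD_KEXEC.
	if err := syscall.Reboot(syscall.LINUX_REBOOT_CMD_KEXEC); err != nil {
		return fmt.Errorf("reboot: %w", err)
	}
	return nil
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: kexec <kernel> <initrd> <cmdline>")
		return
	}
	if err := kexec(os.Args[1], os.Args[2], os.Args[3]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Both syscalls are expected to fail with `EPERM` for a caller that lacks the required capability, so running this unprivileged is harmless.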

There are two `kexec_load`-related syscalls:

- `kexec_load`, which takes arbitrary memory (enabled via `CONFIG_KEXEC`)
- `kexec_file_load`, which takes file descriptors and might do signature validation (enabled via `CONFIG_KEXEC_FILE`)

KSPP talks only about `CONFIG_KEXEC`, not about `CONFIG_KEXEC_FILE`. At the same time, it recommends a sysctl to disable `kexec_load`, which disables both flavors.
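
The sysctl in question is `kernel.kexec_load_disabled`, exposed as `/proc/sys/kernel/kexec_load_disabled`. A minimal sketch of checking it from Go follows; the function name `parseKexecLoadDisabled` is hypothetical, and the code assumes the file holds either `0` or `1`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const kexecLoadDisabledPath = "/proc/sys/kernel/kexec_load_disabled"

// parseKexecLoadDisabled (hypothetical helper) interprets the raw sysctl
// value: "1" means both kexec_load and kexec_file_load are disabled.
func parseKexecLoadDisabled(raw string) (bool, error) {
	switch strings.TrimSpace(raw) {
	case "0":
		return false, nil
	case "1":
		return true, nil
	default:
		return false, fmt.Errorf("unexpected sysctl value %q", raw)
	}
}

func main() {
	raw, err := os.ReadFile(kexecLoadDisabledPath)
	if err != nil {
		// The file is absent when the kernel is built without kexec support.
		fmt.Println("cannot read sysctl:", err)
		return
	}
	disabled, err := parseKexecLoadDisabled(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("kexec loading disabled:", disabled)
}
```

Once the sysctl is set to `1` it cannot be set back to `0` until the next boot, so a process that needs `kexec` has to load its kernel before the toggle is flipped.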

`kexec` source code.

`kexec_file_load` and `reboot` require the `CAP_SYS_BOOT` capability.
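
To see whether a given process actually holds `CAP_SYS_BOOT`, one option is to read the `CapEff` mask from `/proc/<pid>/status` and test bit 22. The sketch below does this for the current process; `hasCap` and `effectiveCaps` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capSysBoot is the bit number of CAP_SYS_BOOT in linux/capability.h.
const capSysBoot = 22

// hasCap reports whether the given bit is set in a hex capability mask,
// as printed in the Cap* lines of /proc/<pid>/status.
func hasCap(capHex string, capBit uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(capHex), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(uint64(1)<<capBit) != 0, nil
}

// effectiveCaps extracts the CapEff hex mask from a status file.
func effectiveCaps(statusPath string) (string, error) {
	f, err := os.Open(statusPath)
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), nil
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("CapEff not found in %s", statusPath)
}

func main() {
	mask, err := effectiveCaps("/proc/self/status")
	if err != nil {
		fmt.Println(err)
		return
	}
	ok, err := hasCap(mask, capSysBoot)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("CapEff=%s CAP_SYS_BOOT=%v\n", mask, ok)
}
```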

`reboot()` inside a user namespace doesn't reboot the system; it reboots the namespace, killing it (proof). (reboot(2) documents this behavior for non-initial PID namespaces, which containers normally have as well.) If `LINUX_REBOOT_CMD_KEXEC` is used, the call fails with `EINVAL`. This in turn means that a container can't actually use `kexec` unless it breaks out of its user namespace (and if it does, security is compromised anyway).

We can further limit `kexec` by dropping the `CAP_SYS_BOOT` capability for any process forked from `machined` (`init`). The path towards that is not yet totally clear to me, but some pointers:

- Go issue about runtime, threads and global stuff like setting capabilities
- runc setting capabilities
- `os/exec` can set ambient capabilities
- `PR_SET_NO_NEW_PRIVS`
- article on capabilities

Creating a user namespace re-enables all capabilities, but capabilities inside the user namespace only apply to resources scoped under that namespace (more info).

In other words, on protecting `kexec` from being used by processes other than `machined`:

- For processes directly forked from `machined` (which include `udevd`, `containerd`, etc.): we can try to drop capabilities as we fork into those processes.
- For containers created by `containerd` (both system and k8s), `kexec` shouldn't be available as they reside in a user namespace.
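
One possible way to drop the capability when forking is a small wrapper binary that removes `CAP_SYS_BOOT` from its bounding set and then execs the real program. The sketch below is an assumption about how this could look, not a settled design: the wrapper, `inBoundingSet`, and `dropFromBoundingSet` are hypothetical. It also assumes the caller holds `CAP_SETPCAP` and that its inheritable and ambient sets do not contain `CAP_SYS_BOOT`. Capability state is per thread, so the goroutine is pinned to its OS thread before the `prctl` and `execve` calls.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

// Values from linux/capability.h and linux/prctl.h.
const (
	capSysBoot    = 22
	prCapbsetRead = 23
	prCapbsetDrop = 24
)

// inBoundingSet reports whether the capability is in the calling thread's
// bounding set.
func inBoundingSet(capBit uintptr) (bool, error) {
	r, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prCapbsetRead, capBit, 0)
	if errno != 0 {
		return false, errno
	}
	return r == 1, nil
}

// dropFromBoundingSet removes the capability from the calling thread's
// bounding set, so it cannot be regained across execve.
func dropFromBoundingSet(capBit uintptr) error {
	_, _, errno := syscall.Syscall(syscall.SYS_PRCTL, prCapbsetDrop, capBit, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: dropboot <program> [args...]")
		return
	}
	// Keep the prctl and the execve on the same OS thread.
	runtime.LockOSThread()

	if err := dropFromBoundingSet(capSysBoot); err != nil {
		fmt.Fprintln(os.Stderr, "drop CAP_SYS_BOOT:", err)
		os.Exit(1)
	}
	path, err := exec.LookPath(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Replace this process with the target program.
	if err := syscall.Exec(path, os.Args[1:], os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec:", err)
		os.Exit(1)
	}
}
```

A parent would then start children as `dropboot containerd ...` instead of `containerd ...`. Doing the drop in a separate exec'd wrapper sidesteps the problem of changing per-thread state inside a multithreaded Go runtime.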