The task is a remote x86_64 binary (both the binary and libc were provided), tagged "pwn" and "network", so the goal is to exploit some vulnerability to obtain a shell. There are actually two parts to the task, named 2manypkts-v1 and 2manypkts-v2 respectively.
The binary has a somewhat trivial stack buffer overflow vulnerability. In the first part, you can simply overflow the buffer up to (and beyond) main's return address and employ the well-known ROP technique. The second part is harder: main never returns, but the overflow can also overwrite some other variables, including several pointers to heap buffers, which would allow calling realloc with arbitrary arguments. As we submitted the flag for the first part literally a minute before the game ended, we didn't have time to solve the second part.
Why was the task tagged "network" as well, you might ask? The problem is that the buffer is rather large, about 57 kilobytes. The data is read into the buffer by means of the read system call, and after that it's moved to a different buffer. In other words, we have to fill the entire buffer (plus some extra for the ROP chain) in a single read call, which is obviously problematic over a network. The hint for the task stated that we should utilize some MTU and fragmentation tricks. Which we did, but it wasn't that simple.
The main loop of the binary lets the user input data of the following types: "double", "int", "char", "long", "unsigned long". The user inputs the data type first, then the number of elements (up to 14335 inclusive), and then the data itself in binary. The program is written in C++ and, judging by the presence of strikingly similar functions, with extensive use of C++ templates. Here are the (reverse-engineered) definitions of the classes:
```cpp
template <typename T>
class TypeContainer {
    int fd;
    int n_reads;
    T *buffer;
    T tmp_buffer[14336];
    size_t elems_in_tmp_buffer;
    size_t elems_in_buffer;
};

class State {
    int fd;
    int n_reads;
    TypeContainer<double> s1;
    TypeContainer<char> s2;
    TypeContainer<long> s3;
    TypeContainer<unsigned long> s4;
    TypeContainer<int> s5;
};
```
A variable of type State is allocated directly on the stack of the main function.

The logic of the program is as follows. After the user provides the type and the number of elements, the program reads the data into tmp_buffer:

```cpp
read(this->fd, this->tmp_buffer, nelems * sizeof(T));
```

The number of successfully read elements is then stored in the elems_in_tmp_buffer field.
After that, elements are moved from the temporary buffer to the "persistent" one:

```cpp
this->buffer = (T *)realloc(
    this->buffer,
    sizeof(T) * (this->elems_in_tmp_buffer + this->elems_in_buffer)
);
for (i = 0; i < this->elems_in_tmp_buffer; ++i) {
    this->buffer[this->elems_in_buffer + i] = this->tmp_buffer[i];
}
this->elems_in_buffer += this->elems_in_tmp_buffer;
```
The vulnerability lies in the way nelems is handled.

First of all, the number is checked as nelems < 14336, where nelems is a signed integer, so any negative number passes the check. By itself, however, this is insufficient for exploitation: a negative number becomes a huge 64-bit unsigned integer, almost 2**64, and when such a number is passed to the read system call, it immediately fails with EFAULT. For the curious, smaller sizes, even 2**32, do not cause an immediate EFAULT and only fail once an actual attempt to write to an inaccessible page happens. You can find the exact reason in fs/read_write.c in the kernel source code; look for mentions of EFAULT.
Secondly, the size computation is susceptible to integer overflow. Remember that in the read argument, nelems is multiplied by sizeof(T)? The multiplication is performed in 32-bit arithmetic, so if nelems is exactly INT_MIN+1 and, say, we're dealing with 4-byte integers, nelems * 4 overflows and becomes just 4. Likewise, INT_MIN+100000 becomes 400000 after the multiplication.

This way, we can pass a negative number that passes the first check and then becomes almost any number we want after the integer overflow in the read argument calculation.
The next steps are rather obvious. I decided to use two ROP chains. The first leaks the contents of some GOT entry, which defeats libc ASLR, and then jumps back to main. The second ROP chain simply calls system("/bin/sh"), whose address is known by that point.

So far, so good. The exploit works reliably locally.
But note that our payload has to be at least 14336*sizeof(int) = 57344 bytes long, plus a little more for the comparatively small ROP chain. When run over the network, read tends to return much smaller chunks of data than requested, which doesn't overflow our buffer. This was expected: the network stack is not going to wait until all the data has arrived.
As mentioned above, the provided hint advises messing with the MTU and fragmentation. We came up with this solution: we need to somehow craft TCP segments large enough to fit our payload entirely. A large TCP segment is encapsulated in an equally large IP datagram, and such large IP datagrams get fragmented (so they can still pass through a typical 1500-ish MTU Ethernet link). The bottom line is, the receiving TCP/IP stack cannot deliver the TCP segment until the IP datagram has been entirely reassembled. This ensures that all of our payload arrives atomically. Although it doesn't guarantee a single large read per se, in practice that's exactly what happens (and what we want).

Now the question is, how do we make our TCP/IP stack behave that way? There's no way a typical unmodified Linux will assume that the MTU along the entire path is larger than 57000.
After some fruitless attempts at modifying the MTU (which didn't have much effect on segment size), we decided to find a userspace TCP/IP stack that we could easily modify. And we found one: google/netstack, written in Go.

You create a TUN interface and set the IP address, MTU, owner, and other settings. I will assume the interface is configured with the 10.1.1.1/24 IPv4 address. Then you can use the included tun_tcp_connect program as a netcat-like proxy:

```shell
# Usage: ./tun_tcp_connect <tun-device> <local-ipv4-address> <local-port> <remote-ipv4-address> <remote-port>
./tun_tcp_connect tun0 10.1.1.2 10000 163.172.102.12 30303
```

In this example, the program assumes the role of the endpoint 10.1.1.2:10000 and attempts to connect to 163.172.102.12:30303. The operating system will receive packets from 10.1.1.2 arriving on the interface with address 10.1.1.1/24 (which is not unlike what happens when you connect to some physical network and other hosts try to use you as a router). All you need to do at this point is enable routing from the tun0 interface to the public internet. As 10.1.1.1/24 is obviously a private network, you'll need NAT as well:
```shell
EXTERNAL=eth0
INTERNAL=tun0
iptables -P FORWARD ACCEPT  # default ACCEPT for the FORWARD chain
iptables -t nat -A POSTROUTING -o $EXTERNAL -j MASQUERADE
sysctl net.ipv4.conf.${EXTERNAL}.forwarding=1
sysctl net.ipv4.conf.${INTERNAL}.forwarding=1
```
Initially, I made a mistake and also included some connection-tracking rules in the iptables ruleset. For unknown reasons, this passed IP datagrams only up to about 20000 bytes in size, and we wasted a lot of time debugging it. Leaving only the masquerade rule removed the weird limit. The problematic rules were:

```shell
iptables -P FORWARD DROP
iptables -A FORWARD -i $EXTERNAL -o $INTERNAL -m conntrack --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i $INTERNAL -o $EXTERNAL -j ACCEPT
```
To make google/netstack actually generate enormous TCP segments, we had to modify it a bit. The patch is attached below; it's rather trivial.