pgy/writeup.md

## writeup.md

      
    Raw
  

              writeup.md
            
          
    KEYGENME writeup

This is a walk-through of how I solved the KEYGENME reverse engineering challenge
at the Google CTF 2018 qualifier.
Challenge description


I bet you can't reverse this algorithm!

The challenge contained an executable binary called main and a server
endpoint that sent us 5 random bytes and requested an answer in return. It
used the binary to validate the answer. The challenge was to reverse engineer
it, and find out what answer would satisfy the server.
What did it do?

I didn't really want to fire up a disassembler/decompiler right away, so I
started to analyze it dynamically. I supposed it probably needed input in some
way, so I piped random data into it and also supplied some random arguments.
I ran the binary with strace to see the system calls it made.
$ cat /dev/urandom | strace ./main a b c d e f |& cut -d\( -f1
execve
mmap
mmap
ptrace
fork
mmap
wait4
ptrace
wait4
...

My first reaction was: "oh boy, fork and ptrace!". It looked like it created a
child process and the parent process debugged the child with ptrace. Or the
child the parent. Or they both ptrace'd each other.
I really hoped it was not the latter case. That would have made debugging even
more miserable than it later turned out to be. The reason is that gdb also uses
ptrace internally, and a process can be traced with only one ptrace-based
debugger at a time.
Note that strace doesn't trace child processes by default, so these were only
the syscalls of the parent.
What did the parent process do?

Let's go through the full output of strace:
execve("./main", ["./main", "a", "b", "c", "d", "e", "f"], 0x7ffc790cd238 /* 42 vars */) = 0

Ok, that is how a normal process starts. Nothing interesting here... wait!
Where is "/etc/ld.preload.so" and "/usr/lib/libc.so.6" and all the usual
loader/libc stuff that happens on program startup? Looks like this program was
compiled statically or didn't use libc.
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f2b4c4a6000
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f2b4c496000

OK, it mapped two RWX pages.
ptrace(PTRACE_DETACH, 0, NULL, SIG_0)   = -1 ESRCH (No such process)

I had no idea why it did that. In CTFs it's usually a good idea to just
move on when not understanding something at first. Maybe it wouldn't be
important.
fork()                                  = 1737

This is forking a child process, its pid was 1737.
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f2b4c486000
wait4(1737, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], 0, NULL) = 1737

Mmap again. The wait4 means that the parent process waited for the child to hit
a debugger trap instruction (int3).
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=1737, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---

The waiting was not in vain, here a signal notified the parent that the child
had reached a debugger trap instruction and stopped. That was exactly what the
parent was waiting for.
ptrace(PTRACE_CONT, 1737, NULL, SIG_0)  = 0
wait4(1737, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], 0, NULL) = 1737

Here the parent allows the child to continue.
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=1737, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
open("/proc/1737/maps", O_RDONLY)       = 3
read(3, "560d1f5d9000-560", 16)         = 16
close(3)                                = 0

Another trap instruction reached. This time the parent read the address of the
child's lowest mapped region. This was probably the child's virtual base address.
ptrace(PTRACE_GETREGS, 1737, NULL, 0x7f2b4c486020) = 0
process_vm_readv(1737, [{iov_base="\314\353\376", iov_len=3}], 1, [{iov_base=0x560d1f5d97c5, iov_len=3}], 1, 0) = 3
process_vm_writev(1737, [{iov_base="^H\211", iov_len=3}], 1, [{iov_base=0x560d1f5d97c5, iov_len=3}], 1, 0) = 3
ptrace(PTRACE_SETREGS, 1737, NULL, 0x7f2b4c486020) = 0

Yay, a bunch of syscalls I never used before! I hit up the man pages of
these syscalls. The ptrace output translated to English:

Parent process reads the registers of the child process
Parent reads 3 bytes ("\314\353\376" == "\xcc\xeb\xfe") from the child's
memory
Parent writes another 3 bytes ("^H\211" == "\x5e\x48\x89") to the same
address
Parent modifies the child's registers.

The address of the read and write (0x560d1f5d97c5) was very close to the
child's base address (0x560d1f5d9000), so it probably modified an executable
memory region in the child. Maybe this was where the child had been stopped
due to an int3 instruction?
What were the data read? Maybe x64 code?
$ printf "\314\353\376" | ndisasm -b64 -
00000000  CC                int3       
00000001  EBFE              jmp short 0x1

Yes, the expected CC (int3) instruction. This meant that the parent process
live patched the child process when that hit an int3. The EBFE instruction is
an infinite loop, probably just a placeholder here. It would get overwritten
later.
What did it patch in?
$ printf "^H\211" | ndisasm -b64 -
00000000  5E                pop rsi
00000001  48                rex.w
00000002  89                db 0x89

Total gibberish. Maybe part of a long, multi-byte instruction? Time would tell.
I continued reading the strace output:
ptrace(PTRACE_CONT, 1737, NULL, SIG_0)  = 0

This allowed the child to continue execution.
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=1737, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
wait4(1737, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], 0, NULL) = 1737

Another wait, another signal.
ptrace(PTRACE_GETREGS, 1737, NULL, 0x7f2b4c486020) = 0
process_vm_writev(1737, [{iov_base="\314\353\376", iov_len=3}], 1, [{iov_base=0x560d1f5d97c5, iov_len=3}], 1, 0) = 3
process_vm_readv(1737, [{iov_base="\314\303", iov_len=2}], 1, [{iov_base=0x560d1f5d9845, iov_len=2}], 1, 0) = 2
process_vm_writev(1737, [{iov_base="H\301", iov_len=2}], 1, [{iov_base=0x560d1f5d9845, iov_len=2}], 1, 0) = 2
ptrace(PTRACE_SETREGS, 1737, NULL, 0x7f2b4c486020) = 0
ptrace(PTRACE_CONT, 1737, NULL, SIG_0)  = 0

The first writev here undid the previous patch as it wrote the "CCEBEF"
instructions back to 0x560d1f5d97c5. The second writev introduced another
patch, but to a different address. It looked like the parent process hid its
steps, it had been keeping only one patch visible in the child's memory at
a time. Even if I had created a core dump of the child process, I could
not have recovered all of its instructions. I would have only seen the
placeholder values at certain addresses instead of the normal instructions.
The rest of the strace output was very similar.
I reran strace to dump strings in a sane hexadeceimal notation (-xx) and used
awk to filter for the patches. (I'm quite fond of shell one-liners). I
also grouped the patches by two. This way I could see that every address is
patched exactly twice. First probably valid x64 code is written, then the
second patch restores the original int3 value with a placeholder after it.
$ cat /dev/urandom | strace -i -xx ./main |& tr -s '=[{, ' ' ' | awk '/writev/{print $9, $4}' | paste -sd '\t\n'
0x5593b68dd7c5 "\x5e\x48\x89"   0x5593b68dd7c5 "\xcc\xeb\xfe"
0x5593b68dd845 "\x48\xc1"       0x5593b68dd845 "\xcc\xc3"
0x5593b68dec1b "\x5d\x41"       0x5593b68dec1b "\xcc\xc3"
0x5593b68dec1e "\x41\x5d"       0x5593b68dec1e "\xcc\xc3"
0x5593b68de744 "\x90\x5d\xc3"   0x5593b68de744 "\xcc\xeb\xfe"
0x5593b68ddb67 "\x48\x8b\x85"   0x5593b68ddb67 "\xcc\xeb\xfe"
0x5593b68ddb71 "\x48\x8d"       0x5593b68ddb71 "\xcc\xc3"
0x5593b68de8bd "\x8b\x00"       0x5593b68de8bd "\xcc\xc3"
0x5593b68de9e9 "\x48\x8b"       0x5593b68de9e9 "\xcc\xc3"
0x5593b68ddcc8 "\x89\x45\xd8"   0x5593b68ddcc8 "\xcc\xeb\xfe"
0x5593b68ddd25 "\x8b\x12\x01"   0x5593b68ddd25 "\xcc\xeb\xfe"
0x5593b68ddd56 "\x01\xca"       0x5593b68ddd56 "\xcc\xc3"
0x5593b68ddda8 "\x31\xd1\x48"   0x5593b68ddda8 "\xcc\xeb\xfe"
0x5593b68dde22 "\x8b\x45\xc4"   0x5593b68dde22 "\xcc\xeb\xfe"
0x5593b68dde32 "\x8b\x55"       0x5593b68dde32 "\xcc\xc3"
0x5593b68dde3b "\x48\x83\xc2"   0x5593b68dde3b "\xcc\xeb\xfe"
0x5593b68dde61 "\x8b\x55"       0x5593b68dde61 "\xcc\xc3"
0x5593b68ddf2e "\x01\xd0\x89"   0x5593b68ddf2e "\xcc\xeb\xfe"
0x5593b68ddfeb "\x8b\x45\xe0"   0x5593b68ddfeb "\xcc\xeb\xfe"
0x5593b68de186 "\x8b\x0a"       0x5593b68de186 "\xcc\xc3"
0x5593b68de2b6 "\x01\xd0"       0x5593b68de2b6 "\xcc\xc3"
0x5593b68de2f6 "\x8b\x45\xbc"   0x5593b68de2f6 "\xcc\xeb\xfe"
0x5593b68de3c5 "\x01\xd0"       0x5593b68de3c5 "\xcc\xc3"
0x5593b68de482 "\x8b\x55\xdc"   0x5593b68de482 "\xcc\xeb\xfe"
0x5593b68de540 "\x48\x83\xc2"   0x5593b68de540 "\xcc\xeb\xfe"
0x5593b68de549 "\x01\xca\x01"   0x5593b68de549 "\xcc\xeb\xfe"
0x5593b68de5f1 "\x8b\x4d\xc8"   0x5593b68de5f1 "\xcc\xeb\xfe"
0x5593b68deaed "\x48\x8b"       0x5593b68deaed "\xcc\xc3"
0x5593b68deb0f "\xc1\xe8\x08"   0x5593b68deb0f "\xcc\xeb\xfe"
0x5593b68dd934 "\x8b\x45\xf4"

What did the child do?

I turned my attention to the child process. I used strace with -ff to trace child processes. Strace output can get confusing when the syscalls of two
traced processes overlap, so I instructed strace to save its output into
separate files per process with timestamps prepended. I later merged these
files with awk.
$ cat /dev/urandom | strace -ff -tt -o /tmp/out ./main a b c d e
$ awk '{print $1, FILENAME, $0}' /tmp/out* | sort | cut -d ' ' -f2,4- | cut -d. -f2-

Let's see the output:
19174 execve("./main", ["./main", "a", "b", "c", "d", "e"], 0x7ffefd60ea00) = 0

OK, the pid of the parent was 19174, so the other pid (probably 19175) would be
the child.
19174 mmap
19174 mmap
19174 ptrace(PTRACE_DETACH, 0, NULL, SIG_0) = -1 ESRCH (No such process)
19174 fork() = 19175
19174 mmap

These we saw before, 3 mmaps, a fork and the ptrace I had no idea about.
19175 mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f9bd7d6e000

The first syscall of the child process was an mmap.
19174 wait4(19175, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTRAP && WCOREDUMP(s)}], 0, NULL) = 19175

Parent waits for child. This was the last time the strace output contained the
parent's syscalls. The PTRACE_TRACEME call of the child failed because it was
already ptrace'd by strace, so the parent would never get SIGCHLD with SIGTRAP
status, it would keep on waiting forever.
19175 read(0, "#\3327\334\210", 5) = 5
19175 read(0, "\207\214\0\35\335^\373\313a}\3027T\10"..., 32) = 32

The child read first 5 then 32 bytes from the standard input.
19175 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) = 0
19175 seccomp(SECCOMP_SET_MODE_STRICT, 1, NULL) = -1 EINVAL (Invalid argument)
19175 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=8, filter=0x7f9bd7d81e7e}) = 0

Here we can see the child setting up some kind of a sandbox. I did not
understand the purpose of the first (failed) seccomp call. Had it been successful in setting up a STRICT seccomp context the second call would have
surely caused a seccomp violation.
And the second seccomp call? I was like

Who cares, it's probably not important. ¯\(ツ)/¯

Looking back, this was a huuuuge mistake. Had I examined the seccomp filter,
I would not have needed to resort to using ... well, what I used later to solve
the challenge.
19175 ptrace(PTRACE_TRACEME)  = -1 EPERM (Operation not permitted)

Yes it failed, because strace already ptrace'd it. Fortunately the child
didn't check the return code for errors.
19175 memfd_create("hi", MFD_CLOEXEC) = 3

The child created an anonymous file. The file got file descriptor 3.
19175 write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0"..., 14488) = 14488

The child wrote 14488 bytes to the anonymous file. It looks like the header of
an ELF executable file, doesn't it? I guessed that the child would later call
execve to execute this binary.
19175 fchmod(3, 0555)         = 0
19175 execve("/proc/self/fd/3", ["#\3327\334\210", "\207\214"], NULL) = 0

Yes, exactly. The values it passed as command line arguments were familiar,
these were the 5 and 32 byte strings it had read from the standard input
before.
I rerun strace with a huge "maximum string size" value (-s 1000000). I could
thus recover the initial (unpatched) binary the child dropped into fd3.
$ rm /tmp/out.*
$ cat /dev/urandom | strace -s 1000000 -xx -ff -tt -o /tmp/out ./main a b c d e
$ cat /tmp/out.* | grep 14488 | cut -d\" -f2 | tr -d '\\x' | xxd -p -r > dropped.elf
$ file dropped.elf
dropped.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers

I tried to run the dropped binary. Note that I used exec -a to pass the first
string as the zeroth argument, just like in the execve call before.
$ sh -c 'exec -a AAAAA ./dropped.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
Trace/breakpoint trap (core dumped)

Hmm, no surprise, it still lacked the binary patches.
Patching the child binary

To patch the binary it was important to know if the patches had depended on
the input in some way. I ignored the PTRACE_SETREGS calls for a while.
To find that out I straced the parent multiple times with different inputs
and compared the strace logs. I also disabled address space randomization.
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
$ cat /dev/urandom | strace ./main a b c d e > run.1
$ cat /dev/urandom | strace env -i ./main xx yyy zzz > run.2
$ vimdiff run.1 run.2

The traces were very similar, there were only small differences in the pid
numbers and the first couple of bytes of memory locations. (I thought I
disabled ASLR?). Fortunately the last 3 digits of the memory addresses were
the same. That meant that the parent always applied the same patches between
runs.
It was easy to apply the patches to the input file, I just had to take care
not to apply the "undoing" patches. The following python program (using the
pwntools library) does exactly that.
$ cat patch_dropped.py
from pwn import *
context.arch = "amd64"

e = ELF("dropped.elf")
base = 0x555555554000

done = set()
for line in open("run.1"):

    for char in '"()=[]{},':
        line = line.replace(char, " ")
    line = line.split()

    if len(line) > 2 and line[1] == "process_vm_writev":
        data = unhex(line[4].replace("\\x", ""))
        addr = int(line[9], 16) - base
        if addr not in done:
            e.write(addr, data)
            done.add(addr)
    
e.save("dropped_patched.elf")

$ python patch_dropped.py
$ file dropped_patched.elf
dropped_patched.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=1b6bb2a46a5e760d6de3fb5c4782e05c0a118f11, stripped

I tried running it:
$ chmod +x dropped_patched.elf
$ sh -c 'exec -a AAAAA ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
NO!
Trace/breakpoint trap (core dumped)

OK, that "NO!" output meant that my python program did what I wanted.
Wait, what about the PTRACE_SETREGS? Was there any important stuff there?
For that I had to dig into the parent process.
Reversing the main binary

It was time to examine the main binary in my favorite disassembler. Brace
yourself dear reader, because the binary was pretty obfuscated!
$ objdump -Mintel -d ./main
  4000b0:   48 b8 88 77 66 55 44    movabs rax,0x1122334455667788
  4000b7:   33 22 11 
  4000ba:   48 89 04 25 a6 46 60    mov    QWORD PTR ds:0x6046a6,rax
  4000c1:   00 

Strange value at address 0x6046a6, I wondered what it did.
  4000c2:   e8 c2 00 00 00          call   0x400189

What was that 0x400189 function it called?
    400189:   b8 09 00 00 00          mov    eax,0x9
    40018e:   48 31 ff                xor    rdi,rdi
    400191:   be 00 00 01 00          mov    esi,0x10000
    400196:   ba 07 00 00 00          mov    edx,0x7
    40019b:   41 ba 22 00 00 00       mov    r10d,0x22
    4001a1:   4d 31 c0                xor    r8,r8
    4001a4:   4d 31 c9                xor    r9,r9
    4001a7:   0f 05                   syscall 
    4001a9:   c3                      ret    

Syscall 0x9 is mmap. This was the mmap call we saw in the strace log. OK, so
the function at 0x400189 was mmap(0x10000, RWX). I continued with the main:
  4000c7:   48 89 c5                mov    rbp,rax
  4000ca:   e8 ba 00 00 00          call   0x400189

This is another mmap call, at this point RBP contined the address of the first
RWX page and RAX the second.
  4000cf:   48 89 c7                mov    rdi,rax
  4000d2:   48 8d 34 25 bc 01 60    lea    rsi,ds:0x6001bc
  4000d9:   00 
  4000da:   b9 a8 44 00 00          mov    ecx,0x44a8
  4000df:   f3 a4                   rep movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]

Ok, so here it copied 0x44a8 bytes from 0x6001bc to the second RWX page.
  4000e1:   48 8b 34 25 a6 46 60    mov    rsi,QWORD PTR ds:0x6046a6
  4000e8:   00 

After this instruction the magic constant 0x1122334455667788 was in RSI.
  4000e9:   48 89 c7                mov    rdi,rax
  4000ec:   b9 a8 44 00 00          mov    ecx,0x44a8
  4000f1:   48 31 37                xor    QWORD PTR [rdi],rsi  <---.
  4000f4:   48 83 c7 08             add    rdi,0x8                  |
  4000f8:   48 83 e9 08             sub    rcx,0x8                  |
  4000fc:   7f f3                   jg     0x4000f1 ----------------'

It looks like the code at 0x6046a6 was encrypted with a simple XOR
encryption, where the key was the 0x1122334455667788 constant. This was
the decryption code.
  4000fe:   50                      push   rax
  4000ff:   c3                      ret

This is basically a jump to rax, that is to the decrypted code.
I patched the binary to remove the XOR encryption.
$ cat decrypt_main.py
from pwn import *
context.arch = "amd64"

main = ELF("./main")

# xor decrypt
enc = main.read(0x06001BC, 0x44a8)
key = p64(0x1122334455667788) * (len(enc)//8)
dec = xor(enc, key)
main.write(0x6001BC, dec)

# skip decrypting routine
main.write(0x4000E9, asm("push rax; ret"))

main.save("./main_decrypted.elf")

I won't go through everything in the main binary, but here I list some
interesting obfuscation/shellcode techniques I found in it.
Self-modifying code

006001DE: 5B                 pop     rbx
0x6001DF: 66 83 C3 0A        add     bx, 0Ah

Rbx contains the address of 0x6001E7 here.
0x6001E3: 80 33 82           xor     byte ptr [rbx], 82h 

Xor the 2nd byte of the next instruction. That instruction would move 0xe7
into eax. This is the syscall number of exit_group.
0x6001E6: B8 E7 00 00 00     mov     eax, 0E7h       

After the xor is executed this instruction is really mox eax, 0x65, where
0x65 is the syscall number of ptrace. 0x11 is PTRACE_DETACH.
0x6001EB: BF 11 00 00 00     mov     edi, 11h
0x6001F0: 0F 05              syscall
0x6001F2: C3                 retn

So this is where our mysterious PTRACE_DETACH came from! By only skimming
through the instructions one could think the binary exits here, but in reality
it changes the exit syscall into -- the effectively no-op -- PTRACE_DETACH.
RIP-relative data loading

In normal binaries executable code and read-only data are usually in different
section of the binary. This program mixes them in the XOR-ed page. The problem
is that the address of the decrypted page changes every time the binary is run
so it uses a nice trick to get data addresses in a position independent way.
See for example the "function" that loads the address of the seccomp filter
into memory:
            get_rip:  
0x604033:       pop     rax
0x604034:       retn

            get_seccomp_filter_data_address:
0x604035:       call    0x604033    ; get_rip

            start_of_seccomp_filter_data:
0x60403A:       sock_filter <20h, 0, 0, 4>
0x60403A:       sock_filter <15h, 0, 5, 0C000003Eh>
0x60403A:       sock_filter <20h, 0, 0, 0>
0x60403A:       sock_filter <35h, 3, 0, 40000000h>
0x60403A:       sock_filter <15h, 1, 0, 5Fh>
0x60403A:       sock_filter <6, 0, 0, 7FFF0000h>
0x60403A:       sock_filter <6, 0, 0, 50001h>
0x60403A:       sock_filter <6, 0, 0, 0>

The entry point is get_seccomp_filter_data_address. When a call to
get_seccomp_filter_data_address is executed, a return address is pushed onto
the stack. When the second call is executed, the start_of_seccomp_filter_data
address is then pushed onto the stack, because this is where the second call
should return to. Execution then jumps to get_rip, that pops the
start_of_seccomp_filter_data address from the stack into RAX. It then
executes a RET, which pops off the original caller's address and jumps to it.
And we are done, start_of_seccomp_filter_data is in RAX, and the caller
can continue execution normally.
I found many places in the binary where this trick was used.
Ropchain-like control flow

A nice example example for this I found is at address 0x6042D9.
            load_data_2_gadget:
0x6042D9:                 mov     r8, rax
0x6042DC:                 lea     r9, load_data_3_gadget
0x6042E3:                 push    r9              
0x6042E5:                 jmp     get_data_2_address
            load_data_3_gadget:
0x6042EA:                 mov     r9, rax
0x6042ED:                 lea     r10, load_data_4_gadget
0x6042F4:                 push    r10             
0x6042F6:                 jmp     get_data_3_address

What happens here? The LEA and PUSH instruction pushes the address of the next
gadget onto the stack. The jmp instructions call the get_data functions without
pushing their return value. When the get_data functions return they will pop
off the address of the next gadget jump to it.
The PTRACE_SETREGS calls

Now that I knew how the main binary worked I could examine what the
PTRACE_SETREGS calls did. I wrote a short gdb script for it. (This script
doesn't work with main_patched.elf, only with the original binary.)
# stop right after XOR-ing the real code
break *0x04000FF

# run until the breakpoint
run <<< AAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

# this is where the real code of the program starts
set $base = *(unsigned long*)($rsp)

# break right after the getregs syscall and print the old reg values
break *($base + 15599 + 2)
commands
   printf "XXX GETREGS registers\n"
   x/27gx $r10
   printf "XXX"
   continue
end

# break right before the setregs syscall and print the new reg values
break *($base + 15717)
commands
   printf "XXX SETREGS registers\n"
   x/27gx $r10
   printf "XXX"
   continue
end

# continue execution
continue

I ran this with
$ gdb --batch -x main.gdb ./main &> REGISTERS.txt

I could see from the output that the SETREGS commands only decremented the
child's RIP by one. Why? Because the parent applies a patch to the child when
the child steps on an int3 instruction. When that instruction is exeucted the
RIP will point to the next instruction. By decrementing the RIP after patching,
the parent forces the child to continue execution from the start of the patch,
the address of the instruction that was originally int3.
That didn't help much. I turned my attention towards the dropped binary.
The dropped binary

$ file dropped_patched.elf
dropped_patched.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=1b6bb2a46a5e760d6de3fb5c4782e05c0a118f11, stripped

$ strace sh -c 'exec -a AAAAA ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
[..snip..]
umask(000)                              = 022
umask(000)                              = 000
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[..snip..]

This was the (decompiled) main function:
void __usercall __noreturn main(char **args@<rsi>, char **a2@<rdx>, char **a3@<rax>)
{
  *((_BYTE *)exit + 2) = 0xBAu;
  *a3 = *a2;
  maybe_atexit((__int64)int3);
  exit(0);
}

Obfuscation again. The exit function contained only a jump to the exit PLT entry. By modifying a byte in the exit function the dropped binary redirected
the exit function to another (hidden) PLT entry. That PLT entry contained the
umask syscall. So I manually patched exit+2 to be 0xBA, and turned the
patching instruction into NOPs. After that IDA could decompile the function:
void main(int argc, char *argv[])
{
    size_t five; // rax
    char hex_hash2_32[33]; // [rsp+20h] [rbp-110h]
    char raw_hash1_16[16]; // [rsp+50h] [rbp-E0h]
    char hex_hash1_32[33]; // [rsp+60h] [rbp-D0h]
    char raw_hash1_160[160]; // [rsp+90h] [rbp-A0h]

    maybe_atexit(int3);
    exit_that_is_umask_really(0LL);
    
    memset(hex_hash1_32, 0, 33);

    five = strlen(argv[0]);
    
    hash1_init(raw_hash1_160);
    hash1_calc(raw_hash1_160, argv[0], five);
    hash1_compress(raw_hash1_16, raw_hash1_160);
    enhex_string(raw_hash1_16, hex_hash1_32);

    memset(hex_hash2_32, 0, 33);
    hash2(argv[1], hex_hash2_32);

    if (!strcmp(hex_hash2_32, hex_hash1_32))
        puts("OK!");
    else
        puts("NO!");
}

To sum it up, it takes 5 bytes from argv[0] and 32 bytes from argv[1], hashes
them with two different hash functions and compares the result.
Hash1

Hash1 is at 0xC46. There was some calculation there that looked like a hash function, so I named it hash1. I did not recognize the hash algorithm.
The hash function -- just like the main function -- starts with a call to
exit_that_is_umask_really. After this it checks if the PWD environment
variable exists and modifies a variable if it does. So the hash calculation
depended on the PWD environment variable! I looked back to the strace of the
child process to determine how it was run:
19175 execve("/proc/self/fd/3", ["#\3327\334\210", "\207\214"], NULL) = 0

With NULL environment, so no PWD variable here. I replicated this in my
runner script (note the -c argument of exec):
$ exec -c -a AAAAA ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Hash2

void hash2(char *hex_input, char *hex_output)
{
  char v2; // al
  char hashed[16]; // [rsp+10h] [rbp-30h]
  char unhexed[24]; // [rsp+20h] [rbp-20h]
  int j; // [rsp+38h] [rbp-8h]
  int i; // [rsp+3Ch] [rbp-4h]

  for ( i = 0; i <= 15; ++i )
  {
    v2 = unhex_byte(&hex_input[2 * i]);
    unhexed[i] = v2;
  }
  for ( j = 0; j <= 15; ++j )
    hashed[j] = j ^ (unhexed[j] & 0xF | unhexed[15 - j] & 0xF0) ^ 16 * j;
  enhex_string(hashed, hex_output);
}

Well this didn't look like a military grade, secure hash function. The following python2 code calculates the inverse of hash2:
def inverse_hash2(enc):
    unhexed = enc.decode('hex')
    hashed = ""
    for j in xrange(16):
        k = 15 - j
        b1 = ((ord(unhexed[j]) ^ j ^ (16 * j)) & 0xF0)
        b2 = ((ord(unhexed[k]) ^ k ^ (16 * k)) & 0x0F)
        hashed = chr(b1 | b2) + hashed
    return hashed.encode('hex')

Cracking the keygen in theory

At this point it was obvious how to crack the challenge.


Let's say the server sent us the string "MaLAC".


We run dropped_patched.elf in gdb with NULL environment and argv[0] set to
"MaLAC" and argv[1] set to some random 32 byte string.


We stop the execution right after hex_hash1_32 is calculated.


We observe that hex_hash1_32 = "784542fbc8f7f8c99701967786997220".


We can calculate inverse_hash2("784542fbc8f7f8c99701967786997220") and send
the result to the server as the correct solution.


Step 3 is tricky, because gdb always sets PWD when it runs a binary. I used
the very useful set wrapper-script gdb command to circumvent that. This was
my wrapper script:
#!/usr/bin/env bash
exec -c -a $TOKEN ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Failure and the best trick I haven't seen

I did all 5 steps, but the server's answer was:

BAD!

I tried it multiple times, but BAD all the time. Even when I tried with the
original, unmodified main binary locally, it said

NO!

Clearly I had missed something during reversing. ᕦ(ò_óˇ)ᕤ
Now that I'm writing this writeup it is fairly obvious what I missed: the
seccomp filter, of course. I never examined it.
Turns out, it hijacked the umask system call to return a different value.
The purpose of the umask call in hash1 was twofold: on one hand it was disguised as exit to fool the decompiler, but on the other hand it also modified RAX to the return value of the seccomp-hijacked syscall. RAX was later used in the hash calculation.
The problem was that I had been running the extracted binary without the
tricky seccomp filter, because the seccomp syscall happened in the child
process, before it execve'd the dropped_binary.
Nevertheless I managed to solve the challenge, but it took considerably longer
than it would have taken had I seen this nice trick.
Cracking the keygen in practice

It became clear that I could not reproduce the original environment of
dropped_patched.elf, so I turned to dumping the value of hex_hash1_32
from the original binary.
The problem was that it was ptrace'd by the parent, so I couldn't just
connect with gdb to it.
I tried ltrace to trace the dynamic function calls.
$ ltrace ./main <<< MaLACBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
ltrace ./main <<< MaLACBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Couldn't find .dynsym or .dynstr in "/proc/25987/exe"
NO!

Unfortunately main was compiled statically, without libc, so that didn't work. (Yes, I could have attached to the child process, but that didn't occur
to me back then.)
I tried to create core dumps of the child process by sending SIGQUIT with
kill -3 to it, but the resulting dumps did not contain the string I was
looking for. Maybe my timing was off. Also for some reason the linux system I
used didn't create core files directly, it sent them to some stupid systemd
module (coredumpctl) that I wasn't familiar with, so I gave up this approach
after a couple tries.
I tried to overwrite strcmp with LD_PRELOAD to print its arguments, but it
failed too, because (remember?) the binary is execve'd with NULL environment
pointer.
What I finally did was, that I looked for the execve call in main, and patched
it so that it passed the environment pointer. For that I had to find the
environment pointer. There is usually one on the stack, so I messed around in
gdb at the execve call and I found that there was indeed one at RSP+0x48.
I patched main with this script:
from pwn import *
import io, os
context.arch = "amd64"

main = ELF("./main")

# un-xor the instructions
enc = main.read(0x06001BC, 0x44a8)
key = p64(0x1122334455667788) * (len(enc)//8)
dec = io.BytesIO(xor(enc, key))

base = 0x6001BC
addr = 0x603DC7  - base     # execve call
new = asm("mov rdx, rsp ; add rdx, 0x48 ; syscall ; nop ; nop ; nop ; nop")
dec.seek(addr)
old = dec.read(len(new))
assert old == unhex("4831d20f054c89c8e879080000")
dec.seek(addr)
dec.write(new)

# re-xor the instructions
enc = xor(key, dec.getvalue())

main.write(0x06001BC, enc)
main.save("./main_patched")
os.system("chmod +x ./main_patched")

I compiled a shared library to LD_PRELOAD:
$ cat preload.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
static int (*real_strcmp)(const char*, const char*) = NULL;
static void mtrace_init(void)
{
    real_strcmp = dlsym(RTLD_NEXT, "strcmp");
    real_strlen = dlsym(RTLD_NEXT, "strlen");
    if (NULL == real_strcmp) {
        fprintf(stderr, "Error in `dlsym`: %s\n", dlerror());
    }
}
int strcmp(const char *s1, const char *s2)
{
    if (real_strcmp == NULL) {
        mtrace_init();
    }
    printf("YYY: strcmp(\"%s\", \"%s\")\n", s1, s2);
    printf("XXX: %s\n", s2);
    fflush(stdout);
    return real_strcmp(s1, s2);
}
$ gcc -Wall -Wextra -shared -fPIC -o preload.so preload.c -ldl

And created a small bash script to run main_patched with only LD_PRELOAD in its
environment (with no PWD).
$ cat get_this_shit.sh
#!/usr/bin/env bash
printf ${token}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB | 
    env -i LD_PRELOAD=$(pwd)/preload.so ./main_patched |
    grep XXX: | cut -d: -f2 | tr -cd '[0-9a-fA-F]'

The full exploit script:
from pwn import *
import sys, os

def hash2(dec):
    dec = bytearray(unhex(dec))
    enc = [0] * 16
    for j in range(16):
        enc[j] = j ^ ((dec[j] & 0xF) | (dec[15 - j] & 0xf0)) ^ 16 * j
    return enhex("".join(chr(i) for i in enc))

def inverse_hash2(enc):
    assert len(enc) > 30
    unhexed = enc.decode('hex')
    hashed = ''
    for j in xrange(16):
        k = 15 - j
        b1 = ((ord(unhexed[j]) ^ j ^ (16 * j)) & 0xF0)
        b2 = ((ord(unhexed[k]) ^ k ^ (16 * k)) & 0x0F)
        hashed = chr(b1 | b2) + hashed
    return hashed.encode('hex')

def solve(token5):
    print "TOKEN:", token5
    p = process(["./get_this_shit.sh", token5])
    p.shutdown()
    hex_hash1_32 = p.recvall()
    print "INNER32:", hex_hash1_32
    solution = inverse_hash2(hex_hash1_32).strip()
    assert hash2(solution).upper() == hex_hash1_32.upper()
    print "SOLUTION:", solution, "\n\n\n"
    return solution

r = remote("keygenme.ctfcompetition.com", 1337)

while True:
    token5 = r.readline().strip()
    r.sendline(solve(token5))
    rsp = r.recvline()
    print "RESPONSE", rsp
    assert "OK" in rsp

After answering the server's requests like a dozen time, it got the flag:

CTF{g1mm3_A11_T3h_keyZ}

Conclusion

I liked this challenge because while it was not impossibly hard, its author
employed many different techniques to hinder the reverse engineering effort.
It had self-modifying code, ropchain-like control flow, hidden payload (I
still have no idea how the dropped binary is encoded in the main binary),
ptraced child process, and live binary patching.
It forced me to be creative with debugging, I used gdb, strace, ltrace,
LD_PRELOAD, I binary patched a lot and learned about SIGQUIT and
set wrapper-script.
Also, KEYGENME had the exit trick that fooled IDA and the seccomp trick
that fooled me ☺.
Had the child process also ptrace'd and live-patched its parent, it would have
been the perfect CTF challenge I would have probably given up on halfway
through.