oxfoo1m3 is a relatively simple crackme with elements of anti-debugging, anti-disassembly, and, as the author put it, anti-libbfd.
I created a new Vagrant virtual machine, and after a bit of fiddling with shared folders, ran the binary:
vagrant@debian9:/vagrant/oxfoo1m3$ ./oxfoo1m3
oxfoo1m3 started ;]
3nt4 p455w0rD:
ABCDABCDABCD
[1]+ Stopped ./oxfoo1m3
vagrant@debian9:/vagrant/oxfoo1m3$ D
-bash: D: command not found
We can see that the binary reads 11 characters (the leftover D is then read by bash) and then, since they look nothing like the password, appears to send a SIGSTOP to itself. I decided to run it under strace, and to my surprise...
vagrant@debian9:/vagrant/oxfoo1m3$ strace ./oxfoo1m3
execve("./oxfoo1m3", ["./oxfoo1m3"], [/* 18 vars */]) = 0
strace: [ Process PID=4467 runs in 32 bit mode. ]
ptrace(PTRACE_TRACEME) = -1 EPERM (Operation not permitted)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xf001} ---
+++ killed by SIGSEGV +++
Segmentation fault
... it segfaults. The si_addr, 0xf001, spells out "fool" in hexspeak, so this is definitely intentional. We can also see that the first thing it does is call ptrace(PTRACE_TRACEME). This is a simple anti-debugging technique: when a debugger spawns its debuggee, it inserts this call between the fork and the execve, so that the child is traced from its very first instruction instead of being attached to later (which security settings such as Yama's ptrace_scope can restrict). A process can only have one tracer, so when the program is already being debugged, its own PTRACE_TRACEME call fails with EPERM.
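This also explains the apparent SIGSTOP. When the binary is run normally, PTRACE_TRACEME succeeds and makes its parent, bash, the tracer. When a signal is delivered to a process that is being traced, the process is stopped and the tracer is notified, so the signal raised after a wrong password leaves the process frozen, and bash reports it as a stopped job.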
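The fork-then-TRACEME step can be reproduced outside the crackme. The following is a minimal sketch, assuming Linux and a libc that exports ptrace through ctypes; the function name traceme_errno is hypothetical. It forks, calls ptrace(PTRACE_TRACEME) in the child and reports the child's errno. Run normally, the call should succeed. A child that already has a tracer (for example under strace -f, which follows forks) should get EPERM instead.

```python
import ctypes
import os

PTRACE_TRACEME = 0  # request number from <sys/ptrace.h>


def traceme_errno() -> int:
    """Fork, call ptrace(PTRACE_TRACEME) in the child, return its errno (0 on success).

    The call is made in a child so that this Python process never becomes a tracee.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.ptrace.restype = ctypes.c_long
    libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long, ctypes.c_void_p, ctypes.c_void_p]

    pid = os.fork()
    if pid == 0:
        # Child: on success, the parent (this Python process) becomes our tracer,
        # just as bash silently becomes the tracer of oxfoo1m3.
        ret = libc.ptrace(PTRACE_TRACEME, 0, None, None)
        os._exit(0 if ret == 0 else ctypes.get_errno())
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    err = traceme_errno()
    print("PTRACE_TRACEME:", "ok" if err == 0 else os.strerror(err))
```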
I decided to objdump the file, only to find another carefully planted obstacle:
vagrant@debian9:/vagrant/oxfoo1m3$ objdump -x oxfoo1m3
objdump: oxfoo1m3: File format not recognized
Since Linux only reads the ELF header and the program header table when loading a binary, the creator of an ELF file can put whatever garbage they desire in the section header table. The binary will still execute flawlessly while thwarting every tool that trusts the section headers, such as those built on libbfd.
I opened the binary in Hopper, which, being a tool designed partly with malware analysis in mind, didn't surrender just because a nonessential header was corrupted. However, only two instructions were identified as code:
; ================ B E G I N N I N G O F P R O C E D U R E ================
EntryPoint:
08048080 call EntryPoint+6
08048085 jmp 0x13c701e4
; endp
As you can see, the jump points to a nonsense address, and the call destination is in the middle of the jump. This is a very simple anti-disassembly technique: jumping into the operand bytes of an instruction that is never executed. I marked the jump as data and the subsequent "offset" bytes as code, so that the instructions that are actually executed displayed properly. This went on for a while, until I realised that any readable version of this code would have to use assembler macros. I decided to convert the hexdump into a nasm source file and modify it until it made sense, while making sure it still assembled to the same binary, somewhat like what the guys from pret did.
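Keeping the rebuild byte-identical is easy to check mechanically. Here is a minimal sketch, assuming the source is assembled as a flat binary with something like `nasm -f bin rebuilt.s -o rebuilt.bin` (the file names and the helper first_difference are hypothetical). It reports the first offset where the rebuild and the original diverge, which is usually where the last edit went wrong.

```python
import sys
from typing import Optional


def first_difference(original: bytes, rebuilt: bytes) -> Optional[int]:
    """Return the first offset where the two images differ, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    if len(original) != len(rebuilt):
        # One image is a prefix of the other: they diverge where the shorter one ends.
        return min(len(original), len(rebuilt))
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py oxfoo1m3 rebuilt.bin
    with open(sys.argv[1], "rb") as f:
        original = f.read()
    with open(sys.argv[2], "rb") as f:
        rebuilt = f.read()
    diff = first_difference(original, rebuilt)
    print("identical" if diff is None else f"first difference at {diff:#x}")
```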
First, I tried getting nasm to produce the ELF file itself, but after realising that a linker would be necessary, I decided to just spell out the header by hand using the Wikipedia page on ELF:
bits 32
org 0x08048000
%define __NR_read 3
%define __NR_ptrace 26
elf_header:
db 0x7f, 'ELF'
db 1 ; 32-bit
db 1 ; little endian
db 1 ; ELF version
db 0 ; System V ABI
db 0 ; ABI version
times 7 db 0 ; padding
dw 2 ; ET_EXEC
dw 3 ; x86
dd 1 ; ELF version, again
dd start
dd program_header - $$
dd section_headers - $$
dd 0 ; architecture-specific flags
dw program_header - $$ ; ELF header size
dw program_header_end - program_header ; program header size
dw 1 ; program header count
dw 0x28 ; section header size
dw 4 ; section header count
dw 3 ; section name string table index
program_header:
dd 1 ; PT_LOAD
dd 0 ; offset in file
dd elf_header ; load address
dd elf_header ; physical address (unused)
dd code_end - $$ ; size on disk
dd code_end - $$ ; size in memory
dd 0b111 ; flags - rwx
dd 0x1000 ; alignment
program_header_end:
times 0x80 - ($ - $$) db 0
start:
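The header layout can be cross-checked with Python's struct module. This sketch packs the same fields in the same order as the listing (Elf32_Ehdr is 52 bytes, Elf32_Phdr 32 bytes) and then reads back only what a loader needs: the entry point and the PT_LOAD segments. The helper names are hypothetical, and 0xc17 for code_end is taken from the objdump output further down. The loader view comes out the same whatever e_shoff holds, which is the property the anti-libbfd trick relies on.

```python
import struct

BASE = 0x08048000     # org in the listing
ENTRY = BASE + 0x80   # start: is padded to file offset 0x80

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, 52 bytes
PHDR = struct.Struct("<8I")                # Elf32_Phdr, 32 bytes


def build_headers(shoff: int, code_end: int) -> bytes:
    """Pack the header fields in the order of the nasm listing, padded to 0x80."""
    ident = b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + bytes(7)  # class, data, version, ABI, ABI version, padding
    ehdr = EHDR.pack(ident, 2, 3, 1, ENTRY, EHDR.size, shoff, 0,
                     EHDR.size, PHDR.size, 1, 0x28, 4, 3)
    phdr = PHDR.pack(1, 0, BASE, BASE, code_end, code_end, 0b111, 0x1000)
    return (ehdr + phdr).ljust(0x80, b"\0")


def loader_view(image: bytes):
    """Return (entry, [(vaddr, filesz, flags)]) using only program-header fields."""
    fields = EHDR.unpack_from(image, 0)
    entry, phoff, phnum = fields[4], fields[5], fields[10]
    segments = []
    for i in range(phnum):
        p_type, _off, vaddr, _paddr, filesz, _memsz, flags, _align = PHDR.unpack_from(
            image, phoff + i * PHDR.size)
        if p_type == 1:  # PT_LOAD
            segments.append((vaddr, filesz, flags))
    return entry, segments


# Garbage in e_shoff does not change what the loader sees.
print(loader_view(build_headers(0xc50, 0xc17)) == loader_view(build_headers(0xdeadbeef, 0xc17)))
```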
I then added all the assembly code as raw bytes using this ad-hoc sed monstrosity:
vagrant@debian9:/vagrant/oxfoo1m3$ xxd -s 0x80 oxfoo1m3 | sed 's/^[0-9a-f]*: /\tdb /g;s/\([0-9a-f]\{2\}\)\([0-9a-f]\{2\}\)/0x\1, 0x\2,/g;s/,  .\{16\}//g' >> test.s
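For comparison, roughly the same conversion in Python reads the file from offset 0x80 and emits one nasm `db` line per 16 bytes. This is a sketch, not the original tooling: the function name to_nasm_db is hypothetical and the 16-byte width simply mirrors xxd's default layout. Unlike the sed version, it leaves no trailing comma.

```python
import sys


def to_nasm_db(data: bytes, width: int = 16) -> str:
    """Render raw bytes as nasm `db` lines, `width` bytes per line."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        lines.append("\tdb " + ", ".join(f"0x{b:02x}" for b in chunk))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python to_db.py oxfoo1m3 >> test.s
    with open(sys.argv[1], "rb") as f:
        f.seek(0x80)  # skip the ELF headers, like xxd -s 0x80
        sys.stdout.write(to_nasm_db(f.read()))
```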
Then, I slowly identified the elements of the code by cross-referencing my manual disassembly with Hopper. The most basic building block is what I called ip2edx, which does exactly what you would expect: copy EIP to EDX:
%macro ip2edx 0
call %%calldest
%%startedx:
db 0xe9 ; jmp, anti-re
%%calldest:
pop edx
add edx, strict dword %%end - %%startedx
push edx
ret
db 0xe9 ; jmp, anti-re
%%end:
%endmacro
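Working through the bytes shows how this fools a linear-sweep disassembler. The sketch below hand-assembles ip2edx at the entry point 0x08048080. The encodings are a reconstruction of what nasm should emit for the macro, assuming standard IA-32 opcodes, not bytes copied from the binary. It computes the real call target, the bogus jump target a disassembler decodes at 0x08048085, and the address execution actually resumes at. The first two match the Hopper listing above (EntryPoint+6 and 0x13c701e4).

```python
import struct

BASE = 0x08048080  # EntryPoint in the Hopper listing

# Reconstructed encoding of the ip2edx expansion (assumed, see above).
code = bytes([0xE8, 0x01, 0x00, 0x00, 0x00])         # call %%calldest (rel32 = +1)
code += bytes([0xE9])                                # %%startedx: stray jmp opcode
code += bytes([0x5A])                                # %%calldest: pop edx
code += bytes([0x81, 0xC2]) + struct.pack("<I", 11)  # add edx, %%end - %%startedx
code += bytes([0x52, 0xC3, 0xE9])                    # push edx; ret; stray jmp opcode
END = BASE + len(code)                               # address of %%end


def rel32_target(insn_addr: int, insn_len: int, rel_bytes: bytes) -> int:
    """Target of a call/jmp rel32: address of the next instruction plus the signed offset."""
    (rel,) = struct.unpack("<i", rel_bytes)
    return (insn_addr + insn_len + rel) & 0xFFFFFFFF


call_target = rel32_target(BASE, 5, code[1:5])    # where the call really goes
fake_jmp = rel32_target(BASE + 5, 5, code[6:10])  # the "jmp" a linear sweep decodes next
real_resume = (BASE + 5) + 11                     # return address popped into edx, plus 11

print(f"call -> {call_target:#x}")
print(f"jmp  -> {fake_jmp:#x} (never executed)")
print(f"ret  -> {real_resume:#x} (end of macro: {real_resume == END})")
```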
This is then used in mcall, which, apart from code size and clobbering EDX, behaves just like a normal call:
%macro mcall 1
ip2edx
add edx, strict dword %%code - $
push edx
push %1
ret
db 0xe8 ; call, anti-re
%%code:
%endmacro
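A toy emulator makes it easy to check that mcall really behaves like call. This sketch understands only the handful of opcodes the expansion uses. It hand-assembles mcall with assumed encodings (push of a label as `68 imm32`) and a hypothetical target address 0x08049000, then runs it. Execution should arrive at the target with the address just past the macro on the stack, which is exactly what a 5-byte call would have pushed, at the cost of 30 bytes and a clobbered EDX.

```python
import struct

# Assumed encoding of ip2edx (16 bytes): call +1; e9; pop edx; add edx, 11; push edx; ret; e9
IP2EDX = (bytes([0xE8, 1, 0, 0, 0, 0xE9, 0x5A, 0x81, 0xC2]) + struct.pack("<I", 11)
          + bytes([0x52, 0xC3, 0xE9]))


def assemble_mcall(target: int) -> bytes:
    """Hand-assembled mcall: ip2edx; add edx, %%code - $; push edx; push target; ret; db 0xe8."""
    # From the add to %%code: add (6) + push edx (1) + push imm32 (5) + ret (1) + 0xe8 (1) = 14.
    tail = (bytes([0x81, 0xC2]) + struct.pack("<I", 14) + bytes([0x52, 0x68])
            + struct.pack("<I", target) + bytes([0xC3, 0xE8]))
    return IP2EDX + tail


def run(code: bytes, base: int, stop: int, max_steps: int = 100):
    """Execute `code` loaded at `base` until EIP == stop; return (stack, edx)."""
    eip, edx, stack = base, 0, []
    for _ in range(max_steps):
        if eip == stop:
            return stack, edx
        off = eip - base
        if not 0 <= off < len(code):
            raise RuntimeError(f"EIP {eip:#x} left the code")
        op = code[off]
        if op == 0xE8:                              # call rel32
            (rel,) = struct.unpack_from("<i", code, off + 1)
            stack.append(eip + 5)
            eip += 5 + rel
        elif op == 0x5A:                            # pop edx
            edx = stack.pop()
            eip += 1
        elif op == 0x81 and code[off + 1] == 0xC2:  # add edx, imm32
            (imm,) = struct.unpack_from("<I", code, off + 2)
            edx = (edx + imm) & 0xFFFFFFFF
            eip += 6
        elif op == 0x52:                            # push edx
            stack.append(edx)
            eip += 1
        elif op == 0x68:                            # push imm32
            stack.append(struct.unpack_from("<I", code, off + 1)[0])
            eip += 5
        elif op == 0xC3:                            # ret
            eip = stack.pop()
        else:
            raise RuntimeError(f"unsupported opcode {op:#04x} at {eip:#x}")
    raise RuntimeError("step limit reached")


BASE, TARGET = 0x08048080, 0x08049000  # hypothetical load and target addresses
code = assemble_mcall(TARGET)
stack, edx = run(code, BASE, TARGET)
print(len(code), [hex(a) for a in stack], hex(edx))
```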
Finally, a very common function used with mcall was _nop:
_nop:
ret
... so I wrapped that in a macro called kdx (for Kill eDX):
%macro kdx 0
mcall _nop
%endmacro
This made it possible to analyze the first part of the code:
start:
mcall dexor_code
mcall antidebug
jmp strict near dexored_entry
db 0xe8
dexor_code:
mov esi, xor_begin
jmp strict near dexor
_nop:
ret
db 0xe8
dexor:
mov edi, esi
kdx
cld
kdx
mov ecx, xor_end - xor_begin
kdx
mov al, [unkatend] ; dead read
.loop:
lodsb
kdx
xor al, 0x58
kdx
stosb
kdx
loop .loop
ret
As you can see, the kdx macro is used between every pair of instructions, which makes the expanded code pretty confusing, to say the least. With the macros folded away, however, a quick glance at this snippet reveals a classic static-key XOR decoder. I looked at the hexdump, and, indeed, there were a lot of 0x58 bytes in it, which is exactly what zero bytes turn into after XOR with 0x58. I also noticed some data that looked XOR-encoded with 0x58, but the length parameter excluded it from decoding:
00000c10: 9bb1 39c5 95d8 c858 0c30 3d78 163d 2c2f ..9....X.0=x.=,/ ; decoding ends at 0c16
00000c20: 313c 3d78 192b 2b3d 353a 343d 2a78 6876 1<=x.++=5:4=*xhv
00000c30: 6160 766b 6058 5876 2b30 2b2c 2a2c 393a a`vk`XXv+0+,*,9:
00000c40: 5876 2c3d 202c 5876 3b37 3535 3d36 2c58 Xv,= ,Xv;755=6,X
00000c50: 5858 5858 5858 5858 5858 5858 5858 5858 XXXXXXXXXXXXXXXX ; the elf header claims section headers are here
00000c60: 5858 5858 5858 5858 5858 5858 5858 5858 XXXXXXXXXXXXXXXX
00000c70: 5858 5858 5858 5858 5358 5858 5958 5858 XXXXXXXXSXXXYXXX
00000c80: 5e58 5858 d8d8 5c50 d858 5858 cf53 5858 ^XXX..\P.XXX.SXX
00000c90: 5858 5858 5858 5858 4858 5858 5858 5858 XXXXXXXXHXXXXXXX
00000ca0: 4958 5858 5958 5858 0000 0000 0000 0000 IXXXYXXX........ ; the actual end seems to be at 0ca8
00000cb0: 170c 0000 1f00 0000 0000 0000 0000 0000 ................
00000cc0: 0100 0000 0000 0000 0100 0000 0300 0000 ................
00000cd0: 0000 0000 0000 0000 360c 0000 1a00 0000 ........6.......
00000ce0: 0000 0000 0000 0000 0100 0000 0000 0000 ................
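Decoding a few of those rows supports the hunch. The sketch below copies the bytes from 0x0c17 to 0x0c4f, just past where the built-in decoder stops, and XORs them with 0x58. The dexor helper is a hypothetical stand-in for the lodsb / xor al, 0x58 / stosb loop. The result is printable: the NASM version string, followed by what looks like a section name string table.

```python
# Rows 0x0c17-0x0c4f of the hexdump above, still encoded.
ENCODED = bytes.fromhex(
    "58"
    "0c303d78163d2c2f"
    "313c3d78192b2b3d353a343d2a786876"
    "6160766b605858762b302b2c2a2c393a"
    "58762c3d202c58763b3735353d362c58"
)


def dexor(buf: bytes, key: int = 0x58) -> bytes:
    """Single-byte static-key XOR, like the lodsb / xor al, key / stosb loop."""
    return bytes(b ^ key for b in buf)


decoded = dexor(ENCODED)
print([s.decode() for s in decoded.split(b"\0") if s])
# -> ['The Netwide Assembler 0.98.38', '.shstrtab', '.text', '.comment']
```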
Trusting my gut feeling, I decided to ignore what the code was saying and wrote a Python script that decoded a few more bytes than the assembly decoder does:
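with open("oxfoo1m3", "rb") as f:
    data = bytearray(f.read())

KEY = 0x155    # file offset of the XOR key byte (the immediate of "xor al, 0x58")
START = 0x196  # start of the encoded region
STOP = 0xca8   # where the 0x58-encoded data appears to end (the built-in decoder stops at 0xc16)

key = data[KEY]
for i in range(START, STOP):
    data[i] ^= key
data[KEY] = 0  # zero the key so the binary's own decoder becomes a no-op

with open("oxfoo1m3-dexored", "wb") as f:
    f.write(data)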
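Since the key byte is zeroed in the output, the binary's own decoder XORs with 0 at run time and changes nothing, so the code it ends up executing, or at least the part that matters, is identical to the original's. Surprisingly, this also fixed the section headers, which lay inside the extra encoded range, and objdump started working fine: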
oxfoo1m3-dexored: file format elf32-i386
oxfoo1m3-dexored
architecture: i386, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x08048080
Program Header:
LOAD off 0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
filesz 0x00000c17 memsz 0x00000c17 flags rwx
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000b97 08048080 08048080 00000080 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .comment 0000001f 00000000 00000000 00000c17 2**0
CONTENTS, READONLY
SYMBOL TABLE:
no symbols
There's even a comment left!
00000c17: 0054 6865 204e 6574 7769 6465 2041 7373 .The Netwide Ass
00000c27: 656d 626c 6572 2030 2e39 382e 3338 0000 embler 0.98.38..
To continue, I replaced the encoded blob in my source with the decoded bytes, and from then on compared the assembler output with the decoded binary instead.
The encoded part of the code is constructed a bit differently. Namely, the obfuscation macros preserve all registers and flags:
%macro kn 0 ; for kill nothing
pushfd
pushad
mcall ip2ecx
add ecx, strict dword %%code - $
push ecx
ret
db 0xe9 ; jmp, anti-re
%%code:
popad
popfd
%endmacro
...
ip2ecx:
pop ecx
push ecx
ret
After labeling a bit more of the code, I identified the part responsible for checking the password (noop macros removed):
passworddata:
db 'XXXXXXXXXXX'
db 0x6d, 0x79, 0x6e, 0x65, 0x7b, 0x78, 0x74, 0x76, 0x66, 0x77, 0x7e
.end:
checkpassword:
mov eax, __NR_read
mov ebx, 0
mov ecx, passworddata
mov edx, 11
int 0x80
push eax
pop edx
mov esi, passworddata
mov edi, passworddata ; unused
mov ecx, 11
.loop:
lodsb
xor al, dl
inc dl
push ecx
neg ecx
add ecx, strict dword passworddata.end
cmp al, [ecx] ; [passworddata.end - ecx], ecx goes backwards so this pointer goes forwards
je .skip
mcall fool
.skip:
pop ecx
loop .jmploop
...
.jmploop:
jmp strict near .loop
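This was enough to reconstruct the algorithm in Python and get the password. Each input byte is XORed with a key that starts at read()'s return value (11 for a full read) and is incremented after every character, then compared with the stored table, so XORing the table with the same key sequence gives the password back: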
orig = [0x6d, 0x79, 0x6e, 0x65, 0x7b, 0x78, 0x74, 0x76, 0x66, 0x77, 0x7e]
key = 11  # initial key: the number of bytes read() returned
out = ''
for b in orig:
    out += chr(b ^ key)
    key += 1  # inc dl
print(out)
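Going the other way is a useful sanity check. This sketch re-implements the comparison loop of checkpassword in Python. The helper name check_password is hypothetical, and read() is modelled as returning min(len(typed), 11) bytes into a buffer pre-filled with 'X', as in passworddata. Because the key starts at read()'s return value, a short read changes every key in the sequence, not just the missing characters.

```python
TABLE = bytes([0x6d, 0x79, 0x6e, 0x65, 0x7b, 0x78, 0x74, 0x76, 0x66, 0x77, 0x7e])


def check_password(typed: bytes) -> bool:
    """Model of checkpassword; `typed` is the raw input available to read(0, buf, 11)."""
    n = min(len(typed), 11)            # what read() would return (eax)
    buf = typed[:n] + b"X" * (11 - n)  # unread bytes keep their initial 'X'
    key = n                            # push eax / pop edx: the key starts at the byte count
    for got, want in zip(buf, TABLE):
        if got ^ key != want:
            return False               # the real code calls fool here
        key = (key + 1) & 0xFF         # inc dl
    return True
```

The string printed by the script above should pass this check, with or without a trailing newline (read() only consumes 11 bytes), while the ABCDABCDABC from the first run should not.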