Using KLEE on the DARPA CGC challenge binaries (as ported to Linux/OS X by Trail of Bits) is currently not a fun time. Here are a few of the current obstacles.
I'm working off of KLEE master, built against LLVM 3.4, running on Linux (Ubuntu 16.04). Some of this may be easier or harder on other platforms supported by cb-multios (i.e. OS X and maybe someday Windows).
- KLEE wants a standard
int main(int argc, char *argv[])
. Most of the challenges instead haveint main(void)
instead, and some, perversely, use the first int argument to main to hold the address of the flag page. (Edit: this has been fixed in thewindows_support
branch ofcb-multios
and should make its way into master soon) - The challenge binaries use lots of symbols that conflict with things in libc. For example, many of them define the symbol
stdin
and then implement a FILE* struct themselves. So when trying to link inklee-uclibc.bc
you get symbol clashes. This already has an issue in the cb-multios repo. - Getting bitcode for each of the challenges is not easy. Right now the best way I've found to do it is to build the challenge binaries with wllvm, which wraps clang/clang++, produces bitcode, and places a list of the bitcode files that make up a binary into a special debug section in the final ELF. Then for each binary you can run extract-bc, which basically just calls llvm-link on all of the constituent bitcode files.
- The challenge binaries are compiled for 32-bit x86, but most people are using 64-bit systems. Doing a 32-bit cross-compile of KLEE, stp, LLVM, etc. is possible, but only for a fairly cruel and unusual definition of possible (see Appendix C for a view into the Hellmouth). This also causes problems with KLEE because malloc() may return addresses larger than can fit in a 32-bit pointer; see this KLEE issue for details and a workaround. (Edit: It also looks like the deterministic allocator that was recently added to KLEE can help here; see the options
-allocate-determ -allocate-determ-start-address=0x10000000
) - The challenge binaries contain a "flag page" – 4096 bytes of random data loaded at a fixed address in memory. In cb-multios, this is implemented using mmap, which KLEE doesn't model. See Issue 15 in cb-multios.
- The use of
mmap
also occurs in the implementation of CGC'sallocate
. You can change it to usemalloc()
andmprotect()
though; see Appendix B.
Let's take a quick look at using KLEE on Palindrome
(selected because it's extremely simple; it doesn't do anything with the CGC random page and doesn't use any conflicting symbols).
Build cb-multios using wllvm
. You'll want to apply the patches in Appendices A and B to make the build process go a bit more smoothly. I also edited build.sh
so that it wouldn't use Ninja, because I wanted a plain old Makefile (this is because I am afraid of change).
Set some environment variables for wllvm
:
LLVMDIR=/home/moyix/git/llvm-3.4/Release/bin
LLVM_LINK_NAME=/home/moyix/git/llvm-3.4/Release/bin/llvm-link
LLVM_CC_NAME=/home/moyix/git/llvm-3.4/Release/bin/clang
LLVM_CXX_NAME=/home/moyix/git/llvm-3.4/Release/bin/clang++
LLVM_COMPILER=clang
And build!
CXX=wllvm++ CC=wllvm ./build.sh
At the end of this, we can extract all of the bitcode files:
cd processed-challenges
find $(find . -name bin) -type f -perm -u=x | while read f; do extract-bc $f || (echo Extracting bitcode failed && break) ; done
cd -
cd build/include
find . -name \*.so | xargs -n 1 extract-bc
cd -
On to Palindrome. We patch it to have a normal main
function, by editing processed-challenges/Palindrome/src/service.c
and changing main(void)
to main(int argc, char **argv)
.
Now we can go back into the build
directory and rebuild with make
, and then go back into processed-challenges/Palindrome/bin
and run extract-bc Palindrome
to get the bitcode file back.
Finally, we can run KLEE. I'm going to just use the options from the KLEE site that are recommended for the coreutils experiment, with some small tweaks (mainly setting stdin to symbolic). We also have to link against the bitcode versions of libcgc.so
and libtiny-AES128-C.so
; otherwise KLEE won't understand the CGC system call interface. You can let it run for a few minutes and then hit Control-C to stop it:
moyix@lorenzo:~/git/cb-multios/processed-challenges/Palindrome/bin$ klee -link-llvm-lib=/home/moyix/git/cb-multios/build/include/libcgc.so.bc -link-llvm-lib=/home/moyix/git/cb-multios/build/include/tiny-AES128-C/libtiny-AES128-C.so.bc --simplify-sym-indices --write-cvcs --write-cov --output-module --disable-inlining --optimize --use-forked-solver --use-cex-cache --libc=uclibc --allow-external-sym-calls --only-output-states-covering-new --max-sym-array-size=4096 --max-instruction-time=30. --max-time=18000. --watchdog --max-memory-inhibit=false --max-static-fork-pct=1 --max-static-solve-pct=1 --max-static-cpfork-pct=1 --switch-type=internal --search=random-path --search=nurs:covnew --use-batching-search --batch-instructions=10000 --posix-runtime Palindrome.bc --sym-stdin 128 --sym-stdout
KLEE: KLEE: WATCHDOG: watching 24411
KLEE: NOTE: Using klee-uclibc : /home/moyix/git/klee/klee_build_dir/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/moyix/git/klee/klee_build_dir/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: Linking in library: /home/moyix/git/cb-multios/build/include/libcgc.so.bc.
KLEE: Linking in library: /home/moyix/git/cb-multios/build/include/tiny-AES128-C/libtiny-AES128-C.so.bc.
KLEE: output directory is "/home/moyix/git/cb-multios/processed-challenges/Palindrome/bin/klee-out-0"
KLEE: Using STP solver backend
KLEE: WARNING: undefined reference to function: __isoc99_sscanf
KLEE: WARNING: undefined reference to function: getenv
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: WARNING ONCE: calling external: getenv(191253352) at /home/moyix/git/cb-multios/include/libcgc.c:227
KLEE: WARNING ONCE: calling external: syscall(54, 0, 21505, 188883176) at /home/moyix/git/klee/runtime/POSIX/fd.c:1045
KLEE: ERROR: /home/moyix/git/cb-multios/processed-challenges/Palindrome/src/service.c:65: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
^CKLEE: ctrl-c detected, requesting interpreter to halt.
KLEE: halting execution, dumping remaining states
KLEE: done: total instructions = 59513659
KLEE: done: completed paths = 172816
KLEE: done: generated tests = 1
You can see that it found the stack-based buffer overflow almost immediately.
Here's a small patch to cb-multios's CMakeLists.txt that fixes up some compiler/linker flags to work with clang-3.4:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 564d7b8..ef159e6 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -12,8 +12,8 @@ set(CMAKE_CXX_STANDARD 11)
add_compile_options(
-fno-builtin
-Wno-int-to-pointer-cast
- -Wno-writable-strings
- -g3
+ -Wno-write-strings
+ -ggdb
-m32
)
@@ -132,7 +132,7 @@ add_definitions(
)
if(LINUX)
- set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -z execstack -z norelro")
+ set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -lm -Wl,-z,execstack -Wl,-z,norelro")
endif()
file(GLOB challenge_binaries processed-challenges/*)
A small patch to make allocate
use malloc
and mprotect
instead of mmap
. KLEE still doesn't know what mprotect
is, but since it doesn't return a pointer this doesn't stop it from continuing with symbolic execution. The patch also removes the use of mmap in the flag page constructor, which will break any challenges that explicitly rely on having a flag page at a fixed address.
diff --git a/include/libcgc.c b/include/libcgc.c
index deb456b..f08c6f2 100755
--- a/include/libcgc.c
+++ b/include/libcgc.c
@@ -179,11 +179,12 @@ int allocate(cgc_size_t length, int is_executable, void **addr) {
if (is_executable)
page_perms |= PROT_EXEC;
- void *return_address = mmap(NULL, length, page_perms, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ void *return_address = malloc(length);
- if (return_address == MAP_FAILED) {
+ if (return_address == NULL) {
return errno;
}
+ mprotect(return_address, length, page_perms);
if (addr)
*addr = return_address;
@@ -251,6 +252,7 @@ int cgc_random(void *buf, cgc_size_t count, cgc_size_t *rnd_bytes) {
}
static void __attribute__ ((constructor)) cgc_initialize_flag_page(void) {
+#if 0
void *mmap_addr = mmap(CGC_FLAG_PAGE_ADDRESS, PAGE_SIZE,
PROT_READ | PROT_WRITE,
MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
@@ -259,6 +261,8 @@ static void __attribute__ ((constructor)) cgc_initialize_flag_page(void) {
if (mmap_addr != CGC_FLAG_PAGE_ADDRESS) {
err(1, "[!] Failed to map the flag page");
}
+#endif
+ void *mmap_addr = malloc(PAGE_SIZE);
// Fill the flag page with bytes from the prng
try_init_prng();
Notes on compiling LLVM 3.4, KLEE, stp, and minisat as a 32-bit cross-compile. Most of this is just fighting with cmake and trying to divine where to add in the -m32
flag.
LLVM (using cmake, since that is the only way I could find to get the 32-bit libraries):
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_32_BITS=ON ..
minisat:
cmake -DCMAKE_C_FLAGS="-m32" -DCMAKE_CXX_FLAGS="-m32" -DCMAKE_EXE_LINKER_FLAGS="-m32" -DCMAKE_SHARED_LINKER_FLAGS="-m32" -DSTATIC_BINARIES=ON -DCMAKE_INSTALL_PREFIX=${HOME}/klee_install ..
klee-uclibc:
Note that we're building with printf enabled, because why not.
./configure --with-llvm-config ~/git/llvm-3.4/llvmbuild/bin/llvm-config --make-llvm-lib
make menuconfig => i386
make KLEE_CFLAGS="-DKLEE_SYM_PRINTF -m32"
make -j $(nproc)
stp:
cmake -DCMAKE_INSTALL_PREFIX=${HOME}/klee_install -DCMAKE_SHARED_LINKER_FLAGS="-m32" -DCMAKE_EXE_LINKER_FLAGS="-m32" -DMINISAT_INCLUDE_DIRS=${HOME}/klee_install/include -DMINISAT_LIBDIR=${HOME}/klee_install/lib -DCMAKE_CXX_FLAGS="-m32" -DCMAKE_C_FLAGS="-m32" -DBUILD_SHARED_LIBS:BOOL=OFF -DENABLE_PYTHON_INTERFACE:BOOL=OFF ..
make -j $(nproc)
make install
klee:
Make sure to first install 32-bit versions of everything KLEE needs (note: there may be more, but these were the ones I was missing):
sudo apt-get install zlib1g-dev:i386 libtinfo-dev:i386 libicu-dev:i386 libcap-dev:i386
Next you'll need to tell KLEE to build its runtime library as 32-bit. I couldn't find a way to do this through cmake
, so you get the pleasure of hand-editing a build script. Edit runtime/Makefile.cmake.bitcode.config.in
and add -m32
to LLVM.ExtraFlags
.
Now we can run this ungodly cmake command. In addition to specifying 32-bit compilation, we also need to tell it not to use RTTI, because it's kind of broken in LLVM 3.4's cmake builds.
cmake -DCMAKE_C_FLAGS="-m32" -DCMAKE_CXX_FLAGS="-m32 -fno-rtti" -DENABLE_UNIT_TESTS=OFF -DENABLE_SOLVER_Z3=OFF -DENABLE_SOLVER_STP=ON -DENABLE_POSIX_RUNTIME=ON -DENABLE_KLEE_UCLIBC=ON -DKLEE_UCLIBC_PATH=/home/moyix/git/klee-uclibc -DLLVM_CONFIG_BINARY=${HOME}/git/llvm-3.4/llvmbuild/bin/llvm-config ..
make -j $(nproc)
Some fair criticisms here. I have some responses though:
cgc_
. This should work on Linux and OS X but we haven't merged it to master yet. https://github.com/trailofbits/cb-multios/tree/windows_supportFinally, KLEE should be LLVM 3.8+ but it seems its a political issue. For what it's worth, for CGC we used our own KLEE which linked against LLVM 3.6.1.