Skip to content

Instantly share code, notes, and snippets.

@moyix
Last active March 5, 2021 16:31
Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 4 You must be signed in to fork a gist
  • Save moyix/71357ca3b4d65c13e8394517e7093e21 to your computer and use it in GitHub Desktop.
Save moyix/71357ca3b4d65c13e8394517e7093e21 to your computer and use it in GitHub Desktop.
Some notes on the current state of KLEE and cb-multios

Using KLEE on the DARPA CGC challenge binaries (as ported to Linux/OS X by Trail of Bits) is currently not a fun time. Here are a few of the current obstacles.

I'm working off of KLEE master, built against LLVM 3.4, running on Linux (Ubuntu 16.04). Some of this may be easier or harder on other platforms supported by cb-multios (i.e. OS X and maybe someday Windows).

  • KLEE wants a standard int main(int argc, char *argv[]). Most of the challenges instead have int main(void) instead, and some, perversely, use the first int argument to main to hold the address of the flag page. (Edit: this has been fixed in the windows_support branch of cb-multios and should make its way into master soon)
  • The challenge binaries use lots of symbols that conflict with things in libc. For example, many of them define the symbol stdin and then implement a FILE* struct themselves. So when trying to link in klee-uclibc.bc you get symbol clashes. This already has an issue in the cb-multios repo.
  • Getting bitcode for each of the challenges is not easy. Right now the best way I've found to do it is to build the challenge binaries with wllvm, which wraps clang/clang++, produces bitcode, and places a list of the bitcode files that make up a binary into a special debug section in the final ELF. Then for each binary you can run extract-bc, which basically just calls llvm-link on all of the constituent bitcode files.
  • The challenge binaries are compiled for 32-bit x86, but most people are using 64-bit systems. Doing a 32-bit cross-compile of KLEE, stp, LLVM, etc. is possible, but only for a fairly cruel and unusual definition of possible (see Appendix C for a view into the Hellmouth). This also causes problems with KLEE because malloc() may return addresses larger than can fit in a 32-bit pointer; see this KLEE issue for details and a workaround. (Edit: It also looks like the deterministic allocator that was recently added to KLEE can help here; see the options -allocate-determ -allocate-determ-start-address=0x10000000)
  • The challenge binaries contain a "flag page" – 4096 bytes of random data loaded at a fixed address in memory. In cb-multios, this is implemented using mmap, which KLEE doesn't model. See Issue 15 in cb-multios.
  • The use of mmap also occurs in the implementation of CGC's allocate. You can change it to use malloc() and mprotect() though; see Appendix B.

Case Study: Palindrome

Let's take a quick look at using KLEE on Palindrome (selected because it's extremely simple; it doesn't do anything with the CGC random page and doesn't use any conflicting symbols).

Build cb-multios using wllvm. You'll want to apply the patches in Appendices A and B to make the build process go a bit more smoothly. I also edited build.sh so that it wouldn't use Ninja, because I wanted a plain old Makefile (this is because I am afraid of change).

Set some environment variables for wllvm:

LLVMDIR=/home/moyix/git/llvm-3.4/Release/bin
LLVM_LINK_NAME=/home/moyix/git/llvm-3.4/Release/bin/llvm-link
LLVM_CC_NAME=/home/moyix/git/llvm-3.4/Release/bin/clang
LLVM_CXX_NAME=/home/moyix/git/llvm-3.4/Release/bin/clang++
LLVM_COMPILER=clang

And build!

CXX=wllvm++ CC=wllvm ./build.sh

At the end of this, we can extract all of the bitcode files:

cd processed-challenges
find $(find . -name bin) -type f -perm -u=x | while read f; do extract-bc $f || (echo Extracting bitcode failed && break) ; done
cd -
cd build/include
find . -name \*.so | xargs -n 1 extract-bc
cd -

On to Palindrome. We patch it to have a normal main function, by editing processed-challenges/Palindrome/src/service.c and changing main(void) to main(int argc, char **argv).

Now we can go back into the build directory and rebuild with make, and then go back into processed-challenges/Palindrome/bin and run extract-bc Palindrome to get the bitcode file back.

Finally, we can run KLEE. I'm going to just use the options from the KLEE site that are recommended for the coreutils experiment, with some small tweaks (mainly setting stdin to symbolic). We also have to link against the bitcode versions of libcgc.so and libtiny-AES128-C.so; otherwise KLEE won't understand the CGC system call interface. You can let it run for a few minutes and then hit Control-C to stop it:

moyix@lorenzo:~/git/cb-multios/processed-challenges/Palindrome/bin$ klee -link-llvm-lib=/home/moyix/git/cb-multios/build/include/libcgc.so.bc -link-llvm-lib=/home/moyix/git/cb-multios/build/include/tiny-AES128-C/libtiny-AES128-C.so.bc --simplify-sym-indices --write-cvcs --write-cov --output-module --disable-inlining --optimize --use-forked-solver --use-cex-cache --libc=uclibc --allow-external-sym-calls --only-output-states-covering-new --max-sym-array-size=4096 --max-instruction-time=30. --max-time=18000. --watchdog --max-memory-inhibit=false --max-static-fork-pct=1 --max-static-solve-pct=1 --max-static-cpfork-pct=1 --switch-type=internal --search=random-path --search=nurs:covnew --use-batching-search --batch-instructions=10000 --posix-runtime Palindrome.bc --sym-stdin 128 --sym-stdout
KLEE: KLEE: WATCHDOG: watching 24411

KLEE: NOTE: Using klee-uclibc : /home/moyix/git/klee/klee_build_dir/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/moyix/git/klee/klee_build_dir/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: Linking in library: /home/moyix/git/cb-multios/build/include/libcgc.so.bc.

KLEE: Linking in library: /home/moyix/git/cb-multios/build/include/tiny-AES128-C/libtiny-AES128-C.so.bc.

KLEE: output directory is "/home/moyix/git/cb-multios/processed-challenges/Palindrome/bin/klee-out-0"
KLEE: Using STP solver backend
KLEE: WARNING: undefined reference to function: __isoc99_sscanf
KLEE: WARNING: undefined reference to function: getenv
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: WARNING ONCE: calling external: getenv(191253352) at /home/moyix/git/cb-multios/include/libcgc.c:227
KLEE: WARNING ONCE: calling external: syscall(54, 0, 21505, 188883176) at /home/moyix/git/klee/runtime/POSIX/fd.c:1045
KLEE: ERROR: /home/moyix/git/cb-multios/processed-challenges/Palindrome/src/service.c:65: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
^CKLEE: ctrl-c detected, requesting interpreter to halt.
KLEE: halting execution, dumping remaining states

KLEE: done: total instructions = 59513659
KLEE: done: completed paths = 172816
KLEE: done: generated tests = 1

You can see that it found the stack-based buffer overflow almost immediately.

Appendix A

Here's a small patch to cb-multios's CMakeLists.txt that fixes up some compiler/linker flags to work with clang-3.4:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 564d7b8..ef159e6 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -12,8 +12,8 @@ set(CMAKE_CXX_STANDARD 11)
 add_compile_options(
     -fno-builtin
     -Wno-int-to-pointer-cast
-    -Wno-writable-strings
-    -g3
+    -Wno-write-strings
+    -ggdb
     -m32
 )

@@ -132,7 +132,7 @@ add_definitions(
 )

 if(LINUX)
-    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -z execstack -z norelro")
+    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -lm -Wl,-z,execstack -Wl,-z,norelro")
 endif()

 file(GLOB challenge_binaries processed-challenges/*)

Appendix B

A small patch to make allocate use malloc and mprotect instead of mmap. KLEE still doesn't know what mprotect is, but since it doesn't return a pointer this doesn't stop it from continuing with symbolic execution. The patch also removes the use of mmap in the flag page constructor, which will break any challenges that explicitly rely on having a flag page at a fixed address.

diff --git a/include/libcgc.c b/include/libcgc.c    
index deb456b..f08c6f2 100755
--- a/include/libcgc.c
+++ b/include/libcgc.c
@@ -179,11 +179,12 @@ int allocate(cgc_size_t length, int is_executable, void **addr) {
   if (is_executable)
     page_perms |= PROT_EXEC;
 
-  void *return_address = mmap(NULL, length, page_perms, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+  void *return_address = malloc(length);
 
-  if (return_address == MAP_FAILED) {
+  if (return_address == NULL) {
     return errno;
   }
+  mprotect(return_address, length, page_perms);
 
   if (addr)
     *addr = return_address;
@@ -251,6 +252,7 @@ int cgc_random(void *buf, cgc_size_t count, cgc_size_t *rnd_bytes) {
 }
 
 static void __attribute__ ((constructor)) cgc_initialize_flag_page(void) {
+#if 0
   void *mmap_addr = mmap(CGC_FLAG_PAGE_ADDRESS, PAGE_SIZE,
                          PROT_READ | PROT_WRITE,
                          MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
@@ -259,6 +261,8 @@ static void __attribute__ ((constructor)) cgc_initialize_flag_page(void) {
   if (mmap_addr != CGC_FLAG_PAGE_ADDRESS) {
     err(1, "[!] Failed to map the flag page");
   }
+#endif
+  void *mmap_addr = malloc(PAGE_SIZE);
 
   // Fill the flag page with bytes from the prng
   try_init_prng();

Appendix C

Notes on compiling LLVM 3.4, KLEE, stp, and minisat as a 32-bit cross-compile. Most of this is just fighting with cmake and trying to divine where to add in the -m32 flag.

LLVM (using cmake, since that is the only way I could find to get the 32-bit libraries):

cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_32_BITS=ON ..

minisat:

cmake -DCMAKE_C_FLAGS="-m32" -DCMAKE_CXX_FLAGS="-m32" -DCMAKE_EXE_LINKER_FLAGS="-m32" -DCMAKE_SHARED_LINKER_FLAGS="-m32" -DSTATIC_BINARIES=ON -DCMAKE_INSTALL_PREFIX=${HOME}/klee_install ..

klee-uclibc:

Note that we're building with printf enabled, because why not.

./configure --with-llvm-config ~/git/llvm-3.4/llvmbuild/bin/llvm-config --make-llvm-lib
make menuconfig => i386
make KLEE_CFLAGS="-DKLEE_SYM_PRINTF -m32"
make -j $(nproc)

stp:

cmake -DCMAKE_INSTALL_PREFIX=${HOME}/klee_install -DCMAKE_SHARED_LINKER_FLAGS="-m32" -DCMAKE_EXE_LINKER_FLAGS="-m32" -DMINISAT_INCLUDE_DIRS=${HOME}/klee_install/include -DMINISAT_LIBDIR=${HOME}/klee_install/lib -DCMAKE_CXX_FLAGS="-m32" -DCMAKE_C_FLAGS="-m32" -DBUILD_SHARED_LIBS:BOOL=OFF -DENABLE_PYTHON_INTERFACE:BOOL=OFF ..
make -j $(nproc)
make install

klee:

Make sure to first install 32-bit versions of everything KLEE needs (note: there may be more, but these were the ones I was missing):

sudo apt-get install zlib1g-dev:i386 libtinfo-dev:i386 libicu-dev:i386 libcap-dev:i386

Next you'll need to tell KLEE to build its runtime library as 32-bit. I couldn't find a way to do this through cmake, so you get the pleasure of hand-editing a build script. Edit runtime/Makefile.cmake.bitcode.config.in and add -m32 to LLVM.ExtraFlags.

Now we can run this ungodly cmake command. In addition to specifying 32-bit compilation, we also need to tell it not to use RTTI, because it's kind of broken in LLVM 3.4's cmake builds.

cmake -DCMAKE_C_FLAGS="-m32" -DCMAKE_CXX_FLAGS="-m32 -fno-rtti" -DENABLE_UNIT_TESTS=OFF -DENABLE_SOLVER_Z3=OFF -DENABLE_SOLVER_STP=ON -DENABLE_POSIX_RUNTIME=ON -DENABLE_KLEE_UCLIBC=ON -DKLEE_UCLIBC_PATH=/home/moyix/git/klee-uclibc -DLLVM_CONFIG_BINARY=${HOME}/git/llvm-3.4/llvmbuild/bin/llvm-config ..
make -j $(nproc)
@withzombies
Copy link

Some fair criticisms here. I have some responses though:

  • Symbol Collisions - Super frustrating, I agree. As part of the Windows port, we've globally prefixed all symbols with cgc_. This should work on Linux and OS X but we haven't merged it to master yet. https://github.com/trailofbits/cb-multios/tree/windows_support
  • int main(int argc, char **argv) - For some reason DARPA chose to pass the flag page in argc for the CFE binaries. Some use it, others don't. I suspect it was because the flag page was expected to move -- but in practice it does not. We can and should rewrite all these prototypes.
  • The flag page - I really want to remove the flag page. It's stupid and lazy. DARPA used it to implement type-2 PoVs (which leaked back data). This could have and should have been done in a much different way. For all the type-2 PoV challenges, we would need to rewrite their PoVs and some of the functionality of the challenge binaries themselves. Unfortunately, this is unlikely to happen any time soon.
  • 64-bit - These challenges were not written with 64-bit in mind. Porting them to 64-bit is the next major task but I suspect many PoVs will need to be updated or just simply not work afterwards. The vulnerabilities should still be triggerable but they won't manifest in the way the PoVs expect.

Finally, KLEE should be LLVM 3.8+ but it seems its a political issue. For what it's worth, for CGC we used our own KLEE which linked against LLVM 3.6.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment