moyix/symexec_tut.md

## symexec_tut.md

      
    Raw
  

              symexec_tut.md
            
          
    To get the docker images:
docker pull klee/klee
docker pull angr/angr
KLEE

Start docker:
docker run -ti --ulimit='stack=-1:-1' klee/klee
To get the maze:
wget http://panda.moyix.net/~moyix/a_maze.c
Then modify it to add KLEE statements:


Add #include <klee/klee.h> to the top


Replace
 read(1, program, ITERS) 

with
 klee_make_symbolic(program, ITERS, "program")


Add klee_assert(0) inside the conditional containing
 printf ("You win!\n");


Now we can compile it into LLVM bitcode, which is what KLEE symbolically
executes:
clang -I klee_src/include -emit-llvm -c a_maze.c -o a_maze.bc
And then run it on the maze:
klee ./a_maze.bc
KLEE puts its output into directories named klee-out-<num> and makes a
symbolic link to the most recent one named klee-last. In that directory
you will find lots of .ktest files, one per test case KLEE generated.
If KLEE found an error (e.g., a failed assert) then it will also create
a .err file containing the error that was produced. In this case, that
will indicate which test case caused it to reach our assert statement.
We can then inspect the test with ktest-tool:
$ ktest-tool test000134.ktest 
ktest file : 'test000134.ktest'
args       : ['./a_maze.bc']
num objects: 1
object    0: name: b'program'
object    0: size: 28
object    0: data: b'sddwddddsddw\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

We can see that 'sddwddddsddw' is a valid path through the maze (though
it uses false walls).
KLEE will only report one test case per error by default, so if you want
all paths through the maze, you have to use -emit-all-errors:
klee -emit-all-errors ./a_maze.bc
You'll get 4 .err files now, one per valid path through the maze.
We can also repeat the mod equivalence check from the KLEE paper. Create
a file called mod_test.c that looks like:
#include <klee/klee.h>

unsigned mod_opt(unsigned x, unsigned y) {
    if((y & -y) == y) // power of two?
        return x & (y-1);
    else
        return x % y;
}       

unsigned mod(unsigned x, unsigned y) {
    return x % y;
}

int main(void) {
    int x, y;
    klee_make_symbolic(&x, sizeof(x), "x");
    klee_make_symbolic(&y, sizeof(y), "y");
    klee_assert(mod(x,y) == mod_opt(x,y));
}
Now compile it and run KLEE on it:
$ clang -I klee_src/include -c -emit-llvm mod_test.c -o mod_test
<some warnings omitted>
$ klee ./mod_test.bc
KLEE: output directory is "/home/klee/./klee-out-0"
Using STP solver backend
KLEE: ERROR: (location information missing) divide by zero
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 66
KLEE: done: completed paths = 3
KLEE: done: generated tests = 3

The error we see is because x % y when y = 0 is equivalent to divison by
zero and so is reported as an error. Aside from this case, however, KLEE
finishes successfully, proving that the two functions are equivalent.
Angr

Start docker:
docker run -it angr/angr
Get and compile the maze program (you may want to include debug info, but it's not required):
wget http://panda.moyix.net/~moyix/a_maze.c
gcc -g a_maze.c -o a_maze

Next, find the address in the binary that corresponds to winning. You can do this with objdump or IDA; look for the call to puts with the "You win" string argument. For me this was 0x4008f7.
Start ipython. The remainder of the commands here are intended to be typed into the Python shell.
import angr
load_opts = {}
# 'auto_load_libs': False tells angr not to descend into library calls
load_opts['main_opts'] = {'auto_load_libs': False}

# Create a Project that will hold angr's information about our binary
proj = angr.Project('./a_maze', load_options=load_opts)

# Create a PathGroup -- an container for multiple execution paths
# through the program
pg = proj.factory.path_group()

# Now we can tell it to explore the program, trying to find a path that
# gets to some target
# This will take a few minutes...

pg.explore(find=0x4008f7)

# You should see:
# Out[6]: <PathGroup with 137 deadended, 11 active, 1 found>
# Meaning that we reached our target. Angr will put the paths into
# member variables according to their kind: pg.deadended, pg.active,
# pg.found, etc. Each is a list.

found = pg.found[0]

# Now we can examine the input that angr found by looking into its 
# symbolic model of the POSIX runtime environment. The dumps method
# will print out the contents of a symbolic file descriptor -- in our
# case we want stdin, which is file descriptor 0:

print found.state.posix.dumps(0)

# This prints "sddwddddsddw", which is the same path KLEE found.

# As a final note, pg.explore can be given a more complicated condition
# to find. You can pass it a function that takes a path and returns true
# when you've reached the desired condition. So we can rewrite our earlier
# explore statement as:

pg.explore(find=lambda p: 'You win' in p.state.posix.dumps(1))

# This will allow it to run until the string 'You win' appears on stdout.
# This will take much longer because to do that string search we will need
# to concretize standard out, which means we have to invoke the constraint
# solver at every step! However, it does eventually find a solution:
# Out[16]: <PathGroup with 137 deadended, 20 active, 1 found>
# In [17]: print pg.found[0].state.posix.dumps(0)
# sddwddddsddw