This is a technique for extracting all imported modules from a packaged Python application as
.pyc files, then decompiling them. The target program needs to be run from scratch, but no debugging symbols are necessary (assuming an unmodified build of Python is being used).
This was performed on 64-bit Linux with a Python 3.6 target.
In Python we can leverage the fact that any module import involving a
.py* file will eventually arrive as a ready-to-execute Python code object at this function:
PyObject* PyEval_EvalCode(PyObject *co, PyObject *globals, PyObject *locals);
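To make that flow concrete, here is a minimal Python sketch of an import done by hand: the loader produces a code object for the module body, and exec() evaluates it (in CPython, exec() on a code object is carried out in C by PyEval_EvalCode). The helper name load_via_code_object and the demo_module file are hypothetical and exist only for this illustration.

```python
import importlib.util
import os
import tempfile
import types


def load_via_code_object(path, name="demo_module"):
    """Import a .py file by hand: fetch its code object, then execute it."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # get_code() returns the compiled module body as a code object.
    code = spec.loader.get_code(name)
    if not isinstance(code, types.CodeType):
        raise TypeError("loader did not return a code object")
    # This exec() is the step that reaches PyEval_EvalCode in CPython.
    exec(code, module.__dict__)
    return module, code


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_module.py")
        with open(src, "w") as fh:
            fh.write("ANSWER = 6 * 7\n")
        module, code = load_via_code_object(src)
        print(code.co_name, module.ANSWER)
```

The code object handed to exec() here is the same kind of object that the breakpoint sees as the first argument.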
If a breakpoint is set here in gdb, the C implementation of
marshal.dump() can be called to dump the bytecode to a file. Conveniently, the
.pyc format is simply a marshaled
PyCodeObject with a small header.
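As a rough illustration of that header, the sketch below wraps raw marshaled bytes for the interpreter that runs it. It assumes the layout used by CPython 3.3 to 3.6 (magic number, source mtime, source size: 12 bytes) and the extra flags field added in 3.7 by PEP 552 (16 bytes). The names pyc_header and wrap_as_pyc are hypothetical, and the zeroed mtime and size are placeholders, which is acceptable when no source file exists to compare against. The magic number is version-specific, so this should be run with the same Python version as the target.

```python
import importlib.util
import marshal
import struct
import sys


def pyc_header(mtime=0, source_size=0):
    """Build a .pyc header for the interpreter running this script."""
    header = importlib.util.MAGIC_NUMBER        # 4 bytes, version specific
    if sys.version_info >= (3, 7):
        header += struct.pack("<I", 0)          # flags field (PEP 552)
    header += struct.pack("<II", mtime, source_size)
    return header


def wrap_as_pyc(raw):
    """Prefix raw marshaled code-object bytes with a .pyc header."""
    return pyc_header() + raw


if __name__ == "__main__":
    code = compile("VALUE = 'ok'", "<demo>", "exec")
    pyc = wrap_as_pyc(marshal.dumps(code))
    # Reading it back: skip the header, then unmarshal the code object.
    restored = marshal.loads(pyc[len(pyc_header()):])
    namespace = {}
    exec(restored, namespace)
    print(len(pyc_header()), namespace["VALUE"])
```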
marshal-to-pyc.py below can be used to convert these raw marshaled code objects into .pyc files and decompile them if desired.
Implementation in GDB
Start the debugger in a stopped state:
Then in the GDB console:
# Wait for the Python library to load if the symbol can't be found before runtime
catch load
# Run the program
run
# Continue until gdb breaks where the target Python .so is loading
continue
# ...
# Break on the target function
break PyEval_EvalCode
Now GDB can be automated to dump every
PyCodeObject evaluated at runtime to disk. You may want to test and validate a single dump manually before proceeding with the
command-automated version.
# Index for writing multiple files
set $index = 0

# Define code dumping command (no symbols available)
# Passing $rdi here is equivalent to passing the `co` argument when debugging symbols are present
define dump_pyc
  eval "set $handle = fopen(\"%s/%d.marshal\", \"w\")", $arg0, $index
  call (void) PyMarshal_WriteObjectToFile($rdi, $handle, 4)
  call fclose($handle)
  set $index += 1
end

command
  dump_pyc "/tmp/"
  continue
end
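One way to sanity-check a dump such as /tmp/0.marshal is to load it back with the marshal module and confirm that it holds a code object. The sketch below assumes it is run with the same Python version as the target, since marshaled code objects are not portable across versions. describe_dump is a hypothetical helper name.

```python
import marshal
import sys
import types


def describe_dump(path):
    """Load one raw marshal dump and summarise the code object inside."""
    with open(path, "rb") as fh:
        obj = marshal.load(fh)
    if not isinstance(obj, types.CodeType):
        raise TypeError("%s does not contain a code object" % path)
    # Source path, code name, and bytecode length give a quick plausibility check.
    return obj.co_filename, obj.co_name, len(obj.co_code)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, *describe_dump(path))
```

A module-level dump should report the code name `<module>` and the path of the source file it was compiled from.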
The first argument of
PyEval_EvalCode should be in the
rdi register on x86_64 Linux, but it may differ on your platform. You may need to find the location of the first argument yourself, but once you know the location it can be substituted above.
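As a starting point for other platforms, the sketch below maps a few common calling conventions to the gdb register that holds the first pointer argument at function entry. Only conventions that pass that argument in a register are covered, the table is not exhaustive, and first_arg_register is a hypothetical helper. Treat the result as a hint to verify in the debugger.

```python
import platform


def first_arg_register(machine=None, system=None):
    """Return the gdb register name that holds the first pointer argument."""
    machine = (machine or platform.machine()).lower()
    system = (system or platform.system()).lower()
    if machine in ("x86_64", "amd64"):
        # The Microsoft x64 convention uses rcx; System V (Linux, macOS, BSD) uses rdi.
        return "$rcx" if system == "windows" else "$rdi"
    if machine in ("aarch64", "arm64"):
        return "$x0"    # AAPCS64
    if machine.startswith("arm"):
        return "$r0"    # 32-bit ARM AAPCS
    raise LookupError("no register mapping known for %s/%s" % (machine, system))


if __name__ == "__main__":
    print(first_arg_register())
```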