How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb

Install pyrasite - this will let you attach a Python shell to the still-running process

pip install pyrasite

Install uncompyle6, which will let you get Python source code back from in-memory code objects

pip install uncompyle6

Find the PID of the process that is still running

ps aux | grep python

Attach an interactive prompt using pyrasite

pyrasite-shell <PID>

Now you're in an interactive prompt! Import the code you need to recover

>>> from my_package import my_module

Figure out which functions and classes you need to recover

>>> dir(my_module)
['MyClass', 'my_function']
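
On a real module, `dir()` also lists everything the module imported, so the output is usually noisier than the example above. One way to narrow it down, as a rough sketch: keep only the functions and classes whose `__module__` matches the module's own name. The helper name `recoverable_members` and the `demo_module` stand-in are made up for illustration; inside the pyrasite shell you would pass `my_module` instead. This is written for Python 3 and uses only the standard library.

```python
import inspect
import types


def recoverable_members(module):
    """Return sorted names of functions and classes defined in `module` itself."""
    names = []
    for name, obj in vars(module).items():
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip anything that was merely imported into the module.
        if getattr(obj, "__module__", None) == module.__name__:
            names.append(name)
    return sorted(names)


# Hypothetical stand-in for my_module so the sketch runs on its own.
demo = types.ModuleType("demo_module")
exec(
    "import os\n"
    "from collections import OrderedDict\n"
    "def my_function():\n"
    "    return 1\n"
    "class MyClass(object):\n"
    "    pass\n",
    demo.__dict__,
)

print(recoverable_members(demo))  # ['MyClass', 'my_function']
```

Imported names such as `os` and `OrderedDict` are filtered out, which leaves only the code that actually lived in the lost file.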

Decompile the function into source code

>>> import uncompyle6
>>> import sys
>>> uncompyle6.main.uncompyle(
    2.7, my_module.my_function.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
function_body = "appears here"

For the class, you'll need to decompile each method in turn

>>> uncompyle6.main.uncompyle(
    2.7, my_module.MyClass.my_method.im_func.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
class_method_body = "appears here"
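
Decompiling a class one method at a time gets tedious once there are more than a few methods. As a sketch only, the loop below collects the code object for every function defined directly on a class, unwrapping `staticmethod`, `classmethod` and `property` along the way. The helper name `method_code_objects` and the stand-in class are illustrative and not part of the original recovery. Note that it uses `__code__` and `__func__`, which are the Python 3 names for the `func_code` and `im_func` attributes used above (`__code__` is also available on Python 2.6+). Each code object it returns could then be handed to the decompiler call shown earlier.

```python
import inspect


def method_code_objects(cls):
    """Map attribute names to code objects for functions defined directly on cls."""
    found = {}
    for name, attr in vars(cls).items():
        if isinstance(attr, (staticmethod, classmethod)):
            # Unwrap to reach the plain function underneath.
            attr = attr.__func__
        elif isinstance(attr, property):
            # Properties keep their getter in .fget (may be None).
            attr = attr.fget
        if inspect.isfunction(attr):
            found[name] = attr.__code__
    return found


class MyClass(object):  # hypothetical stand-in for my_module.MyClass
    def my_method(self):
        return "body"

    @staticmethod
    def helper():
        return 1

    @property
    def value(self):
        return 2


for name, code in sorted(method_code_objects(MyClass).items()):
    print(name, code.co_name)
```

Only attributes in the class's own `__dict__` are visited, so inherited methods are skipped; those belong to whichever base class defined them. Property setters and deleters are not covered by this sketch.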
@steven-cutting

I'll have to try this out just for the heck of it.

Pretty. Darn. Cool.

@sksq9
sksq9 commented Mar 11, 2017

Woah ! Awesome.

@raplin
raplin commented Mar 11, 2017

Wow, sweet, I had no idea you could attach a Python shell to a running Python process. Super handy! Thanks.

@DJBnjack

Nice tinkering!

Was it not possible to open a bash shell inside the running Docker container and just access the Python script there? Or had the file already been overwritten?

@NickSB2000

Excellent, this has the potential to eliminate a swear and/or impress a colleague.. :-)

@Neko-Design

Awesome! Given the stupid number of times I've done exactly this, I'm sure I'll get a chance to try it in anger soon enough.

@i336
i336 commented Mar 12, 2017 edited

FYI, this feels incredibly complicated. Here's a much simpler method that universally applies to any process and will probably recover the original source, or very close to it - for example I used this approach to recover some text from a textbox in Chrome when an undo operation went awry recently. Using Python as an example:

$ python
>>> x = "QqWwEeRrTtYy"

(Leave that running, then...)

$ gdb -p $(pidof python)
...
0xb7414b08 in ___newselect_nocancel () from /lib/libc.so.6
(gdb) generate-core-file pythontest.dump
Saved corefile pythontest.dump
(gdb) quit
A debugging session is active.

        Inferior 1 [process 14970] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/python2.7, process 14970
$ grep -o QqWw pythontest.dump 
Binary file pythontest.dump matches
$ grep -ao QqWw pythontest.dump 
QqWw
QqWw
QqWw
QqWw
QqWw
QqWw
bash-4.3$ grep -a QqWw pythontest.dump 
...libxml2.ph....   >>>  = "QqWwEeRrTtYy   >> x = "QqWwEeRrTtYy"ntel        st-0x = "QqWwEeRrTtYy"
(...)
 = "QqWwEeRrTtYy" ··¸$.·xtermi336ÀÛr·åÿÿÿÿ Return a wrapped version of file which provides transparent
ÀÛr·ÿÿÿÿencodings.latin_1ÀÛr·3AÄencodings.latin_1É*·þÿÿÿ`är·\·\·L·þÿÿÿ`är·¬Ì(··H·è·ýÿÿÿ`är·dÙ·,з ÷r·1· ·ýÿÿÿ parse_and_bindacheÀÛr·WIoDread_history_fileÀÛr·ÓcVûwrite_history_fileÀÛrÉ3Ïget_completerÀÛr·73>get_completion_typeÀÛr·vÁÄremove_history_itemÀÛr·0Q¦set_startup_hookÀÛr·
.Öclear_historyÀÛr·Åù_READLINE_VERSION@·ÀÛr·ÿÿÿÿeRrTtYy"ÀÛr·
                                                            @Q£QqWwEeRrTtYyTtYy"
òlS·àSw·x = "QqWwEeRrTtYy"
$ 64;1;2;6;9;15;18;21;22c^C
$ ^C

Left in some of the binary asplosion for fun; this is a Unicode world now after all, it shouldn't cause any issues. As you can see, some of the data (a ridiculously small amount here) is mangled, but I see at least three intact copies of my original text. YMMV depending on what malloc implementation your app is using and how much fragmentation happened.
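
The same kind of search can be done from Python when you want the bytes around each hit rather than whole "lines" of binary. The following is a minimal sketch, not a replacement for the `grep` commands above: `find_context` is a made-up helper name, and the `fake_dump` bytes stand in for a real core file, which you would instead read with `open("pythontest.dump", "rb").read()` (or `mmap` for large dumps).

```python
import re


def find_context(data, marker, before=16, after=32):
    """Return the bytes surrounding each occurrence of `marker` in `data`."""
    hits = []
    for match in re.finditer(re.escape(marker), data):
        start = max(match.start() - before, 0)
        end = min(match.end() + after, len(data))
        hits.append(data[start:end])
    return hits


# Tiny fake "dump" so the sketch runs without a real core file.
fake_dump = (
    b"\x00\x01junk"
    + b'x = "QqWwEeRrTtYy"'
    + b"\xff\xfe" * 4
    + b"QqWwEe"
    + b"\x00"
)

for chunk in find_context(fake_dump, b"QqWw", before=5, after=14):
    # repr() keeps non-printable bytes visible instead of garbling the terminal.
    print(repr(chunk))
```

Widening `before` and `after` pulls in more of the surrounding memory, which helps when the text you want is longer than the marker you remember.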

Here's one to file away if you frequently use Linux:

Configure enough swap space on your system. Then, in an absolute emergency, open a terminal and run sync, followed by echo disk > /sys/power/state or pm-hibernate, to trigger system hibernation. Of course, this process requires a full copy of memory to be written to the disk... :) Reboot your system off a flash drive for best results analysing the disk.

WARNING: It feels horribly unintuitive, but you must sync your disk before hibernating unless you know you'll be able to successfully resume from the hibernated memory image. Hibernating means that whatever the filesystem was doing is immediately abandoned in-flight, with the idea that it will be finished when the system wakes back up! If you never resume, that in-memory filesystem data never makes it to disk. Ideally you'd copy the memory image somewhere and then resume from the hibernated image; it might be worth figuring out how to do that on your system.

And of course this is all because Linux doesn't provide arbitrary access to memory. Kinda crazy that it's not generally possible, but it's understandable.

@ancat
ancat commented Mar 12, 2017 edited

@i336 what you're likely seeing is the buffer of the interactive shell history. Testing python <file> on a file that gets deleted yields "random" code fragments (or the entirety, for very tiny programs) here and there but not the entire source. I used gdb to search across the entirety of memory space and couldn't recover the source code for any programs larger than a few lines.

@tleeuwenburg

For what it's worth, I did something similar recently with git and went a different path to recovery based on 'git fsck' and retrieving the files from hashed objects stored in git. Kudos to your fantastic recovery strategy though!

@odino
odino commented Mar 24, 2017

What about docker cp? :)
