Adventures in Python Core Dumping

After watching Bryan Cantrill's presentation on Running Aground: Debugging Docker in Production, I got all excited (and strangely nostalgic) about the possibility of core-dumping server-side Python apps whenever they go awry. This would theoretically let me inspect the full state of the program at the point it exploded, rather than relying solely on the information in a stack trace.

I decided to try exploring a core dump on my own by writing a simple Python script that generated one.

Initial Setup

Doing this required a bit of setup on my Ubuntu 14.04 server.

First, I had to apt-get install python2.7-dbg to install a version of Python with debug symbols, so that gdb could actually make sense of the core dump. Ubuntu ships with the Python debugging extension for gdb already configured, so I didn't have to do any extra setup there, which was great.

I also had to add the following line to /etc/security/limits.conf to actually enable core dump files to be created:

#<domain>       <type>  <item>          <value>
*               soft    core            100000

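As an aside (not part of the original setup), the limits.conf change only applies to new login sessions; for quick experiments, the same limit can also be raised per-process from within Python using the standard resource module:

import resource

# Raise this process's core-file size limit to its hard maximum so a
# crash here (or in a child process) can leave a core file behind.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
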
After that, I created a file called explode.py in my home directory:

import os

def my_exploding_func():
    '''Set a local variable we'll want to see
    in the core dump, then abort.'''
    my_local_var = 'hi'
    os.abort()

my_exploding_func()

Then I ran the script:

$ python2.7-dbg explode.py
Aborted (core dumped)

This created a core file in my home directory.
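
(If no core file appears, the kernel's /proc/sys/kernel/core_pattern setting controls where core files are written and what they're named, so that's the first place to check.)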

Exploring The Stack

I opened the core dump in gdb:

$ gdb /usr/bin/python2.7-dbg core
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
...

warning: core file may not match specified executable file.
[New LWP 10020]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/python2.7-dbg ./explode.py'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f996aff7cc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb)
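
(The "core file may not match specified executable file" warning appears to be gdb being cautious about verifying the core against the binary it was given; the Python extension commands worked fine regardless.)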

Now I could use all of gdb's Python debugging extension commands. For example, running py-bt gave me:

(gdb) py-bt
#4 Frame 0x7f996bf28240, for file ./explode.py, line 7, in my_exploding_func (my_local_var='hi')
    os.abort()
#7 Frame 0x7f996bf28060, for file ./explode.py, line 9, in <module> ()
    my_exploding_func()

I could also use py-locals to show me the values of local variables in the current stack frame, and py-up and py-down to traverse the stack.
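
For instance (a hypothetical session, reusing the frame from the py-bt output above), selecting the Python frame that called os.abort() and dumping its locals looks something like this:

(gdb) py-up
#4 Frame 0x7f996bf28240, for file ./explode.py, line 7, in my_exploding_func (my_local_var='hi')
    os.abort()
(gdb) py-locals
my_local_var = 'hi'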

This was all pretty awesome, and will be very useful if my Python programs actually segfault. But it'd be cool if I could get all this rich information any time one of my servers returned a 500. That's a bit of a different situation, since Python servers don't usually segfault when they return a 500: instead, they catch the exception, return an error code, and continue running.

For now I'm going to ignore the "continue running" part; there are ways to core dump without killing a process, but right now I'm more interested in figuring out how to get information about handled exceptions.
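
(gdb's gcore utility, for example, can snapshot a running process's core without terminating it.)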

Obtaining Information About Handled Exceptions

Let's assume we have a script called explode2.py:

import os

def my_exploding_func():
    '''Bind a local, then raise a NameError
    for the caller to catch.'''
    a = 1
    call_nonexistent_func()

try:
    my_exploding_func()
except Exception as e:
    os.abort()

The problem with the core dump generated from this script is that py-bt only shows the stack from the point where we called os.abort(), which tells us nothing about the exception we were handling:

(gdb) py-bt
#4 Frame 0x7f3767430450, for file ./explode2.py, line 12, in <module> ()
    os.abort()

What we really want is a way to introspect the exception that was being handled at the time os.abort() was called.

There isn't a particularly easy way to do this with the Python debugging extension for gdb, but one nice thing about gdb is that its extensions are written in Python. This means we can write our own extension that gives us easy access to the information we need.
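
To get a feel for the mechanism first, here is a minimal sketch of a custom gdb command (the py-hello name is made up for illustration); a command is just a Python class that registers itself with gdb:

import gdb

class PyHello(gdb.Command):
    '''A trivial custom command demonstrating gdb's Python API.'''

    def __init__(self):
        # Register this command under the name `py-hello`.
        gdb.Command.__init__(self, 'py-hello', gdb.COMMAND_USER)

    def invoke(self, args, from_tty):
        gdb.write('Hello from a custom gdb command!\n')

PyHello()

Loading this with source py_hello.py at the (gdb) prompt makes py-hello available as a command.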

Doing this took some research. It looks like the latest version of the Python debugging extension for gdb lives in the CPython codebase, in a file called libpython.py, but that's a much newer version than the one that ships with Ubuntu 14.04. I had to use strace to find the version actually loaded on my system, which was at /usr/lib/debug/usr/bin/python2.7-dbg-gdb.py.
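
Something along these lines does the trick (a sketch; the exact strace flags vary by version): trace the files gdb opens and look for the auto-loaded extension script:

$ strace -f -e trace=open gdb /usr/bin/python2.7-dbg 2>&1 | grep 'gdb\.py'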

After poring through that code and consulting the CPython source code and documentation on extending gdb using Python, I wrote my first gdb extension, py_exc_print.py, which is attached at the end of this post. It adds a py-exc-print command that gives us what we need:

(gdb) source py_exc_print.py
(gdb) py-exc-print
Traceback (most recent call last):
  Frame 0x7f3767430450, for file ./explode2.py, line 12, in <module> ()
  Frame 0x7f37673f3060, for file ./explode2.py, line 7, in my_exploding_func (a=1)
exceptions.NameError("global name 'call_nonexistent_func' is not defined",)

Note that this is more useful than a standard stack trace, since the values of local variables are included in the printout. But more work needs to be done on the extension to make those locals easily introspectable.

Conclusion

Thus concludes my first foray into Python core dumping.

Some open questions:

  • I'm not sure how feasible core dumping on every uncaught exception actually is. For instance, how big do core files become in production environments?

  • Are there privacy risks involved in core dumping? Depending on the retention policy, it essentially means that data in use could inadvertently become data at rest.

  • In order for the core dump to be useful, a debug build of the Python interpreter needs to be used. How does that affect performance? As the aforementioned Bryan Cantrill talk points out, we should be able to inspect core dumps from production environments; but is it feasible to run a debug build of Python in production?

py_exc_print.py:

# Note that when we're loaded into gdb via `source py_exc_print.py`, we
# seem to be loaded into the same namespace as the Python debugging
# extension, which is some version of the following file by David Malcolm:
#
# https://hg.python.org/cpython/file/2.7/Tools/gdb/libpython.py

def pm_sys_exc_info():
    '''Just like sys.exc_info(), but post-mortem!'''

    # The _PyThreadState_Current global is defined in:
    # https://hg.python.org/cpython/file/tip/Python/pystate.c
    val = gdb.lookup_symbol('_PyThreadState_Current')[0].value()

    # The PyThreadState type is defined in:
    # https://hg.python.org/cpython/file/tip/Include/pystate.h
    return [PyTracebackObjectPtr.from_pyobject_ptr(val[name])
            for name in ['exc_type', 'exc_value', 'exc_traceback']]


def pm_traceback_print_exc():
    '''Kinda like traceback.print_exc(), but post-mortem, and no args!'''

    exc_type, exc_value, exc_traceback = pm_sys_exc_info()
    sys.stdout.write('Traceback (most recent call last):\n')
    while not exc_traceback.is_null():
        frame = exc_traceback.get_frame()
        sys.stdout.write('  %s\n' % frame.get_truncated_repr(MAX_OUTPUT_LEN))
        exc_traceback = exc_traceback.get_next()
    exc_value.write_repr(sys.stdout, set())
    sys.stdout.write('\n')


class PyTracebackObjectPtr(PyObjectPtr):
    '''
    Class wrapping a gdb.Value that's a (PyTracebackObject*) within the
    inferior process.
    '''

    # PyTracebackObject is defined in:
    # https://hg.python.org/cpython/file/tip/Include/traceback.h
    _typename = 'PyTracebackObject'

    def __init__(self, gdbval, cast_to=None):
        PyObjectPtr.__init__(self, gdbval, cast_to)
        self._py_tb_obj = gdbval.cast(self.get_gdb_type()).dereference()

    def _get_struct_elem(self, name):
        return self.__class__.from_pyobject_ptr(self._py_tb_obj[name])

    def get_frame(self):
        return self._get_struct_elem('tb_frame')

    def get_next(self):
        return self._get_struct_elem('tb_next')

    @classmethod
    def subclass_from_type(cls, t):
        '''
        This is called from the from_pyobject_ptr class method we've
        inherited. We override its default implementation to be
        aware of traceback objects.
        '''
        try:
            tp_name = t.field('tp_name').string()
            if tp_name == 'traceback':
                return PyTracebackObjectPtr
        except RuntimeError:
            pass
        return PyObjectPtr.subclass_from_type(t)


class PyExcPrint(gdb.Command):
    '''
    Display a (sort of) Python-style traceback of the exception currently
    being handled.
    '''

    def __init__(self):
        gdb.Command.__init__(self, 'py-exc-print', gdb.COMMAND_STACK,
                             gdb.COMPLETE_NONE)

    def invoke(self, args, from_tty):
        pm_traceback_print_exc()


PyExcPrint()