Skip to content

Instantly share code, notes, and snippets.

@njsmith
Last active October 25, 2020 04:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save njsmith/f90a7b3be7467a715cb0cee2c01841b8 to your computer and use it in GitHub Desktop.
Save njsmith/f90a7b3be7467a715cb0cee2c01841b8 to your computer and use it in GitHub Desktop.

Question: suppose we wanted to let users mostly deal with individual exception objects instead of ExceptionGroups, and we want to do that without losing the rich traceback metadata. (Tree structure, notes about places where exceptions crossed between tasks, ...) What would that look like?

The big challenge is that in current trio.MultiError, if you want to extract an individual exception and make it a proper standalone object, you have to mutate its __traceback__ attribute to capture the info that's otherwise only present in the MultiError object. (And this in turn is the main reason I've argued for except ValueError as exc being run once with a single ExceptionGroup: if we iterate, then we have to extract individual exceptions. If we wrap in a new ExceptionGroup, then we never have to mutate any exception objects in place.) So if we can solve this mutation problem, then it unlocks a lot of new possibilities.

Requirements would be:

  • Given an individual exception exc, you can print its traceback.

  • Given several individual exceptions exc1, exc2, ..., you can reconstruct the joint traceback tree

  • Extracting an individual exception from a group should not require mutating the exception object.

  • During exception propagation, appending frames to the traceback needs to be super cheap (ideally O(1) for an arbitrary-sized exception group). Most tracebacks are never examined at all, so it's OK if printing them requires some extra work as long as creating them is as cheap as possible.

  • Leaf exceptions don't hold strong references to other leaf exceptions (e.g. you can't do exc1.__traceback__.something.children[0] to access exc2.

Right now, tracebacks are stored as a single-linked list where the head is the newest frame:

@dataclass
class Traceback:
    tb_frame: Frame
    tb_next: Optional[Traceback]
    ...

# Used as:
#
# exc -> outermost Traceback -> Traceback -> ... -> innermost Traceback
#     ^
#      \ new Traceback objects inserted here
def push_frame(exc, frame):
    exc.__traceback__ = Traceback(tb_frame=frame, tb_next=exc.__traceback)

This is a super elegant representation and makes pushing new frames onto the traceback cheap, but it can't efficiently represent traceback trees, where a single "outer" frame may have multiple "inner" frames. If you wanted to use this representation while propagating N exceptions, you'd need to create N duplicate Traceback objects for each frame:

# exc1 -> outermost Traceback -> ... -> innermost Traceback
# exc2 -> outermost Traceback -> ... -> innermost Traceback
# ...
def push_frame(excs, frame):
    for exc in excs:
        exc.__traceback__ = Traceback(tb_frame=frame, tb_next=exc.__traceback__)

This is expensive, and also makes it hard to reconstruct the tree, because we can't tell which duplicated Traceback objects are supposed to represent the "same" point in the tree.

OTOH, what if we switch it around, and make tracebacks a singly-linked list from inner->outer?

@dataclass
class Traceback:
    tb_frame: Frame
    tb_outer: Optional[Traceback]
    ...

@dataclass
class BaseException:
    # Head pointer, used to access the full traceback
    __traceback_innermost__: Optional[Traceback]
    # Tail pointer, only used for pushing new frames onto the traceback
    __traceback_outermost__: Optional[Traceback]

    @property
    def __traceback__(self) -> Optional[Traceback]:
        # reconstruct old-style traceback for backwards compat
        ...
    ...

# Used as:
#
# exc -> innermost Traceback -> Traceback -> ... -> outermost Traceback
def push_frame(exc, frame):
    new_tb = Traceback(tb_frame=frame, tb_outer=None)
    exc.__traceback_outermost__.tb_outer = new_tb
    exc.__traceback_outermost__ = new_tb

Now when raising multiple exceptions at once, we can be clever:

# Pseudocode for the implementation of 'raise *(exc1, exc2, exc3)'.
#
# Sets up a tree structure like:
#
# exc1 -> innermost Traceback -> ... -> Traceback +
#                                                  \
# exc2 -> innermost Traceback -> ... -> Traceback ---> new outermost Traceback
#                                                  /
# exc3 -> innermost Traceback -> ... -> Traceback +
def multi_raise(excs, frame):
    join_tb = Traceback(tb_frame=frame, tb_outer=None)
    for exc in excs:
        exc.__traceback_outermost__.tb_outer = join_tb
        exc.__traceback_outermost__ = join_tb
    wrapper = ExceptionGroup(excs)
    wrapper.__traceback_outermost__ = join_tb
    tstate->cur_exc = wrapper

Now as wrapper propagates outwards, running push_frame on it will automatically mutate all of the contained exceptions, in O(1) time. And, given a set of exception objects, we can easily reconstruct the full traceback tree by checking their __traceback__ chains for duplicate Traceback objects.

I think we'd also want to allow frameworks to attach extra metadata to Traceback objects to note where the exception passed between tasks etc., but that's straightforward.

Thoughts

Now that I've written this out, it seems pretty attractive! Our previous designs were based around the idea that you can't change the interpreter, so we were stuck with the existing traceback representation, and the existing hard-coded push-frame-to-traceback code. That's why we needed the ExceptionGroup objects to live in a tree, so each ExceptionGroup node could one linear segment of traceback objects, with the ExceptionGroup objects acting as the join nodes. (And secondarily, to act as a place to attach extra metadata about the joins.) If we can change the traceback representation, then a lot of these contortions become unnecessary.

@gvanrossum
Copy link

I think my own solution to the API change was that if you don't understand traceback groups it will just appear as if the traceback ends at that point. Tracebacks end unexpectedly all the time so I think that would be acceptable. (Logging frameworks would of course have to learn about the groups but that needs to happen anyway.)

@gvanrossum
Copy link

See python/exceptiongroups#3. Irit's code shows how to implement traceback groups. This feels more attractive than reversing the linked list. Though we have to think more about the desire to not mutate exceptions (noting that every time an exception is raised or caught it is modified by the C code anyway to update its __traceback__ attribute).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment