rogpeppe/error-inspection-feedback.md

## error-inspection-feedback.md

      
    Raw
  

              error-inspection-feedback.md
            
          
    Some thoughts about the draft Go error inspection design


Some error packages intend for programs to act on a single error (the
“Cause”) extracted from the chain of wrapped errors. We feel that a
single error is too limited a view into the error chain.

It might not feel that way initially, but I believe that a single error is
almost as expressive as iterating down the stack looking for an error that
fits some criteria, and considerably easier to reason about.

More than one error might be worth examining. The errors.As function
can select any error from the chain; two calls with different types can
return two different errors.
For instance, a program could both ask whether an error is a PathError
and also ask whether it is a permission error.

Motivational examples are particularly important here, and I'm not convinced
this is a good example. Even with a single cause, one
can still look at it and check whether it's either a PathError with a
wrapped permission error, or an unwrapped permission error.
Let's think for a moment about what the error chain actually represents.
Every error in the chain represents somewhere that the code decided
to add some context, usually just before returning from a function.
Functions are a important unit of encapsulation - a function hides
the details of how it works. So when we're traversing the error chain
looking for an error that matches a particular type or value, we are
diving through those layers of encapsulation.
I believe that the set of possible errors returned by a function should
be considered as much a part of a function's public contract as its
set of possible argument values or the set of possible visible side
effects that it might have. If you are traversing the error stack to
find an error that is not documented as part of the API, you are opening up
your code to breakage. The author of the function is entitled to change
the details of how it works, but if callers are dependent on specific
errors returned from deep down, then changing how it works can break
the program. In other words, this is inappropriate information leakage.
For example, say I'm using some code that provides some abstraction
on top of a local database.
type Store struct {
	// contains filtered or unexported fields
}
func (s *Store) ChangeSomething(args ChangeParams) error

My code calls Store.ChangeSomething, but I find that in some
circumstances I'm seeing that the system is restarting because of an error
when there's a duplicate entry in the database.  I want to be resilient in this case,
so I look down the error chain and find that it's returning an error of
type sqlite3.Error with a Code value of sqlite3.ErrConstraint.
I write some code like this:
if err, ok := errors.As(*sqlite3.Error)(err); ok && err.Code == sqlite3.ErrConstraint {
	// Duplicate entry - not fatal.
	return nil
}
return err

Everything is now working. My system is no longer restarting 20 times a minute
due to an avoidable error.
One day however, I update the store implementation, and it turns out that
its underlying data store has changed to use BoltDB instead.  The Store
code still has exactly the same external API, but when the duplicate
entry error occurs it now returns boltdb.ErrBucketExists. My code is
now broken.
Because the As and Is operators explicitly traverse the error chain,
they are inherently breaking abstraction boundaries and invite that
kind of implicit dependency between different layers that can lead to
breakage like the above.
I believe that the answer to this problem is to make each function
responsible for the set of possible errors that it can return.  When
a function  returns an error that doesn't fit within that set, it should hide
it so that callers cannot depend on it without making it very clear that
they're doing so.
In the above Store.ChangeSomething example, that would mean that instead
of changing the code to rely on an error I've just discovered by inspection,
I'd need to contact the authors of the store package and persuade them to add an error to
the API, or perhaps explicitly pass through sqlite3 errors (in which case
it would be clear that moving to BoltDB would be an API-breaking change).
Maintaining this set of possible errors is made harder when it's actually a
set of sets of possible errors. In the error proposal, every time we
call errors.Is or errors.As, we're essentially saying that a given
error value might be several things at the same time, because all the errors
in the chain are being considered. If we want to preserve a set of
errors, that implies checking whether any one of the errors in the chain
is in the set of errors that we care about.
By contrast, if we say that an error has a single "Cause" (however you
want to spell that), then life is simpler. We can just check if that
single value fits within the set and hide the cause otherwise.
The errgo package does this by
expecting clients to provide a function of the form func(error) bool
that reports whether a given error is part of that set. This means that
it's possible to use arbitrary criteria to include or exclude errors.
There are already functions in the standard library that fit this paradigm,
for example os.IsNotExist.
Another reason to have a single cause is efficiency. With large programs
come deep call stacks, and hence deeply wrapped errors. If we're
checking every level of the call stack at level of the call stack, then
we have O(n²) complexity. It's entirely conceivably that there are
many errors being returned and one unusual error that's being
checked for all the time - in this case, we'll almost always traverse
the entire chain before deciding that the error isn't there.
Single cause

Following the argument in the previous section, I propose that
the errors package should support only a single "cause" for any error.
Please don't get hung up on the word "cause" - I am aware
that there are many possible spellings, and many strong preferences
for what word to use here. The important thing is that there should
be a way to get a single value from an error that code can use
to guide future actions.
This does not affect the reasoning that led to the proposal that
splits the error chain to print from the error chain used for error control flow.
type Unwrapper interface {
	Unwrap() error
}

type Causer interface {
	Cause() error
}