Skip to content

Instantly share code, notes, and snippets.

@rogpeppe
Last active November 14, 2018 10:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rogpeppe/a435b57473152e3429a5a149401edacf to your computer and use it in GitHub Desktop.
Save rogpeppe/a435b57473152e3429a5a149401edacf to your computer and use it in GitHub Desktop.

Some thoughts about the draft Go error inspection design

Some error packages intend for programs to act on a single error (the “Cause”) extracted from the chain of wrapped errors. We feel that a single error is too limited a view into the error chain.

It might not feel that way initially, but I believe that a single error is almost as expressive as iterating down the stack looking for an error that fits some criteria, and considerably easier to reason about.

More than one error might be worth examining. The errors.As function can select any error from the chain; two calls with different types can return two different errors.

For instance, a program could both ask whether an error is a PathError and also ask whether it is a permission error.

Motivational examples are particularly important here, and I'm not convinced this is a good example. Even with a single cause, one can still look at it and check whether it's either a PathError with a wrapped permission error, or an unwrapped permission error.

Let's think for a moment about what the error chain actually represents. Every error in the chain represents somewhere that the code decided to add some context, usually just before returning from a function. Functions are a important unit of encapsulation - a function hides the details of how it works. So when we're traversing the error chain looking for an error that matches a particular type or value, we are diving through those layers of encapsulation.

I believe that the set of possible errors returned by a function should be considered as much a part of a function's public contract as its set of possible argument values or the set of possible visible side effects that it might have. If you are traversing the error stack to find an error that is not documented as part of the API, you are opening up your code to breakage. The author of the function is entitled to change the details of how it works, but if callers are dependent on specific errors returned from deep down, then changing how it works can break the program. In other words, this is inappropriate information leakage.

For example, say I'm using some code that provides some abstraction on top of a local database.

type Store struct {
	// contains filtered or unexported fields
}
func (s *Store) ChangeSomething(args ChangeParams) error

My code calls Store.ChangeSomething, but I find that in some circumstances I'm seeing that the system is restarting because of an error when there's a duplicate entry in the database. I want to be resilient in this case, so I look down the error chain and find that it's returning an error of type sqlite3.Error with a Code value of sqlite3.ErrConstraint.

I write some code like this:

if err, ok := errors.As(*sqlite3.Error)(err); ok && err.Code == sqlite3.ErrConstraint {
	// Duplicate entry - not fatal.
	return nil
}
return err

Everything is now working. My system is no longer restarting 20 times a minute due to an avoidable error.

One day however, I update the store implementation, and it turns out that its underlying data store has changed to use BoltDB instead. The Store code still has exactly the same external API, but when the duplicate entry error occurs it now returns boltdb.ErrBucketExists. My code is now broken.

Because the As and Is operators explicitly traverse the error chain, they are inherently breaking abstraction boundaries and invite that kind of implicit dependency between different layers that can lead to breakage like the above.

I believe that the answer to this problem is to make each function responsible for the set of possible errors that it can return. When a function returns an error that doesn't fit within that set, it should hide it so that callers cannot depend on it without making it very clear that they're doing so.

In the above Store.ChangeSomething example, that would mean that instead of changing the code to rely on an error I've just discovered by inspection, I'd need to contact the authors of the store package and persuade them to add an error to the API, or perhaps explicitly pass through sqlite3 errors (in which case it would be clear that moving to BoltDB would be an API-breaking change).

Maintaining this set of possible errors is made harder when it's actually a set of sets of possible errors. In the error proposal, every time we call errors.Is or errors.As, we're essentially saying that a given error value might be several things at the same time, because all the errors in the chain are being considered. If we want to preserve a set of errors, that implies checking whether any one of the errors in the chain is in the set of errors that we care about.

By contrast, if we say that an error has a single "Cause" (however you want to spell that), then life is simpler. We can just check if that single value fits within the set and hide the cause otherwise.

The errgo package does this by expecting clients to provide a function of the form func(error) bool that reports whether a given error is part of that set. This means that it's possible to use arbitrary criteria to include or exclude errors.

There are already functions in the standard library that fit this paradigm, for example os.IsNotExist.

Another reason to have a single cause is efficiency. With large programs come deep call stacks, and hence deeply wrapped errors. If we're checking every level of the call stack at level of the call stack, then we have O(n²) complexity. It's entirely conceivably that there are many errors being returned and one unusual error that's being checked for all the time - in this case, we'll almost always traverse the entire chain before deciding that the error isn't there.

Single cause

Following the argument in the previous section, I propose that the errors package should support only a single "cause" for any error. Please don't get hung up on the word "cause" - I am aware that there are many possible spellings, and many strong preferences for what word to use here. The important thing is that there should be a way to get a single value from an error that code can use to guide future actions.

This does not affect the reasoning that led to the proposal that splits the error chain to print from the error chain used for error control flow.

type Unwrapper interface {
	Unwrap() error
}

type Causer interface {
	Cause() error
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment