More than a year ago, I wrote a blog post titled "Context Should Go Away For Go 2", which received a fair amount of support and response. In that post, I argued that the "context" package is a bad idea because it's too infectious.
As explained in that post, "context" spreads so widely, and in such an unhealthy fashion, because it solves the problem of cancelling long-running procedures.
I promised to follow the blog post (which only complained about the problem) with a solution. Considering the recent progress around Go 2, I decided that now is the right time for the follow-up. So, here it is!
My proposed solution is to bake cancellation into the language, thus avoiding the need to pass a context around just to be able to cancel long-running procedures. The "context" package could still be kept for goroutine-local data; that use does not cause it to spread, so that's fine.
In the following sections I'll explain how exactly the baked-in cancellation would work.
One quick point before we start: this proposal does not make it possible to "kill" a goroutine - the cancellation is always cooperative.
I'll explain the proposal in a series of short, very contrived examples.
We start a goroutine:
go longRunningThing()
In Go 1, the go keyword starts a goroutine but doesn't return anything. I propose that it return a function which, when called, cancels the spawned goroutine.
cancel := go longRunningThing()
cancel()
We started a goroutine and then cancelled it immediately.
Now, as I've said, cancellation must be a cooperative operation. The longRunningThing function needs to carry out its own cancellation on request. What could that look like?
func longRunningThing() {
	select {
	case <-time.After(5 * time.Second):
		fmt.Println("finished")
	}
}
This longRunningThing function does not cooperate. It takes 5 seconds no matter what. Here's how we can improve it:
func longRunningThing() {
	select {
	case <-time.After(5 * time.Second):
		fmt.Println("finished")
	cancelling:
		fmt.Println("cancelled")
	}
}
I propose that the select statement get an additional branch called cancelling (a new keyword), which is triggered whenever the goroutine is scheduled for cancellation, i.e. when the function returned from the go statement is called.
The above program would therefore print:
cancelled
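For comparison, the same behaviour can be approximated in today's Go. This is only a rough sketch and not part of the proposal: spawn is a hypothetical helper standing in for `cancel := go f()`, the cancelling channel stands in for the new select branch, and I'm assuming that the returned function waits for the goroutine to finish.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// spawn is a hypothetical stand-in for the proposed `cancel := go f()`.
// It runs f in a new goroutine and returns a function that requests
// cancellation and then blocks until f has returned (an assumption).
func spawn(f func(cancelling <-chan struct{})) (cancel func()) {
	cancelling := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		f(cancelling)
	}()
	var once sync.Once
	return func() {
		once.Do(func() { close(cancelling) }) // safe to call more than once
		<-done
	}
}

// longRunningThing returns how it ended so the caller can inspect it.
func longRunningThing(cancelling <-chan struct{}) string {
	select {
	case <-time.After(5 * time.Second):
		return "finished"
	case <-cancelling: // plays the role of the proposed `cancelling:` branch
		return "cancelled"
	}
}

// run starts longRunningThing, cancels it right away, and reports the outcome.
func run() string {
	var outcome string
	cancel := spawn(func(c <-chan struct{}) { outcome = longRunningThing(c) })
	cancel() // returns only after the goroutine has finished
	return outcome
}

func main() {
	fmt.Println(run())
}
```

Note that the channel has to be threaded through by hand here, which is exactly the kind of plumbing the proposal wants to get rid of.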
What if the long-running thing spawns some goroutines itself? Does it have to handle their cancellation explicitly? No, it doesn't. All goroutines spawned inside a cancelled goroutine get cancelled first and the originally cancelled goroutine starts its cancellation only after all its 'child' goroutines finish.
For example:
func longRunningThing() {
	go anotherLongRunningThing()
	select {
	case <-time.After(5 * time.Second):
		fmt.Println("finished")
	cancelling:
		fmt.Println("cancelled")
	}
}
func anotherLongRunningThing() {
	select {
	case <-time.After(3 * time.Second):
		fmt.Println("child finished")
	cancelling:
		fmt.Println("child cancelled")
	}
}
This time, running:
cancel := go longRunningThing()
cancel()
prints out:
child cancelled
cancelled
This feature is here because child goroutines usually communicate with the parent goroutine. It's good for the parent goroutine to stay fully intact until the child goroutines finish.
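The children-first ordering can be emulated in today's Go with an explicit tree of tasks. The sketch below is only an illustration under my own assumptions: the task type, spawn, cancel and the ready channel are all hypothetical names. The ready channel exists only because this emulation cannot cancel a child that has not been registered yet; the proposal itself would not need it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// task is a hypothetical handle for one goroutine in the emulated tree.
type task struct {
	cancelling chan struct{} // closed to ask the task to cancel
	done       chan struct{} // closed once the task's function returns
	mu         sync.Mutex
	children   []*task
}

// spawn runs f in a new goroutine, registered as a child of parent
// (pass nil for a top-level task).
func spawn(parent *task, f func(t *task)) *task {
	t := &task{cancelling: make(chan struct{}), done: make(chan struct{})}
	if parent != nil {
		parent.mu.Lock()
		parent.children = append(parent.children, t)
		parent.mu.Unlock()
	}
	go func() {
		defer close(t.done)
		f(t)
	}()
	return t
}

// cancel cancels every child and waits for it, and only then signals t.
// In this sketch it must be called at most once per task.
func (t *task) cancel() {
	t.mu.Lock()
	children := append([]*task(nil), t.children...)
	t.mu.Unlock()
	for _, c := range children {
		c.cancel()
	}
	close(t.cancelling)
	<-t.done
}

// run rebuilds the parent/child example and returns the events in order.
func run() []string {
	events := make(chan string, 2)
	ready := make(chan struct{})

	child := func(t *task) {
		select {
		case <-time.After(3 * time.Second):
			events <- "child finished"
		case <-t.cancelling:
			events <- "child cancelled"
		}
	}
	parent := func(t *task) {
		spawn(t, child)
		close(ready) // the child is registered from here on
		select {
		case <-time.After(5 * time.Second):
			events <- "finished"
		case <-t.cancelling:
			events <- "cancelled"
		}
	}

	root := spawn(nil, parent)
	<-ready
	root.cancel()
	close(events) // both goroutines have returned, so nobody sends any more

	var out []string
	for e := range events {
		out = append(out, e)
	}
	return out
}

func main() {
	for _, e := range run() {
		fmt.Println(e)
	}
}
```

Because cancel waits for each child before closing the parent's channel, the parent can still talk to its children while they wind down.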
Now, let's say that instead of running it in another goroutine, longRunningThing needs to execute anotherLongRunningThing three times sequentially, like this (anotherLongRunningThing remains unchanged):
func longRunningThing() {
	anotherLongRunningThing()
	anotherLongRunningThing()
	anotherLongRunningThing()
}
This time, longRunningThing doesn't handle cancellation at all. Still, cancellation propagates to all nested calls. Cancelling this longRunningThing would print:
child cancelled
child cancelled
child cancelled
All anotherLongRunningThing calls got cancelled one by one.
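A closed channel in today's Go behaves similarly: once closed, it stays ready to receive forever, so every later call that selects on it is cancelled as well. Here is a small sketch of that idea, assuming the channel is passed down explicitly (the cancelling parameter and the run helper are mine, not part of the proposal):

```go
package main

import (
	"fmt"
	"time"
)

// anotherLongRunningThing reports how it ended instead of printing.
func anotherLongRunningThing(cancelling <-chan struct{}) string {
	select {
	case <-time.After(3 * time.Second):
		return "child finished"
	case <-cancelling: // a closed channel is always ready to receive
		return "child cancelled"
	}
}

// longRunningThing does not handle cancellation itself; it only passes
// the channel down to the nested calls.
func longRunningThing(cancelling <-chan struct{}) []string {
	var log []string
	for i := 0; i < 3; i++ {
		log = append(log, anotherLongRunningThing(cancelling))
	}
	return log
}

// run starts longRunningThing in a goroutine and cancels it immediately.
func run() []string {
	cancelling := make(chan struct{})
	result := make(chan []string, 1)
	go func() { result <- longRunningThing(cancelling) }()
	close(cancelling)
	return <-result
}

func main() {
	for _, line := range run() {
		fmt.Println(line)
	}
}
```

The difference is that under the proposal, longRunningThing would not need the extra parameter at all.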
What if anotherLongRunningThing can fail, or just wants to signal it was cancelled instead of finishing successfully? We can make it return an error:
func anotherLongRunningThing() error {
	select {
	case <-time.After(3 * time.Second):
		return nil
	cancelling:
		return errors.New("cancelled")
	}
}
Now we update longRunningThing to handle the error (using the new error handling proposal):
func longRunningThing() error {
	check anotherLongRunningThing()
	check anotherLongRunningThing()
	check anotherLongRunningThing()
	return nil
}
In this version, longRunningThing returns the first error it encounters while executing anotherLongRunningThing three times sequentially. But how do we receive the error? We spawned the function in a goroutine and there's no way to get the return value of a goroutine in Go 1.
Here comes the last thing I propose: the function returned from the go statement should have the same return values as the function that was set to run in the goroutine. So, in our case, the cancel function has type func() error:
cancel := go longRunningThing()
err := cancel()
fmt.Println(err)
This prints:
cancelled
However, if we waited 10 seconds before cancelling the goroutine (longRunningThing takes 9 seconds), we'd get no error, because the function finished successfully:
cancel := go longRunningThing()
time.Sleep(10 * time.Second)
err := cancel()
fmt.Println(err)
Prints out:
<nil>
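With generics (Go 1.18 or later), a cancel function that hands back the goroutine's return value can be sketched in today's Go. As before, this is an approximation under my own assumptions: spawn, the step constant and the run helper are hypothetical, the durations are scaled down to milliseconds, and a plain if replaces the proposed check.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// spawn is a hypothetical stand-in for `cancel := go f()`. The returned
// function requests cancellation, waits for f, and returns f's result.
func spawn[T any](f func(cancelling <-chan struct{}) T) (cancel func() T) {
	cancelling := make(chan struct{})
	done := make(chan struct{})
	var result T
	go func() {
		defer close(done)
		result = f(cancelling)
	}()
	var once sync.Once
	return func() T {
		once.Do(func() { close(cancelling) })
		<-done // result is written before done is closed
		return result
	}
}

// step replaces the 3 seconds from the text to keep the demo fast.
const step = 20 * time.Millisecond

func anotherLongRunningThing(cancelling <-chan struct{}) error {
	select {
	case <-time.After(step):
		return nil
	case <-cancelling:
		return errors.New("cancelled")
	}
}

// longRunningThing returns the first error from three sequential calls.
func longRunningThing(cancelling <-chan struct{}) error {
	for i := 0; i < 3; i++ {
		if err := anotherLongRunningThing(cancelling); err != nil {
			return err
		}
	}
	return nil
}

// run cancels either immediately or after the work has had time to finish.
func run(waitFirst bool) error {
	cancel := spawn(longRunningThing)
	if waitFirst {
		time.Sleep(500 * time.Millisecond)
	}
	return cancel()
}

func main() {
	fmt.Println(run(false)) // cancelled before the first step completes
	fmt.Println(run(true))  // cancelled after all three steps are done
}
```

Calling cancel after the function has already returned is harmless in this sketch: nobody is listening on the channel any more, and the stored result is returned as is.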
And lastly, say we have a function called getGoods which contacts some service, gets some goods back, and sends them on a channel. We only want to wait for the goods for 5 seconds, no more. Here's how we implement a timeout:
goods := make(chan Good)
cancel := go getGoods(goods)
select {
case x := <-goods:
	// enjoy the goods
case <-time.After(5 * time.Second):
	err := cancel()
	return errors.Wrap(err, "timed out")
}
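The timeout pattern can also be tried out in today's Go. The sketch below rests on several assumptions of mine: spawn is the same kind of hypothetical helper as before, getGoods fakes the service with a delay, the Good type has a made-up Name field, the timeout is shortened to 200 milliseconds, and the standard library's fmt.Errorf with %w is swapped in for errors.Wrap.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Good is a placeholder type; the Name field is made up for this sketch.
type Good struct{ Name string }

// spawn is a hypothetical stand-in for `cancel := go f()`.
func spawn(f func(cancelling <-chan struct{}) error) (cancel func() error) {
	cancelling := make(chan struct{})
	done := make(chan struct{})
	var err error
	go func() {
		defer close(done)
		err = f(cancelling)
	}()
	var once sync.Once
	return func() error {
		once.Do(func() { close(cancelling) })
		<-done
		return err
	}
}

const (
	fast    = time.Millisecond       // service answers almost at once
	slow    = 2 * time.Second        // service answers far too late
	timeout = 200 * time.Millisecond // stands in for the 5 seconds in the text
)

// getGoods pretends to contact a service that replies after delay.
func getGoods(cancelling <-chan struct{}, goods chan<- Good, delay time.Duration) error {
	select {
	case <-time.After(delay):
	case <-cancelling:
		return errors.New("cancelled")
	}
	select {
	case goods <- Good{Name: "widget"}:
		return nil
	case <-cancelling: // the receiver gave up while we were sending
		return errors.New("cancelled")
	}
}

// fetch waits for the goods, but no longer than timeout.
func fetch(delay time.Duration) (Good, error) {
	goods := make(chan Good)
	cancel := spawn(func(c <-chan struct{}) error { return getGoods(c, goods, delay) })
	select {
	case x := <-goods:
		return x, nil
	case <-time.After(timeout):
		err := cancel()
		return Good{}, fmt.Errorf("timed out: %w", err)
	}
}

func main() {
	fmt.Println(fetch(fast))
	fmt.Println(fetch(slow))
}
```

The second select in getGoods matters: without it, a getGoods that finishes just after the timeout would block forever on the send, and cancel would never return.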
And that is the end of this series of short examples. I've shown all of the proposed features. In the next section, I'll describe the features more carefully and explain precisely how they'd work.