A couple of weeks ago I listened to .NET Rocks! #1433 about Visual Studio 17 with Kathleen Dollard. It was a very good episode, but what stuck with me was Kathleen's comment that, after all these years, she still sees very smart people struggle with async. This matches my experience and that of others as well.
A few years ago I wrote a blog post on why I thought async in C# isn't that great. I took it down a bit later: even though it was popular, it was also divisive, and my understanding of async had grown, which made me want to rewrite the whole piece into something better and less divisive.
I used to think that the implicit coupling of async to the `SynchronizationContext` is a major problem for async, but today I think it's a minor problem for most developers. The reason is that if you are writing (classic) ASP.NET you will always have the ASP.NET `SynchronizationContext`, and that gives particular but consistent async behavior. If you are writing a console application there is no `SynchronizationContext` installed (`SynchronizationContext.Current` is `null`), so continuations run on the thread pool: a different but consistent async behavior. Only library developers have to write `SynchronizationContext`-agnostic code, which is why one often sees `task.ConfigureAwait(false)` in libraries.
Today I think the major problem is a lack of understanding that there's no safe way for a synchronous function to run an async function synchronously. This is made worse because `Task`, the type async methods return, exposes unsafe members like `Task.Result`.
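To make the danger concrete, here is a minimal sketch, assuming a hypothetical single-threaded `PumpedContext` that behaves like a GUI or classic ASP.NET context: continuations posted to it only run when its owning thread gets around to them. A subroutine that blocks on the task with `Wait` (or `.Result`) occupies exactly that thread, so the continuation can never run. The timeout exists only so the sketch terminates; `.Result` would hang forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-threaded context: posted continuations only run if the
// owning thread pumps the queue. In this sketch nothing ever pumps it.
public sealed class PumpedContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback, object)> _queue = new();

    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));
}

public static class SyncOverAsync
{
    static async Task<int> GetValueAsync()
    {
        await Task.Delay(50);   // yields; the continuation is captured for PumpedContext
        return 42;
    }

    // Returns true if blocking on the task finished within the timeout.
    public static bool BlockOnAsync(TimeSpan timeout)
    {
        var previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new PumpedContext());
        try
        {
            Task<int> task = GetValueAsync();
            // The subroutine blocks this thread, while the continuation that would
            // complete the task waits in the queue for this very thread: a deadlock,
            // cut short only by the timeout.
            return task.Wait(timeout);
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

`BlockOnAsync(TimeSpan.FromMilliseconds(500))` returns `false`: the task is still waiting for a thread that is busy waiting for the task.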
async is essentially another word for coroutine, a term Donald Knuth attributes to Melvin Conway, who coined it in 1958. Simula (1960s) and Modula-2 (1980s) were early examples of languages with coroutine support, but async popularized coroutines because it added them to a mainstream language, C#, and because the value coroutines add is more relevant to developers today than it was in the 1960s.
It was Donald Knuth who said that "Subroutines are special cases of ... coroutines".
Subroutines, also known as methods or functions, are one of our fundamental software building blocks. We invoke a subroutine in order to do some work, and it eventually returns a value.
A coroutine generalizes a subroutine: we can invoke a coroutine and it will eventually return a value. But a subroutine is not a coroutine, because a coroutine can also yield and resume.
Operation | Subroutine | Coroutine | Comment |
---|---|---|---|
Invoke | x | x | |
Return | x | x | |
Yield | | x | Pauses the coroutine while waiting for something else, like I/O |
Resume | | x | Resumes the coroutine after data is available |
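C# has had a limited form of coroutine since C# 2.0: iterators. The sketch below (the names `Workflow` and `Drive` are made up for illustration) shows invoke, yield, resume and return with `yield return`; the compiler turns the method into a state machine, much as it does for async methods. Note that nothing in the body runs until the first `MoveNext()`.

```csharp
using System.Collections.Generic;

public static class IteratorCoroutine
{
    // An iterator is a (semi-)coroutine: it can pause at `yield return`
    // and resume where it left off on the next MoveNext().
    public static IEnumerable<string> Workflow(List<string> log)
    {
        log.Add("invoked");
        yield return "waiting for I/O";        // yield: control goes back to the caller
        log.Add("resumed");
        yield return "waiting for more I/O";
        log.Add("resumed again");
        // Falling off the end of the method is the "return".
    }

    public static List<string> Drive()
    {
        var log = new List<string>();
        using var coroutine = Workflow(log).GetEnumerator();
        while (coroutine.MoveNext())           // first call invokes, later calls resume
            log.Add("caller: " + coroutine.Current);
        log.Add("returned");
        return log;
    }
}
```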
This leads to a very important conclusion: a subroutine cannot run a coroutine synchronously, because the coroutine might wish to yield and a subroutine doesn't support yielding.
Instead, as Kathleen says, you have to use coroutines all the way up to the top of the stack.
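A minimal sketch of "async all the way up", assuming a hypothetical `LookupAsync` that stands in for some I/O: every caller awaits instead of blocking, up to the top of the stack, which in a console application would be `static async Task Main()` (supported since C# 7.1).

```csharp
using System.Threading.Tasks;

public static class AsyncAllTheWay
{
    // Hypothetical "database lookup" that yields while waiting.
    static async Task<string> LookupAsync(int id)
    {
        await Task.Delay(10);
        return $"row-{id}";
    }

    // A caller of a coroutine is itself a coroutine: it awaits
    // instead of blocking with .Result or .Wait().
    static async Task<int> HandleRequestAsync(int id)
    {
        string row = await LookupAsync(id);
        return row.Length;
    }

    // Top of the stack; in a console app this role is played by async Main.
    public static async Task<int> TopAsync() => await HandleRequestAsync(7);
}
```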
Unfortunately, async in .NET uses `Task`, which supports `.Result` and `.RunSynchronously`.
So if async is causing problems, why should we bother?
I think there are two major use cases for async: the C10K problem and event-driven GUIs.
The C10K problem
The C10K problem is handling 10,000 concurrent connections to a server.
Let's assume that each of these connections does at least one database lookup.
"Classical" servers accept an incoming connection through a socket and spin up a new thread for each new connection. This means that for the C10K problem we will have 10,000 threads. Each thread reserves roughly 1 MB of contiguous stack (the Windows default), so 10,000 threads reserve about 10 GB for stack space alone. In addition, the OS will spend time switching between and scheduling 10,000 threads.
If we think of each connection as a workflow to execute, the "classical" solution tightly couples what executes the code, i.e. the thread, to the workflow itself.
With async, or coroutines, there's a loose coupling between the executing thread and the workflow, meaning we can have 100 threads serving 10,000 workflows, which typically improves performance and reduces the memory footprint.
One can even go the Node.js route and have a single thread service all workflows, eliminating many of the concurrency problems that preemptive thread scheduling causes.
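A rough sketch of the loose coupling, assuming `Task.Delay` as a stand-in for the database lookup: 10,000 simulated connections are in flight at once, and the sketch counts how many distinct threads actually ran their continuations. It illustrates the idea; it is not a benchmark, and the exact thread count depends on the runtime.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class C10kSketch
{
    // Hypothetical connection handler; Task.Delay stands in for the DB lookup.
    static async Task HandleConnectionAsync(ConcurrentDictionary<int, byte> threadsSeen)
    {
        await Task.Delay(100);   // yields: no thread is blocked while "waiting for the DB"
        threadsSeen.TryAdd(Environment.CurrentManagedThreadId, 0);
    }

    // Starts `connections` workflows at once and returns how many distinct
    // threads ran their continuations.
    public static async Task<int> RunAsync(int connections)
    {
        var threadsSeen = new ConcurrentDictionary<int, byte>();
        var workflows = Enumerable.Range(0, connections)
                                  .Select(_ => HandleConnectionAsync(threadsSeen))
                                  .ToArray();
        await Task.WhenAll(workflows);
        return threadsSeen.Count;
    }
}
```

`await C10kSketch.RunAsync(10_000)` typically reports a few dozen threads at most, far fewer than the 10,000 a thread-per-connection server would need.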
A major challenge in software engineering is how to implement a good application with a graphical user interface (GUI).
It's a challenge because a GUI might have several workflows running at once, initiated by user events. These workflows might need to be paused while waiting for data. We want the GUI to always be responsive, which often implies threads. In addition, most GUI frameworks are thread-affine, meaning that GUI objects may only be used by the GUI thread that created them, so we need to switch back to the GUI thread in order to update the GUI from our background threads.
In fact, it's such a challenge that most strong developers seem to give up on GUIs and focus on the easier problem: the back end.
async tries to mitigate this problem by making sure that when the coroutine is resumed, it's executed by the GUI thread. This means we get the benefit of not blocking the GUI while awaiting data, yet we can freely access GUI objects because we are always executing on the correct thread, with the help of the implicit `SynchronizationContext`.
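A minimal sketch of that mechanism, assuming a hypothetical `MessageLoopContext` that plays the role of a GUI framework's context (it is not WinForms or WPF code): the handler starts on the "GUI" thread, awaits without blocking it, and the captured context brings the continuation back to the same thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a GUI message loop.
public sealed class MessageLoopContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object State)> _queue = new();

    // await posts its continuation here, like a message posted to the GUI thread.
    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));

    public void Complete() => _queue.CompleteAdding();

    // Runs queued continuations on the calling ("GUI") thread until Complete() is called.
    public void RunLoop()
    {
        foreach (var (callback, state) in _queue.GetConsumingEnumerable())
            callback(state);
    }
}

public static class GuiSketch
{
    // Returns the managed thread ids observed before and after the await.
    public static (int Before, int After) Run()
    {
        int before = 0, after = 0;
        var guiThread = new Thread(() =>
        {
            var context = new MessageLoopContext();
            SynchronizationContext.SetSynchronizationContext(context);

            // Hypothetical event handler: starts on the GUI thread, yields while
            // "loading data", then resumes on the GUI thread via the captured context.
            async Task OnButtonClickAsync()
            {
                before = Environment.CurrentManagedThreadId;
                await Task.Delay(50);                          // the GUI thread stays free
                after = Environment.CurrentManagedThreadId;    // back on the GUI thread
            }

            OnButtonClickAsync().ContinueWith(_ => context.Complete(), TaskScheduler.Default);
            context.RunLoop();
        });
        guiThread.Start();
        guiThread.Join();
        return (before, after);
    }
}
```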
`Task` helps us with:
- Parallelism, as we can create multiple `Task`s backed by thread-pool threads that run on different CPU cores in order to complete a CPU-intensive job as soon as possible.
- Long-running I/O background tasks.
- Avoiding spinning up too many threads while servicing a lot of connections.
- Making sure we get back to the right thread.
All of this is related to concurrency, but to different aspects of concurrency. Perhaps we should have a separate API for coroutines, as some parts of `Task`, like `Task.Result`, are useful and safe for, say, parallelism but fundamentally incompatible with coroutines.
I think it's very important that a coroutine API only allows safe invocations from subroutines. As a coroutine can yield but a subroutine can't, an invocation from a subroutine must always report the answer back through a callback.
We don't want to compose coroutines using callback hell, but when invoking a coroutine from a subroutine we have to rely on callbacks.
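A minimal sketch of such a bridge, assuming a hypothetical `CoroutineBridge.Start` helper built on `Task.ContinueWith`: the subroutine never blocks; it hands over callbacks and returns immediately, and the outcome arrives when the coroutine finishes.

```csharp
using System;
using System.Threading.Tasks;

public static class CoroutineBridge
{
    // Hypothetical helper: a subroutine starts a coroutine and receives the
    // outcome through callbacks instead of blocking on Result or Wait().
    public static void Start<T>(Func<Task<T>> coroutine, Action<T> onSuccess, Action<Exception> onError)
    {
        Task<T> task;
        try { task = coroutine(); }
        catch (Exception e) { onError(e); return; }

        task.ContinueWith(t =>
        {
            // The task has already completed here, so reading Result cannot block.
            if (t.IsFaulted) onError(t.Exception.InnerException ?? t.Exception);
            else if (t.IsCanceled) onError(new TaskCanceledException(t));
            else onSuccess(t.Result);
        }, TaskScheduler.Default);
    }
}
```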
I think the C10K problem and the event-driven GUI problem are two distinct problems that shouldn't be solved with a single abstraction, despite being superficially similar. I will focus on the C10K problem, as I think that is where coroutines bring the most value.
The fundamental coroutine abstraction will make or break the entire coroutine API. We should learn from prior art, so let's look at some existing coroutine abstractions:
`Task` is big (~200 methods and properties) in order to support multiple scenarios, so I filtered the API heavily down to what I think is the essential `Task` API for supporting async:
```csharp
public interface INotifyCompletion
{
    void OnCompleted(Action continuation);
}

public interface ICriticalNotifyCompletion : INotifyCompletion
{
    void UnsafeOnCompleted(Action continuation);
}

public struct TaskAwaiter<TResult> : ICriticalNotifyCompletion, INotifyCompletion
{
    public bool IsCompleted { get; }
    public TResult GetResult();
    public void OnCompleted(Action continuation);
    public void UnsafeOnCompleted(Action continuation);
}

public class Task : IAsyncResult, IDisposable
{
    public ConfiguredTaskAwaitable ConfigureAwait(bool continueOnCapturedContext);
    public void Dispose();
    public TaskAwaiter GetAwaiter();
    public void RunSynchronously(TaskScheduler scheduler);
    public bool Wait(int millisecondsTimeout, CancellationToken cancellationToken);
}

public class Task<TResult> : Task
{
    public TResult Result { get; }
    public ConfiguredTaskAwaitable<TResult> ConfigureAwait(bool continueOnCapturedContext);
    public TaskAwaiter<TResult> GetAwaiter();
}
```
One could argue that `RunSynchronously`, `Wait` and `Result` aren't essential to the `Task` async API, as they should never be used with async. However, it's such a common mistake to use them that they are essential defects of the `Task` async API.
When you use `await task` in C#, what happens is roughly this:
```csharp
// Get the awaiter object
var awaiter = task.GetAwaiter();
// Is the result already there?
if (awaiter.IsCompleted)
{
    // OK, get the result; if the task failed or
    // was cancelled this will throw that exception
    var result = awaiter.GetResult();
    // Let's do some stuff with result
}
else
{
    awaiter.UnsafeOnCompleted(() =>
    {
        // OK, get the result; if the task failed or
        // was cancelled this will throw that exception
        var result = awaiter.GetResult();
        // Let's do some stuff with result
    });
}
```
What this means is that when async needs to yield (`awaiter.IsCompleted` is `false`), it registers a callback with the awaiter object and then lets the thread continue doing something else. When the task is done, the callback is called, the async function resumes, and it continues to execute until it's done or needs to yield again.
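The same protocol works for any type, not just `Task`: the compiler only looks for a `GetAwaiter()` method whose result has `IsCompleted` and `GetResult()` and implements `INotifyCompletion`. The hypothetical `TracingAwaitable<T>` below wraps a `Task<T>` and records which members the compiler-generated code calls, which is one way to check the expansion above by experiment.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaitable that wraps a Task<T> and logs the awaiter protocol.
public sealed class TracingAwaitable<T>
{
    private readonly Task<T> _inner;
    public readonly List<string> Calls = new();

    public TracingAwaitable(Task<T> inner) => _inner = inner;

    public Awaiter GetAwaiter()
    {
        Calls.Add("GetAwaiter");
        return new Awaiter(this);
    }

    public readonly struct Awaiter : INotifyCompletion
    {
        private readonly TracingAwaitable<T> _owner;
        public Awaiter(TracingAwaitable<T> owner) => _owner = owner;

        public bool IsCompleted
        {
            get { _owner.Calls.Add("IsCompleted"); return _owner._inner.IsCompleted; }
        }

        // Called only when the awaiting method has to yield.
        public void OnCompleted(Action continuation)
        {
            _owner.Calls.Add("OnCompleted");
            _owner._inner.GetAwaiter().OnCompleted(continuation);   // resume when the inner task finishes
        }

        public T GetResult()
        {
            _owner.Calls.Add("GetResult");
            return _owner._inner.GetAwaiter().GetResult();
        }
    }
}
```

Awaiting an unfinished task logs `GetAwaiter, IsCompleted, OnCompleted, GetResult`; awaiting an already completed one skips `OnCompleted`, matching the two branches above.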
Why doesn't async use the `ContinueWith` API, as there seems to be an overlap? I don't know! If we turn to the documentation for the awaiter, it says:

> This API supports the product infrastructure and is not intended to be used directly from your code.
What is the difference between `UnsafeOnCompleted` and `OnCompleted`?

`UnsafeOnCompleted`:

> Schedules the continuation action for the asynchronous task that is associated with this awaiter. This API supports the product infrastructure and is not intended to be used directly from your code.

`OnCompleted`:

> Sets the action to perform when the TaskAwaiter object stops waiting for the asynchronous task to complete. This API supports the product infrastructure and is not intended to be used directly from your code.
In my personal opinion, it's not good that a public API used by async, a core feature of C#, isn't fully documented. It makes it very hard for developers to know exactly what is happening; instead we learn through experimentation and anecdotes.
What about `ConfigureAwait`, which many libraries use in order to be `SynchronizationContext`-agnostic?

> Configures an awaiter used to await this Task. `continueOnCapturedContext`: true to attempt to marshal the continuation back to the original context captured; otherwise, false.

OK, so `true` means use the captured context, which is the default behavior, but what does "otherwise" mean? Does it use the default `SynchronizationContext`? I don't know.
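In the spirit of learning through experimentation, here is a small probe, assuming a hypothetical `CountingContext` that counts how often it is asked to `Post` a continuation and runs that continuation on the thread pool with itself installed. It only observes what happens on the runtime it runs on; it is not a specification.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: counts Posts and runs posted work on the thread pool,
// installing itself meanwhile so the code after await can observe it.
public sealed class CountingContext : SynchronizationContext
{
    private int _posts;
    public int Posts => Volatile.Read(ref _posts);

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _posts);
        ThreadPool.QueueUserWorkItem(_ =>
        {
            var previous = Current;
            SetSynchronizationContext(this);
            try { d(state); }
            finally { SetSynchronizationContext(previous); }
        });
    }
}

public static class ConfigureAwaitProbe
{
    static async Task<bool> ResumesOnContextAsync(SynchronizationContext context, bool continueOnCapturedContext)
    {
        await Task.Delay(20).ConfigureAwait(continueOnCapturedContext);
        return SynchronizationContext.Current == context;
    }

    // Reports whether the code after the await saw the captured context and
    // how many times the context was asked to Post a continuation.
    public static (bool SawContext, int Posts) Probe(bool continueOnCapturedContext)
    {
        var context = new CountingContext();
        var previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(context);
        Task<bool> task;
        try { task = ResumesOnContextAsync(context, continueOnCapturedContext); }
        finally { SynchronizationContext.SetSynchronizationContext(previous); }

        // Blocking is tolerable here only because CountingContext never needs this thread.
        bool sawContext = task.GetAwaiter().GetResult();
        return (sawContext, context.Posts);
    }
}
```

In this probe, `Probe(true)` reports one `Post` and the code after the `await` sees the captured context; `Probe(false)` reports no `Post` and `SynchronizationContext.Current` is `null` after the `await`, i.e. the continuation simply runs on whichever thread completed the awaited task rather than on some default context.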
- The `Task` API is, in my opinion, overloaded in order to support a vast array of different concurrency scenarios, not all of them related to async.
- The async API is underdocumented and isn't meant to be used directly, most likely because the .NET team doesn't want us to take a direct dependency on it, which gives them some flexibility in evolving it. My guess is that they don't get that much flexibility, as a new API still needs to be ABI compatible, and it also means that a critical async API doesn't help us understand how async works.
- I am missing a detailed technical specification of how async works. "It just works" doesn't do it for me.
- Each time we invoke an async function we get a `Task` object back, onto which we will probably register a callback through the `TaskAwaiter`. That means at least one heap allocation for the `Task` (`TaskAwaiter` itself is a struct), and typically more once the method actually yields and its state machine and continuation have to live on the heap.
- There's an implicit coupling to `SynchronizationContext`, which causes confusion in that async behaves differently depending on which `SynchronizationContext` is in use.
I would like an async API that is:
- Focused on coroutines.
- Well documented.
- Light on `GC` pressure.
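Part of the GC-pressure point can already be addressed today with `ValueTask<T>`, which wraps either a result or a `Task<T>`. A minimal sketch, assuming a hypothetical `CachedLookup` with an in-memory cache: cache hits complete synchronously without allocating a `Task`, while misses still go through a normal async method.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class CachedLookup
{
    private readonly ConcurrentDictionary<int, string> _cache = new();
    private int _misses;
    public int Misses => Volatile.Read(ref _misses);

    // Cache hit: the value is wrapped in a ValueTask, no Task is allocated.
    // Cache miss: fall back to a real async method, which does allocate.
    public ValueTask<string> GetAsync(int key) =>
        _cache.TryGetValue(key, out var value)
            ? new ValueTask<string>(value)
            : new ValueTask<string>(LoadAsync(key));

    // Hypothetical slow path; Task.Delay stands in for the database lookup.
    private async Task<string> LoadAsync(int key)
    {
        Interlocked.Increment(ref _misses);
        await Task.Delay(10);
        return _cache.GetOrAdd(key, k => $"value-{k}");
    }
}
```

A `ValueTask<T>` should be awaited only once; that restriction is the price of avoiding the allocation.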
F# `Async`:

```fsharp
type cont<'T> = ('T -> FakeUnitValue)
type econt = (ExceptionDispatchInfo -> FakeUnitValue)
type ccont = (OperationCanceledException -> FakeUnitValue)

type AsyncParamsAux =
    { token : CancellationToken;
      econt : econt;
      ccont : ccont;
      trampolineHolder : TrampolineHolder
    }

type AsyncParams<'T> =
    { cont : cont<'T>
      aux : AsyncParamsAux
    }

type Async<'T> =
    P of (AsyncParams<'T> -> FakeUnitValue)
```
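The core idea of F# `Async` is continuation-passing style: a computation does not return its result, it is handed a success, an exception and a cancellation continuation and calls one of them. The sketch below is a hypothetical C# transliteration (`CpsAsync<T>`, `Return`, `Bind` and `After` are made-up names), and it deliberately leaves out cancellation tokens and the trampolining that `trampolineHolder` provides against stack overflows.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical C# version of AsyncParams: a computation receives three
// continuations and calls exactly one of them.
public delegate void CpsAsync<T>(
    Action<T> cont,                              // success (F#: cont)
    Action<Exception> econt,                     // failure (F#: econt)
    Action<OperationCanceledException> ccont);   // cancellation (F#: ccont)

public static class Cps
{
    public static CpsAsync<T> Return<T>(T value) =>
        (cont, econt, ccont) => cont(value);

    // Sequencing: run m, pass its result to f, then run the computation f returns.
    public static CpsAsync<U> Bind<T, U>(CpsAsync<T> m, Func<T, CpsAsync<U>> f) =>
        (cont, econt, ccont) => m(
            value =>
            {
                CpsAsync<U> next;
                try { next = f(value); }
                catch (Exception e) { econt(e); return; }
                next(cont, econt, ccont);
            },
            econt,
            ccont);

    // Yielding: register a callback and return at once; the timer resumes the chain.
    public static CpsAsync<T> After<T>(int milliseconds, T value) =>
        (cont, econt, ccont) => Task.Delay(milliseconds).ContinueWith(_ => cont(value));
}
```

Starting a computation is just invoking the delegate with the three callbacks, which is exactly the subroutine-to-coroutine boundary described earlier.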
Hopac (an F# concurrency library):

```fsharp
type Handler = class
  abstract DoHandle: byref<Worker> * exn -> unit
end

type Work = class
  inherit Handler
  val mutable Next: Work
  abstract DoWork: byref<Worker> -> unit
end

type Worker = struct
  val mutable WorkStack: Work
  val mutable Handler: Handler
end

type Cont<'a> = class
  inherit Work
  val mutable Value: 'a
  abstract DoCont: byref<Worker> * 'a -> unit
end

type Job<'a> = class
  abstract DoJob: byref<Worker> * Cont<'a> -> unit
end
```
Kotlin coroutines:

```kotlin
public interface CoroutineContext {
    public operator fun <E : Element> get(key: Key<E>): E?
    public fun <R> fold(initial: R, operation: (R, Element) -> R): R
    public operator fun plus(context: CoroutineContext): CoroutineContext
    public fun minusKey(key: Key<*>): CoroutineContext

    public interface Element : CoroutineContext {
        public val key: Key<*>
        @Suppress("UNCHECKED_CAST")
        public override operator fun <E : Element> get(key: Key<E>): E?
        public override fun <R> fold(initial: R, operation: (R, Element) -> R): R
        public override fun minusKey(key: Key<*>): CoroutineContext
    }

    public interface Key<E : Element>
}

public interface Continuation<in T> {
    public val context: CoroutineContext
    public fun resume(value: T)
    public fun resumeWithException(exception: Throwable)
}
```