@wangyingsm
Created September 29, 2021 06:41
Waker API: Part II

The Waker API II: waking across threads

This post is a translation of an entry in withoutboats's blog series: The Waker API II: waking across threads.

In the previous post, I provided a lot of background on what the waker API is trying to solve. Toward the end, I touched on one of the tricky problems the waker API has: how do we handle thread safety for the dynamic Waker type? In this post, I want to look at that in greater detail: what we’ve been doing so far, and what I think we should do.

Restating the problem

The goal of this portion of the API is to ensure we can support all of the kinds of waker implementations that are necessary. In particular, we want to be able to support implementations that have special behavior when called from the same thread the waker was originally constructed on. There are two variations on this:

  • The more common variation is to have an optimization specific to waking from the original thread, though you do support waking from different threads as well.
  • A more niche use case is to only support waking from the same thread. In this implementation, the executor is designed for programs that use no multithreading at all, and it’s tightly coupled to a particular reactor design.

We’ve gone through a couple of iterations on this API. The design currently implemented on nightly has two waker types: Waker and LocalWaker. The difference between them is that the latter is not Send or Sync, and will call a specialized wake_local function when it is woken, instead of the default wake function. However, you can always convert a LocalWaker into a Waker using the into_waker method.
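To make the two-type design concrete, here is a minimal sketch of its shape. It is not the actual nightly implementation: the `Wake` trait below, the `Counts` type, and the `PhantomData` marker are illustrative assumptions. Only the names `Waker`, `LocalWaker`, `wake`, `wake_local`, and `into_waker` come from the text above.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical executor-side trait; not the real nightly trait.
pub trait Wake: Send + Sync {
    /// Wake path that may be called from any thread.
    fn wake(&self);
    /// Wake path for the thread the waker was created on; falls back to `wake`.
    fn wake_local(&self) {
        self.wake()
    }
}

/// Thread-safe handle: `Arc<dyn Wake>` is `Send + Sync`.
pub struct Waker {
    inner: Arc<dyn Wake>,
}

/// Same handle, kept on its thread by a raw-pointer marker (`!Send`, `!Sync`).
pub struct LocalWaker {
    inner: Arc<dyn Wake>,
    _not_send: PhantomData<*const ()>,
}

impl Waker {
    pub fn wake(&self) {
        self.inner.wake()
    }
}

impl LocalWaker {
    pub fn new(inner: Arc<dyn Wake>) -> Self {
        LocalWaker { inner, _not_send: PhantomData }
    }
    /// Uses the specialized same-thread path.
    pub fn wake(&self) {
        self.inner.wake_local()
    }
    /// Always possible: gives up the same-thread path in exchange for `Send + Sync`.
    pub fn into_waker(self) -> Waker {
        Waker { inner: self.inner }
    }
}

/// Illustrative implementation that only counts which path was taken.
#[derive(Default)]
pub struct Counts {
    pub local: AtomicUsize,
    pub remote: AtomicUsize,
}

impl Wake for Counts {
    fn wake(&self) {
        self.remote.fetch_add(1, Ordering::SeqCst);
    }
    fn wake_local(&self) {
        self.local.fetch_add(1, Ordering::SeqCst);
    }
}
```

In this sketch, waking through a `LocalWaker` takes the `wake_local` path, while the `Waker` obtained from `into_waker` can be moved to another thread and takes the default `wake` path.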

This is perfectly designed to support the first use case I described above, but the second is a bit trickier. As I outlined in my previous blog post, there are three ways to implement a waker. One is only suitable to no-std embedded environments and not relevant here, so I’ll reiterate the other two:

  • In the first, the waker is a TaskId which is used to identify the task to be woken.
  • In the second, the waker is a reference counted pointer to the task itself, which is then put back into the queue of tasks to be woken next.
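The two strategies can be sketched roughly as follows. This is an illustration under assumptions, not code from any real executor: it uses today's stable `std::task::Wake` trait (which postdates the API discussed in this post), and `IdWaker`, `Task`, `id_waker`, and `task_waker` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

pub type TaskId = usize;

/// Strategy 1: the waker carries only an ID; the executor owns the tasks
/// and looks them up when it drains the `ready` list.
pub struct IdWaker {
    id: TaskId,
    ready: Arc<Mutex<VecDeque<TaskId>>>,
}

impl Wake for IdWaker {
    fn wake(self: Arc<Self>) {
        self.ready.lock().unwrap().push_back(self.id);
    }
}

/// Strategy 2: the waker is a reference-counted pointer to the task itself,
/// which is pushed back onto the run queue when woken.
pub struct Task {
    pub name: &'static str,
    run_queue: Arc<Mutex<VecDeque<Arc<Task>>>>,
}

impl Wake for Task {
    fn wake(self: Arc<Self>) {
        // Clone the queue handle first so `self` can be moved into the queue.
        let queue = Arc::clone(&self.run_queue);
        queue.lock().unwrap().push_back(self);
    }
}

/// Hypothetical constructor for an ID-based waker.
pub fn id_waker(id: TaskId, ready: Arc<Mutex<VecDeque<TaskId>>>) -> Waker {
    Waker::from(Arc::new(IdWaker { id, ready }))
}

/// Hypothetical constructor for a pointer-based waker.
pub fn task_waker(name: &'static str, run_queue: Arc<Mutex<VecDeque<Arc<Task>>>>) -> Waker {
    Waker::from(Arc::new(Task { name, run_queue }))
}
```

Both variants here use `Arc`, so cloning or dropping the waker on another thread is safe. The question the rest of the post examines is whether the second variant could use a non-atomic `Rc` instead.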

The API I just described does not support the second case using a non-atomic Rc. This is because you could construct a Waker, move it to another thread, and clone or drop it. This introduces a data race in access to the reference count.
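The difference between the two reference-counting types can be seen in a small sketch (the function names are illustrative). `Arc` updates its count atomically, so handles may be cloned and dropped on any thread. `Rc` uses a plain counter and is `!Send`, so safe Rust refuses to move it across threads at all; a `Send` waker wrapping an `Rc` would have to bypass that check with unsafe code, which is where the data race would come from.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Clones and drops `Arc` handles on several threads, then reports how many
/// handles remain. The count stays consistent because updates are atomic.
pub fn arc_count_after_threads() -> usize {
    let task = Arc::new("task");
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let task = Arc::clone(&task);
            thread::spawn(move || {
                let extra = Arc::clone(&task); // clone on another thread
                drop(extra); // drop on another thread
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    Arc::strong_count(&task)
}

/// `Rc` works on a single thread, but cannot leave it.
pub fn rc_count_same_thread() -> usize {
    let task = Rc::new("task");
    let second = Rc::clone(&task);
    // The next line would not compile, because `Rc` is not `Send`:
    // thread::spawn(move || drop(second));
    let count = Rc::strong_count(&task);
    drop(second);
    count
}
```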

For that reason, the RFC currently proposes to change the API, getting rid of wake_local, and using a different strategy instead. In this strategy, there’s instead an into_waker hook that the implementation can use to either change its wake implementation (in the case where it just has a same-threaded optimization) or panic (in the case where it is not meant to be called from multiple threads).

From an end user’s perspective, the API is largely unchanged: there are two waker types, LocalWaker and Waker, with the same conversions between them. But we’ve now supported one additional implementation. So that seems like a win. But the problem is this: it is exactly this unchanged portion of the API that has a lot of costs for users of the API.

The high costs of distinguishing Waker from LocalWaker

I had the opportunity to use the waker API extensively recently (in creating the romio crate). The distinction between Waker and LocalWaker had not existed the last time I had dealt with the futures API, so I was experiencing it very much as a newcomer. And I’m afraid I must admit: I was, at first, quite baffled. A lot of strangeness conspires to make this API exceptionally confusing:

  • You receive a LocalWaker from the executor, rather than a Waker. It’s unclear without a lot of explanation whether you’re supposed to convert it to a Waker (the thing you probably really want) early or not.
  • LocalWaker is not Send or Sync, but Waker is, and there’s a conversion from LocalWaker to Waker. This looks very odd: it is hard to understand why LocalWaker isn’t threadsafe when it can be converted directly to a threadsafe version.
  • The AtomicWaker API in the futures library receives an &LocalWaker argument. Internally, it converts that immediately to a Waker. But this means that a library like romio is exclusively dealing with &LocalWaker, never directly seeing the Waker type. And yet, because the API it uses makes the conversion to Waker, it is incompatible with a local-only executor. This is unintuitive and surprising.
  • Having more versions of things is just inherently more confusing. Especially with multiple ways to construct a Waker/LocalWaker (from Wake or UnsafeWake/RawWaker), there’s now a grid of combinations between different API components, and understanding how they all relate (or don’t) is hard to learn, on top of learning how to use the APIs properly.

It would be much simpler if the API that a future used could just look like this:

struct Waker { /* ... */ }

// Send and Sync are unsafe marker traits with no items to implement.
unsafe impl Send for Waker {}
unsafe impl Sync for Waker {}

impl Waker {
    fn wake(&self) { /* ... */ }
}
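For comparison, the `Waker` type in today's stable `std::task` has this single-type shape: it is `Send + Sync` and exposes `wake` and `wake_by_ref`. The sketch below builds one through the `std::task::Wake` trait and wakes it from another thread. `Flag`, `assert_send_sync`, and `wake_from_other_thread` are illustrative names, not part of any library.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};
use std::thread;

/// Illustrative wake target: records that a wake-up happened.
pub struct Flag(pub AtomicBool);

impl Wake for Flag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>(_: &T) {}

/// Builds a waker on this thread, wakes it on another, and reports the flag.
pub fn wake_from_other_thread() -> bool {
    let flag = Arc::new(Flag(AtomicBool::new(false)));
    let waker = Waker::from(Arc::clone(&flag));
    assert_send_sync(&waker);
    thread::spawn(move || waker.wake()).join().unwrap();
    flag.0.load(Ordering::SeqCst)
}
```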

This is what the API looked like for years (under different names in earlier periods). It seemed to work well. So I asked myself: what are we really getting for this additional complexity, and is it worth it?

What are we getting for this?

I talked my concerns through with cramertj, and ultimately we reached these conclusions:

  1. The first use case - the optimization for the same thread - can easily just use TLS: either literally checking that it’s on the same thread or (more likely) storing its thread-local queue in TLS and checking if the thread-local queue exists or not. In other words, the first use case really needs no additional support from the API; LocalWaker isn’t necessary for it to be supported.

  2. The second use case is more interesting. There is one thing that indeed cannot be supported without the API distinction: using an Rc, a non-atomically reference counted task. There are still other ways to implement a singlethreaded event loop, however:

    • Using the Task IDs technique instead of the reference counting technique. Panic when you wake from another thread, instead of when you move to it. This strategy works completely fine.
    • Using atomic reference counts. Since your application is single threaded, on x86 at least this should have essentially no overhead over using nonatomic reference counts.
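Both conclusions can be sketched with a single waker type. The code below is a hedged illustration, not the design of any real executor: `LOCAL_QUEUE`, `TlsWaker`, `PinnedWaker`, and the helper functions are hypothetical names, and it is written against today's `std::task::Wake` trait. `TlsWaker` checks for a thread-local queue and falls back to a shared one, while `PinnedWaker` uses task IDs and panics when woken from a foreign thread.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};
use std::thread::{self, ThreadId};

type Shared = Arc<Mutex<VecDeque<usize>>>;

thread_local! {
    // `Some` only on a thread that installed a local queue (the executor thread).
    static LOCAL_QUEUE: RefCell<Option<VecDeque<usize>>> = RefCell::new(None);
}

/// Same-thread optimization via TLS, with a cross-thread fallback.
pub struct TlsWaker { id: usize, remote: Shared }

impl Wake for TlsWaker {
    fn wake(self: Arc<Self>) {
        let pushed_locally = LOCAL_QUEUE.with(|q| {
            let mut slot = q.borrow_mut();
            let pushed = match slot.as_mut() {
                Some(local) => { local.push_back(self.id); true } // fast path
                None => false,
            };
            pushed
        });
        if !pushed_locally {
            self.remote.lock().unwrap().push_back(self.id); // slow path
        }
    }
}

/// Task-ID waker for a single-threaded loop: panics on a foreign thread.
pub struct PinnedWaker { id: usize, home: ThreadId, ready: Shared }

impl Wake for PinnedWaker {
    fn wake(self: Arc<Self>) {
        assert_eq!(thread::current().id(), self.home, "woken from a foreign thread");
        self.ready.lock().unwrap().push_back(self.id);
    }
}

pub fn install_local_queue() {
    LOCAL_QUEUE.with(|q| *q.borrow_mut() = Some(VecDeque::new()));
}

pub fn drain_local() -> Vec<usize> {
    LOCAL_QUEUE.with(|q| {
        let mut slot = q.borrow_mut();
        let drained: Vec<usize> = match slot.as_mut() {
            Some(local) => local.drain(..).collect(),
            None => Vec::new(),
        };
        drained
    })
}

pub fn tls_waker(id: usize, remote: Shared) -> Waker {
    Waker::from(Arc::new(TlsWaker { id, remote }))
}

pub fn pinned_waker(id: usize, ready: Shared) -> Waker {
    Waker::from(Arc::new(PinnedWaker { id, home: thread::current().id(), ready }))
}
```

Neither type needs a separate `LocalWaker` in the API: the thread check happens inside `wake`, at the moment of waking, rather than in the type system.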

So I had to ask myself: is forcing every author of a manual future to deal with this complexity and unintuitiveness worth it, just to allow one particular implementation strategy among several for a niche executor use case? To me the answer was clear: we’re paying a cost in API ergonomics that doesn’t actually buy us very much.

cramertj agreed. We talked about this before the holidays. When I came back I started this blog series, whereas he just wrote a PR to the RFC. This PR would be the last major change to the futures API before stabilization. By eliminating the distinction between Waker and LocalWaker, I think the waker API becomes much more comprehensible.
