Waker API: Part I

The Waker API I: what does a waker do?

This article is a translation of WithoutBoats' blog post: The Waker API I: what does a waker do?

[TOC]

Work on supporting async/await in Rust continues to progress rapidly. I’m hoping to write a retrospective on everything that happened in 2018 in a few weeks. Right now we’re closing in on an important milestone: stabilizing the futures API that will be used to interact programmatically with asynchronous computations. The biggest remaining area of work is the design of the waker API, an essential but somewhat opaque part of how our asynchronous programming system works. I want to take a look at this API and try to make it a bit clearer, so the design decisions regarding the API become clearer.

The poll/wait/wake cycle

A running program using async/await correctly involves three fundamental components:

  • The bulk of the program consists of futures, which are a kind of pause-able computation. In general, the end user's code will consist of futures, which they will write as if they were normal functions using async/await syntax.
  • At the "top" of the program is the executor. The executor schedules futures by polling them when they are ready to make progress.
  • At the "bottom" of the program, the futures depend on event sources. (In the case of async IO, the source of IO events is often called the reactor.) The event source wakes the executor when an event occurs that will allow a future depending on it to make progress.

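A minimal illustration of the first point, written with the async/await syntax as it later stabilized (the function names here are made up for illustration): calling an `async fn` runs nothing yet, it only produces a value implementing `Future` that an executor must poll.

```rust
use std::future::Future;

// An async fn is sugar for a function returning an anonymous `impl Future`.
// Calling it does not execute the body; an executor has to poll the result.
async fn add_later(a: u32, b: u32) -> u32 {
    a + b
}

fn make_future() -> impl Future<Output = u32> {
    add_later(2, 3)
}
```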
Once a future is spawned onto an executor, that future gets executed to completion using a three phase cycle:

  1. Poll: The executor polls the future, which computes until it reaches a point at which it can no longer make progress.
  2. Wait: The reactor or event source registers that this future is waiting on an event to happen. The future has returned Poll::Pending and the event source is now tracking that it will need to wake this future when that event is ready.
  3. Wake: The event happens, and the future is woken up. It is now up to the executor to schedule the future to be polled again.

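As a sketch of how the three phases meet in code (using the `std::task` types as they eventually stabilized; `FlagFuture`, `Shared`, and `spawn_event_source` are made-up names for illustration): a leaf future stores the waker it is handed and returns `Poll::Pending`, and a separate thread plays the role of the event source that later calls `wake`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;
use std::time::Duration;

// State shared between the future and its event source (here, a plain thread).
struct Shared {
    ready: bool,
    waker: Option<Waker>,
}

struct FlagFuture {
    shared: Arc<Mutex<Shared>>,
}

impl Future for FlagFuture {
    type Output = ();

    // Phase 1, poll: the executor calls this whenever the future is scheduled.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.shared.lock().unwrap();
        if shared.ready {
            Poll::Ready(())
        } else {
            // Phase 2, wait: leave a waker with the event source and yield.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

// The "event source": when the event fires, it wakes the stored waker.
fn spawn_event_source(shared: Arc<Mutex<Shared>>) {
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        let mut shared = shared.lock().unwrap();
        shared.ready = true;
        // Phase 3, wake: tell the executor this future is worth polling again.
        if let Some(waker) = shared.waker.take() {
            waker.wake();
        }
    });
}
```

Any executor can drive this future; for example, `futures::executor::block_on` would poll it once, go to sleep, and poll it again after the background thread calls `wake()`.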

At a high level you can think of the executor as managing the program's compute resources (scheduling futures that are awake and ready to be polled) and the reactor as managing the program's IO resources (waiting on IO events and waking futures when they are ready). The executor and the reactor form the two halves of what in most asynchronous computing systems is called the "event loop." One of the great things about Rust's design is that you can combine different executors with different reactors, instead of being tied down to one runtime library for both pieces.

Requirements on the Waker API

The way that the executor and the event sources coordinate waiting and waking is the Waker API. When the Future is polled, the executor passes it a waker, which an event source will register and eventually wake. Each of the three phases of the cycle puts additional pressure on the API, making it a very tricky API to get right.

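In the shape this eventually took in std, the waker travels inside a `Context`: the executor wraps the task's concrete, non-generic `Waker` in a `Context`, and every call to `poll` receives it so that it can be handed down through arbitrary futures. A small sketch of the executor side of that handoff (the helper name `poll_once` is made up):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

// Executor side: build a Context around the waker it manufactured for this
// task and poll the future with it. Nothing here is generic over the waker.
fn poll_once<F: Future + Unpin>(fut: &mut F, waker: &Waker) -> Poll<F::Output> {
    let mut cx = Context::from_waker(waker);
    Pin::new(fut).poll(&mut cx)
}
```

On the other end, a leaf future simply stores `cx.waker().clone()`, as in the sketch in the previous section.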

What does the “poll” phase require?

During the poll phase, very little is done with the waker: the value passed to the future is just passed down, further and further, until it reaches an actual source of events, which will register it (beginning the “wait” phase). The only requirement that the poll phase introduces is dynamism: because it needs to be passed through arbitrary futures, the waker type cannot be generic. This means that every requirement introduced by the other two phases needs to be dynamically dispatched.

Rust has support for relatively easy dynamic dispatch using trait objects, but because of the rules of object safety, this easy form of dynamic dispatch is often quite limited. Indeed, we'll find it's too limited to support our use case, which is why all of the API proposals have a "Waker" type, instead of just using references to dyn Wake trait objects.

What does the "wait" phase require?

The wait phase is, from the perspective of the waker, very simple: the event source registers that the waker is waiting on an event, and then does nothing until the event occurs (beginning the "wake" phase). This introduces one additional requirement: the waker type must implement Clone.

The reason for this is straightforward: when an event source registers that this future will be waiting on an event, it must store the waker so that it can later call wake to begin the wake phase. In order to introduce concurrency, it's pretty essential to be able to wait on multiple events at the same time, so it's not possible for the waker to be uniquely owned by a single event source. As a result, the Waker type needs to be cloneable.

This is where the API immediately starts getting tricky to design, because it interacts with the previous requirement that the waker’s methods be dynamically dispatched. The Clone trait itself is not object safe, because it returns the Self type. What the waker implementations need is a function from &self to Waker, not &self to Self.

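A sketch of the object-safety problem and the usual way around it (the trait and method names here are hypothetical, not the actual proposed API): cloning has to be expressed as a method from `&self` to the concrete, type-erased `Waker` type, so that it can still be called through a trait object.

```rust
use std::task::Waker;

// Not object safe: `Clone::clone` returns `Self`, so there is no single
// signature a `dyn` version of this trait could expose.
//
// trait WakeAndClone: Clone {
//     fn wake(&self);
// }

// Object-safe alternative: clone into the concrete `Waker` type instead.
trait WakeObj {
    fn wake_by_ref(&self);
    fn clone_waker(&self) -> Waker;
}

// Now dynamic dispatch works fine.
fn duplicate(w: &dyn WakeObj) -> Waker {
    w.clone_waker()
}
```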

What does the "wake" phase require?

The final phase is the phase in which the wakers really do all of their work. Once the event we’re waiting on has happened, the event source calls the wake method. The wake method is implemented by each executor, and contains the logic for setting up this future to be polled again by the executor. It turns out there are several ways to implement this, and we’d like to be able to support all of them.

  • Using an &'static AtomicBool: In this implementation, the executor can only run one task at a time. When it's time to wake that task, a global flag is tripped, and then the task will be polled again via a side channel. This implementation does not make sense for most use cases, but it is actually being used by some users on embedded platforms with extremely minimal resources. The waker is represented as a reference to the global flag.
  • Using Task IDs: In this implementation, the executor stores a global set of tasks that it is currently polling in some sort of associative map. When it is time to wake a task, the executor is given the ID for that task in order to tell it which task is ready to be polled. The waker is represented as this task ID (in effect, the waker's data is a usize).
  • Using reference counting: In this implementation (which has become the predominant implementation), the executor is just one or more queues of tasks that it will fetch from and poll as soon as they're ready. The waker is itself a reference counted pointer to a particular task, and when it is time to wake, it puts itself onto the executor's queue. In this case, the waker is represented as a reference counted pointer (a minimal sketch of this strategy follows this list).

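The reference-counting strategy is the one the standard library eventually blessed directly: a waker can be built from an `Arc` of anything implementing `std::task::Wake`. Below is a hedged sketch of a single-task executor built on that machinery (the names `Unpark` and `block_on` are illustrative); a real executor of this kind would instead keep a queue of `Arc`'d tasks and have `wake` push the task back onto that queue.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// The executor-side state a waker needs: a flag plus a way to unblock the
// executor thread. The waker itself is just an Arc pointing at this.
struct Unpark {
    woken: Mutex<bool>,
    cv: Condvar,
}

impl Wake for Unpark {
    fn wake(self: Arc<Self>) {
        *self.woken.lock().unwrap() = true;
        self.cv.notify_one();
    }
}

// Minimal single-task executor: poll, block until woken, poll again.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let unpark = Arc::new(Unpark { woken: Mutex::new(false), cv: Condvar::new() });
    let waker = Waker::from(unpark.clone()); // a reference-counted waker
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        // Sleep until some event source calls wake() on a clone of our waker.
        let mut woken = unpark.woken.lock().unwrap();
        while !*woken {
            woken = unpark.cv.wait(woken).unwrap();
        }
        *woken = false;
    }
}
```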

All three of these implementation strategies ultimately represent the waker in the same data: an arbitrary, address-sized integer (in the first and third cases, they are addresses, whereas in the middle case, they’re just indexes). However, the third case also requires the ability to run dynamic code on clone and drop, in order to handle the reference count properly.

The requirement to dynamically support this wide set of implementation strategies is why you see functions like clone_raw and drop_raw in the various APIs for wakers.

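In the API that eventually stabilized, those hooks became the function pointers of `std::task::RawWakerVTable`: a `Waker` is exactly one address-sized data word plus a static vtable with clone, wake, wake_by_ref, and drop entries. A no-op waker (an illustrative sketch, not part of the proposals discussed here) makes that shape visible:

```rust
use std::ptr;
use std::task::{RawWaker, RawWakerVTable, Waker};

// The dynamically dispatched hooks: clone, wake, wake-by-ref, and drop.
// Here they do nothing, because the data pointer owns no resources.
unsafe fn noop_clone(data: *const ()) -> RawWaker {
    RawWaker::new(data, &NOOP_VTABLE)
}
unsafe fn noop(_data: *const ()) {}

static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_waker() -> Waker {
    // Safety: every vtable entry upholds its (here trivial) contract.
    unsafe { Waker::from_raw(RawWaker::new(ptr::null(), &NOOP_VTABLE)) }
}
```

An executor using task IDs would store the ID in that data word, while a reference-counting executor stores a pointer and does the refcount bookkeeping in the clone and drop entries.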

Waking and threading

The final tricky bit of the wake phase is how to handle threading. In particular, some implementations of executors would prefer to handle waking from the same thread the waker was created on differently from waking from a different thread. There are two variations on this idea:

  1. Some executor implementations can have an optimized path when they are woken from the same thread. They support waking from any thread, but if you wake from the same thread, there is a path they can take which would represent a nontrivial performance improvement.
  2. Some executor implementations don’t want to support multithreading at all. These executors would be bundled with their reactor and share state with it, running all the futures concurrently but on the same operating system thread. These executors don’t want to pay costs associated with multithreading at all.

A lot of the consternation in designing the Waker API has centered around how to handle these use cases. It’s the reason that there is currently a distinction in the API between a LocalWaker type and a Waker type.

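As a point of reference, the `Waker` type that eventually stabilized makes cross-thread waking mandatory: `Waker` is `Send + Sync`, which is exactly the kind of cost the single-thread-only executors described above would rather not pay. That property can be checked at compile time (the helper function here is just for illustration):

```rust
use std::task::Waker;

// This only compiles because Waker promises it can be cloned and woken
// from any thread.
fn assert_send_sync<T: Send + Sync>() {}

fn _waker_is_thread_safe() {
    assert_send_sync::<Waker>();
}
```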

Conclusion

That's as good an overview as I can give of the requirements that the waker API needs to satisfy. Hopefully this context can help more people follow the discussion going forward.

Astute readers may notice that I have not actually outlined any concrete API in this post: neither the one implemented in nightly, nor the one proposed in the current RFC, nor a third alternative proposal. Even more astute readers may notice that this post is labeled as being the first in the series.

In the next post in this series, I'm going to drill into the final issue I mentioned: threading. I'll try to lay out what I believe is the most prudent and well balanced way to handle single-thread optimizations.
