Any use of thenables within the ecosystem has a high risk of breaking internal context propagation. Linear context propagation is fine, it's only the points in the async call graph where execution branches and potentially converges later which are impacted. This might not seem like a huge deal at first, but not all control flow is linear, especially for observability and diagnostics systems. It's important to have state consistency from anywhere that code is reachable, otherwise eventually someone will try something unusual and break the system. When you're dealing with a userbase as large as Node.js, it doesn't take long for someone to do something unique and unexpected.
See below for a code example demonstrating the exact scenarios when context loss will and will not happen.
const assert = require('assert')
const { AsyncLocalStorage } = require('async_hooks')
const store = new AsyncLocalStorage()
const data = Symbol('verifier')
function thenable() {
return {
// This then(...) method, unless called directly with promise.then(...)
// in user code, will be executed within a microtask that is IN NO WAY
// wrapped by async_hooks. This means the execution async id of the task
// will be 0 and therefore the trigger id of any async tasks run within
// the task will also be 0.
//
// Because the ids are zero at this point, async_hooks will think the
// current resource is the top-level resource. This could result in
// pollution of the top-level resource which could then propagate into all
// future descending resources in the async execution tree, depending on
// how your continuation local storage system is implemented. Fortunately
// `AsyncLocalStorage` is immune to this, but most of the APM ecosystem
// currently uses their own CLS and some may continue to do so due to
// potential for additional resource-tracking insights.
then(cb) {
assert.equal(store.getStore(), data) // Fail
setImmediate(cb)
}
}
}
// The following four `store.run(...)` samples will "upgrade" the thenable to
// a real promise within the unwrapped microtask mentioned previously.
// Await a thenable
store.run(data, async () => {
assert.equal(store.getStore(), data) // Pass
await thenable()
// Linear propagation restored to the correct context
assert.equal(store.getStore(), data) // Pass
})
// Returning a thenable in an async function
store.run(data, async () => {
try {
assert.equal(store.getStore(), data) // Pass
return thenable()
} finally {
// Linear propagation restored to the correct context
assert.equal(store.getStore(), data) // Pass
}
})
// Resolving a thenable
store.run(data, () => {
assert.equal(store.getStore(), data) // Pass
Promise.resolve(thenable())
// Linear propagation restored to the correct context
assert.equal(store.getStore(), data) // Pass
})
// Returning a thenable in a then handler
store.run(data, () => {
assert.equal(store.getStore(), data) // Pass
Promise.resolve().then(() => thenable())
// Linear propagation restored to the correct context
assert.equal(store.getStore(), data) // Pass
})
The current "recommended" way to deal with context loss issues is using the
AsyncResource
API. To fix the thenable above, one might do something like:
function thenable() {
const resource = new AsyncResource('thenable')
return {
then(cb) {
resource.runInAsyncScope(() => {
assert.equal(store.getStore(), data) // Pass
setImmediate(cb)
})
}
}
}
This approach will work, but this approach will not, and only situationally:
function thenable() {
return {
then(cb) {
const resource = new AsyncResource('thenable')
resource.runInAsyncScope(() => {
assert.equal(store.getStore(), data) // Fail
setImmediate(cb)
})
}
}
}
In this simple example, the former is not any more difficult than the later, but in a more real-world situation, the point at which the resource needs to be created may be less obvious.
It's very easy to use AsyncResource
incorrectly. The docs suggest subclassing
but also show direct construction which can very often lead to issues.
Here are a few examples of incorrect AsyncResource
uses I've seen due to it
being unclear when inits happen and what the context would be at that point.
class Query {
// Most modules I've seen do this. It dispatches the query to a
// promise-returning method of a shared client connection.
then(...fns) {
// The async_hooks ids and resource stack are empty here so we get an
// orphaned init and resource that doesn't link to anything useful.
const resource = new AsyncResource('thing')
return resource.runInAsyncScope(() => {
// Not only that, any async activity from this client.query(...) call
// will link to this orphaned resource and therefore will also be
// detached from the expected context. All descending asyncrony will
// occur within this phantom branch which connects to nothing therefore
// all following activity an APM might be interested in capturing can no
// longer be attributed to anything useful.
return this.client.query(this).then(...fns)
})
}
}
That previous example is a fairly straightforward demonstration of the issue,
and the most immediately obvious way to fix it would be to move the
AsyncResource
construction out of the then(...)
method. A naive approach
might be to just put it in a "terminal" method like end
, run
, execute
,
etc. However, it's easy to accidentally direct to that call from another
unwrapped thenable.
class Query {
// This looks fine right? If we call `query.execute()` in our user code we
// will get the `AsyncResource` constructed in a valid context.
execute() {
const resource = new AsyncResource('thing')
return resource.runInAsyncScope(() => {
return this.client.query(this)
})
}
// But then what if we just make a little alias...
then(...fns) {
// Nope. The context is gone here so it will also be gone in `execute`.
return this.execute().then(...fns)
}
}
Here's another mysterious one I've seen...
class Query {
// If we invoke the promise method outside the `AsyncResource` and only
// wrap the attachment of the `then(...)` handler, we actually get
// _nothing_ linking to the `AsyncResource` context. The promise that
// is created when calling `then(...)` links directly to the promise in
// the outer context because the current design of async_hooks considers
// the internal resolve of the parent promise to be the init, not the
// attachment of a `then(...)` handler, even though that is producing a
// new promise.
then(...fns) {
const result = this.client.query(this)
const resource = new AsyncResource('thing')
return resource.runInAsyncScope(() => {
return result.then(...fns)
})
}
}
The difficult of finding the right place to create the AsyncResource
might
suggest that perhaps only the subclass form should be supported, however
that imposes significant restrictions on how a developer can write their code.
Most cases where AsyncResource
is needed currently are in database modules
which typically have their own class hierarchy that could make introducing
another parent class a rather complicated task.
The reason this gap is larger than some might think is that control flow is not always linear. You can get many complicated diverging and converging systems in an async environment, and each branch produces its own chain of asyncronous activity. If there is a break in that chain, an entire branch of execution context can be lost.
Here's a generic exponential backoff system someone might build...
async function sendMessageToUserById(id, message) {
// We have a backoff class that takes min and max timeout values and a task
// to try and perform. When the timeout is reached, the task rejects with a
// `Timeout` error to indicate to the user that they should try again.
const attempt = new BackOff(100, 10000, async () => {
const user = await User.findById()
await user.sendMessage(message)
})
while (true) {
try {
// We await the instance to let it make an attempt at doing the task.
await attempt
// If it resolves, we are done and can break out of the loop.
break
} catch (err) {
// Otherwise, we need to decide if we want to log and continue, or
// reject with the given error.
if (err instanceof Timeout) {
console.warn(err.message)
} else {
throw err
}
}
}
}
class Timeout extends Error {
constructor(ms) {
super(`Timed out after ${ms}ms, retrying...`)
}
}
class BackOff {
constructor (min, max, task) {
this.min = min
this.max = max
this.task = task
this.current = min
}
// Using thenables, we can have a reusable object to await until it succeeds.
// Unfortunately, the context inside this `then(...)` method will be empty
// if it is triggered via an await, `Promise.resolve(...)` or `then(...)`
// handler return value. This _also_ means that _all_ activity in the task
// function will link back to this null context, making all that activity
// impossible to correlate to the execution of `sendMessageToUserById`.
then(resolve, reject) {
const backoff = this.current
this.current = Math.min(this.max, this.current * 2)
function done() {
clearTimeout(timer)
isDone = true
}
let isDone = false
const timer = setTimeout(() => {
reject(new Timeout(backoff))
done()
}, backoff)
this.task().then(result => {
if (isDone) return
done()
resolve(result)
}, error => {
if (isDone) return
done()
reject(error)
})
}
}
You can argue all you want about if the code is "bad practice" or not, but it doesn't change that people will write code like this simply because it is possible to do so.
It's a good idea to advocate for best practices in how people use Node.js, but because Node.js is a platform that relies heavily on the module ecosystem, we must consider what the impact would be if a major module happened to introduce behaviour like this. A crash is easy to diagnose, but context loss can be very tricky, especially when it comes as a result of someone else's code. Trust me, this is a headache you do not want to have. 😬
About half of the popular database/cache libraries use native promises or are callback-only and expect users to promisify things themselves. In the callback-only case, there are often hundreds of different styled promisifying libraries which vary widely in how they achieve that, many doing it in a way that breaks the context.
The other half of popular database/cache libraries are known to be broken and have been for a long time.
Here's a few examples:
- mongodb decided to let the user deal with it
- mongoose rejected AsyncResource due to experimental status
- pg has no support
- knex query builder has no support
The approach from mongodb I think is a notable one to examine because we passed the cognitive burden of context management to the module developers and then they just passed it further along to the end-user.
Because the majority of Node.js apps use database or cache libraries and
the majority of these libraries break context, this is a very significant
issue. Both async_hooks
and AsyncLocalStorage
will produce incomplete
context coverage in most applications.
The issue comes down to code structure in the InternalCallbackScope
function.
The function does a
MicrotaskScope::PerformCheckpoint(...)
call at a point which is just after the JavaScript execution scope has finished
and the async_hooks "after" event has been fired along with the resource being
removed from the resource stack. A naive approach might be to move this before
those happen or use a DefaultTriggerAsyncIdScope
to encapsulate that block.
However, the problem is that it needs to be in execution scope, not trigger
scope, and each individual microtask should have its own separate execution
scope.
Currently we have a queueMicrotask
function which does effectively the same
thing as what happens with the thenable resolve, but that function
explicitly wraps itself in an AsyncResource
.
This wrap needs to exist for all microtasks, not just the ones we create directly.
Well, I shared a high-level way to fix it for AsyncLocalStorage
. If you're
reading this, you probably already know that was not received well. 😅
To fix it at a lower-level will be quite complicated, but the current theory
I'm exploring is that we could provide our own custom MicrotaskQueue
which
wraps the EnqueueMicrotask(...)
methods.
The custom EnqueueMicrotask(...)
method would bind to the context of the
last init to occur before the method was called and then inject microtasks
around the true microtask to enter the execution context before it runs
and exit the execution context after.
The other possibility I've investigated is if we could get V8 to emit the
PromiseHook
events for thenables, however I haven't been able to figure
out a clear way to achieve that without breaking their existing internal
use of it.
I would very much appreciate feedback on this idea or any guidance you may be able to provide. 🙂
This sounds like something that could be improved in the documentation.
The proposed fix (nodejs/node#33189) won't fix most problems with context loss in client libraries, as it's aimed towards a specific scenario. I find this part a bit misleading as it gives readers an impression that the fix resolves all problems with custom thenables, while it's not.
It would be great to see such low-level fix, as all
async_hooks
based code could benefit from it, not only ALS.