benjamingr/Post Mortem.md

## Post Mortem.md

      
    Raw
  

              Post Mortem.md
            
          
    Post Mortem Debugging in NodeJS in the Light of Promises

Context:

we're discussing how we can safely abort on unhandled rejections and obtain meaningful debugging information. Related reading: https://gist.github.com/misterdjules/2969aa1b5e6440a7e401#file-post-mortem-debugging-with-promises-md.
In particular, we're discussing https://gist.github.com/misterdjules/2969aa1b5e6440a7e401#removing-implicit-trycatch-blocks-from-v8s-promises-implementation
Othe recommended reading: On unhandledRejection https://gist.github.com/benjamingr/0237932cee84712951a2
###Idea
We add a --abort-on-sync-unhandled-rejection flag to Node. That flag's goal is to give meaningful core dumps in Node while letting the vast majority of promise users remain unaffected. This is related to the issue of promisifying core-apis but it is aimed at also solving the problem for other users.
That flag aborts the process and generates a core dump whenever a promise is rejected and that rejection is not recovered from - similarly to --abort-on-uncaught-exceptions.
When is a promise rejection not handled.

The problem is that we never "know for sure" when a promise has rejected and that rejection cannot be recovered form. We need to make a reasonable compromise on when a promise has rejected and no catch handlers will be attached to it.

Using the unhandledRejection event is unfit, like @misterdjules and @chrisdickinson have said - the dump is much less meaningful at this point. It triggers after the microtask queue has been depleted - this eliminates some false positives, but not all.
Assuming promise errors will not be recovered from is also unfit - since in practice promise code throws liberally for operational errors.

A compromise can be reached by dumping the core if no rejection handlers have been attached up until the point of the error thrown. Since code in then blocks always runs asynchronously this means the vast majority of use cases are covered. We also have a meaningful core at this point to dump since we're actually during running the callbacks.
Promise.resolve().then(() => {
   throw new Error(); 
}).catch(e => { // since the catch handler is added before the `then` executes, this is fine.
   // ...
});
var p = Promise.resolve().then(() => {
   throw new Error(); 
});

p.catch(e => { // since the catch handler is added before the `then` executes, this is fine. No false positive
   // ...
});
var p;
Promise.resolve().then(() => {
   throw new Error(); 
}).

setTimeout(() => p.catch(e => { // this aborts, like Chris said, code can always be written to avoid this case.
   // ...
}));
var p;
Promise.resolve().then(() => {
   throw new Error(); 
}).

process.nextTick(() => p.catch(e => { // this aborts, like Chris said, the "unhandledRejection" event avoids this problem
                                     
   // ...
}));
While personally in my code I don't use unhandledRejection for anything except actual errors which means I can throw in it (causing an abort) - this could definitely break some peoples' flow.
The flag terminates the process with a heap dump on any rejection on a promise if the rejection handler is not added synchronously. This effectively means "has not been already added" since then callbacks run after the catch handlers have been added if those are added synchronously. This requires a little stricter guarantee than the slack we give in unhandledRejection, in practice I just checked in my code and my code and some packages I use behave well.
There is a flow described in the original post here: groundwater/nodejs-symposiums#5 (comment) that also shows how it can be used to provide post-mortem with responsible aborting (nodejs/post-mortem#18)
Advantages:

Terminating it synchronously means we get a decent heap dump, post-mortem people are happy.
Promise code isn't changed, promises users are happy.
Provides a way to write highly available NodeJS code, fixing the issue described in nodejs/post-mortem#18 .
Very small API surface, a single flag is added.

Disadvantages:

"Misbehaving" promise code can't be used, this flag requires people to write their promise code in a certain way if they want post-mortem debugging.

It might be possible to mitigate that through starting to warn on unhandledRejection although that won't always fix it.


Doesn't give us post-mortem debugging for every request but rather only the first, I think this is the way forward since otherwise you're exposed to a DOS attack.
The promise constructor would have to be wrapped and throws there will be converted to aborts rather than rejections with the flag on. In practice throwing synchronously in the promise constructor isn't common and is typically the sort of programmer error that post-mortem analysis wants to catch - so this is acceptable.

The idea is based on work by @misterdjules @chrisdickinson and @groundwater as well as previous work by numerous people.