davidmarkclements/gist:3ac8e941ce5c119c0f39 Secret

    10 Tips for Coding with Node.js #4: Reproduce Core Callback Signatures

Welcome to part four of the Ten Tips decalogy. Here are the ten tips:

Develop debugging techniques
How to Avail and Beware of the Ecosystem
How to Know when (not) to throw
Reproduce core callback signatures
Use streams
Break out blockers
Deprioritize synchronous code optimizations
Use and create small single-purpose modules
Prepare for scale with microservices
Expect to fail, recover quickly

In this post, we’ll be investigating the various ways of managing control
flow with callbacks. The Node-style callback (a.k.a. the error-first callback,
a.k.a. the errback) is the recommended primary approach.
While higher-level abstractions are certainly worth considering,
it's important to have a firm grasp of the trade-offs.
Emulating Native APIs

When it comes to JavaScript and the native environments it exists in, reproducing
patterns used by native APIs is usually not encouraged, especially not patterns
in the browser.
For instance, the onevent style from DOM Level 0 (e.g. onclick, onmouseover, etc.)
isn't a great API to imitate: event methods can easily be overwritten or gratuitously
modified by third parties, and registering multiple handlers requires a custom user-land
implementation of an event stack. A better approach to event management would be via
an event emitter.
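To make the difference concrete, here is a minimal sketch contrasting a single-slot handler property with Node's EventEmitter. The `socketLike` object and the handler names are hypothetical, made up purely for illustration.

```javascript
var EventEmitter = require('events').EventEmitter;

// DOM Level 0 style: one property per event, so the last assignment wins.
var socketLike = {onmessage: null}; // hypothetical object, not a real WebSocket
var propertyCalls = [];
socketLike.onmessage = function logger(msg) { propertyCalls.push('logger:' + msg); };
socketLike.onmessage = function parser(msg) { propertyCalls.push('parser:' + msg); };
socketLike.onmessage('hi'); // only parser runs, logger was silently replaced

// Event emitter style: listeners accumulate instead of replacing each other.
var emitter = new EventEmitter();
var emitterCalls = [];
emitter.on('message', function logger(msg) { emitterCalls.push('logger:' + msg); });
emitter.on('message', function parser(msg) { emitterCalls.push('parser:' + msg); });
emitter.emit('message', 'hi');

console.log(propertyCalls); // [ 'parser:hi' ]
console.log(emitterCalls);  // [ 'logger:hi', 'parser:hi' ]
```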
Even when new APIs are added to the browser, that shouldn't be a signal for users to
adopt these approaches. For instance, the comparatively recent WebSocket API replicates
this approach (onopen, onmessage).
An example of a Node API we shouldn't reproduce is the simulated virtual function approach
implemented by Node core streams, where the stream creator must subclass and then supply
_read, _write or _transform methods. This leads to class noise and requires
extensive explanation via documentation. A better approach is the revealing constructor pattern.
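As a rough illustration of the alternative, Node's stream module also accepts the implementation as constructor options, which is in the spirit of the revealing constructor pattern. This sketch assumes a Node version that supports this simplified construction; the `upperCase` stream is a made-up example.

```javascript
var stream = require('stream');

// The transform logic is handed to the constructor instead of being
// attached to a subclass as a _transform method.
var upperCase = new stream.Transform({
  transform: function (chunk, encoding, cb) {
    // errback signature: error first, then the transformed chunk
    cb(null, chunk.toString().toUpperCase());
  }
});

var output = '';
upperCase.on('data', function collect(chunk) { output += chunk; });
upperCase.on('end', function report() { console.log(output); });

upperCase.write('hello ');
upperCase.end('streams');
```

Note that the `transform` option itself takes an errback as its last parameter, so the stream implementation follows the same contract as the rest of core.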
Continuation Passing Style

However, there is one pattern in Node core that absolutely should be reproduced, which
is the way core APIs use callbacks.
The humble callback is an implementation of Continuation Passing Style programming (CPS).
A continuation is essentially an operational building block; it's a flow-control primitive.
There are two ways to get data out of a function (without mutating external state, that is):
returning a value or passing a value through a continuation (through a callback).
//returning a value
function returnSquare(n) {
  return n * n;
}

//passing a value through a continuation
function cpsSquare(n, cb) {
  cb(n * n);
}
Unlike returning from a function, callbacks allow us to control when the data is
released from the function:
function asyncSquare(n, cb) {
  setTimeout(cb, 1000, n * n);
}
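The timing difference can be observed directly. The following sketch repeats the two functions above (with a shorter delay) and records the order in which things happen; the `order` array is only there for demonstration.

```javascript
function cpsSquare(n, cb) {
  cb(n * n); // continuation runs immediately
}

function asyncSquare(n, cb) {
  setTimeout(cb, 10, n * n); // continuation runs on a later tick
}

var order = [];

asyncSquare(3, function (result) {
  order.push('asyncSquare: ' + result);
  console.log(order);
  // [ 'cpsSquare: 9', 'end of script', 'asyncSquare: 9' ]
});
cpsSquare(3, function (result) {
  order.push('cpsSquare: ' + result);
});
order.push('end of script');
```

Even though asyncSquare is called first, its callback is the last thing to run.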
Callback Arity

Since callbacks are functions, we can pass multiple values in as arguments:
//not a great design..., but:
function squareAndCube(n, cb) {
  var sq = n * n;
  cb(sq, sq * n);
}

squareAndCube(10, console.log.bind(console, '10² = %d, 10³ = %d'));
Using multiple parameters for return values is typically a poor design,
for the same reason that functions with lots of parameters are a bad idea:
it demands that humans associate values to indices instead of namespaces.
Emulating named parameters with objects is a better way to return multiple
values (both for return and callbacks):
//better design, but what about errors?
function squareAndCube(n, cb) {
  var sq = n * n;
  cb({square: sq, cube: sq * n});
}
Errors and Values

Whilst using multiple callback arguments for multiple values is considered a
poor design choice, using two arguments to communicate a single value
and error state turns out to be a powerful abstraction.
By definition a continuation allows us to pass state on.
Passing errors through callbacks delegates handling to a consumer.
This is perfect for scenarios where the severity of an error is
determined by its surrounding context - which is almost all operational
errors, any errors that involve user input, and some forms of developer errors.
function errbackSquareAndCube(n, cb) {
  if (typeof n !== 'number' || Number.isNaN(n)) {
    return cb(Error('n must be a number!'));
  }
  var sq = n * n;
  cb(null, {square: sq, cube: sq * n});
}

errbackSquareAndCube(userInputNum, 
  function processResults(err, results) { 
    if (err) { return displayUserError(err); }
    displayAnswers(results);
  });
Using continuations to send both error state and return value also
has significant asynchronous advantages, because a try/catch placed
around an asynchronous call cannot catch an error thrown later inside
its callback. See How to Know when (not) to throw for an in depth explanation.
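A small sketch of that limitation follows. The `uncaughtException` listener is only there so the demo can observe the escaped error rather than crash; `failsLater` is a hypothetical function, not part of any API.

```javascript
var caughtByTryCatch = false;
var escapedError = null;
var receivedError = null;

// Only here so the demo can observe the escaped error instead of crashing.
process.once('uncaughtException', function observe(err) {
  escapedError = err;
});

try {
  setTimeout(function throwsLater() {
    throw new Error('thrown later');
  }, 10);
} catch (e) {
  caughtByTryCatch = true; // never runs: the try block finished long ago
}

// The errback version hands the error to whoever supplied the callback.
function failsLater(cb) {
  setTimeout(function () {
    cb(new Error('passed later'));
  }, 20);
}

failsLater(function handle(err) {
  receivedError = err;
  console.log(caughtByTryCatch);      // false
  console.log(escapedError.message);  // thrown later
  console.log(receivedError.message); // passed later
});
```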
Error First

We could put the error last, but there are a couple of advantages to putting
it first. Primarily it's about inducing positive developer habits,
one of the hardest yet most practical and cost-effective of all design goals.
Placing the error parameter between the developer and the result is a constant
reminder to the developer to handle and propagate errors. If the error parameter
were last it could easily be ignored.
It also removes any need to define a value if there is an error, which can
sometimes act as a kind of failsafe if the error isn't handled. When an expected
value is undefined, it usually isn't long before the process throws upon
attempting to execute undefined or look up a property on undefined, or
generates some unexpected output due to a NaN. That at least protects us
from more nefarious bugs like memory leaks or security issues (although a NaN could
feasibly create a security hole...).
Core Patterns

Error-first callbacks, sometimes called errbacks, were chosen by core Node API
developers early on. Node was the first project to use this pattern in a
significant way, and many core asynchronous operations use the errback signature:
var fs = require('fs');
fs.readFile('./meta.yaml', function outputFile(err, buffer) {
  if (err) { return console.error('oh noes'); }
  console.log(buffer.toString());
});
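Reproducing that signature in our own functions is mostly a matter of forwarding errors and putting the value second. Here is a minimal sketch; `readJson` is a hypothetical helper and `./package.json` is just an assumed file name.

```javascript
var fs = require('fs');

// Hypothetical helper that mirrors the fs.readFile callback signature.
function readJson(path, cb) {
  fs.readFile(path, 'utf8', function parse(err, text) {
    if (err) { return cb(err); } // propagate the core error untouched
    var data;
    try {
      data = JSON.parse(text);   // JSON.parse throws, so convert to an errback
    } catch (e) {
      return cb(e);
    }
    cb(null, data);
  });
}

readJson('./package.json', function outputName(err, pkg) {
  if (err) { return console.error('could not read package.json:', err.message); }
  console.log(pkg.name);
});
```

Because `readJson` has the same shape as `fs.readFile`, a caller handles both in exactly the same way.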
Synchronous Callbacks

Whilst some of our examples are in fact synchronous operations, the core API
only ever uses callbacks for asynchronous operations. However, it may be
worth considering using continuations for all forms. This has two advantages.
First, it allows a function to seamlessly evolve from synchronous to asynchronous
without refactoring, and secondly, it evades inherent problems with throw
and try/catch (see How to Know when (not) to throw). The downside
is the additional boilerplate for synchronous functions, but whilst inconvenient
this may be a worthy trade-off, if such a discipline can be enforced across a team.
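As a sketch of the first advantage, consider a validation function that starts out synchronous. The names `parsePort`, `parsePortLater` and `report` are hypothetical, and the asynchronous version is simulated with setTimeout.

```javascript
// Version 1: purely synchronous, but already speaks errback.
function parsePort(input, cb) {
  var port = Number(input);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return cb(new Error('invalid port: ' + input));
  }
  cb(null, port);
}

// Version 2: the work became asynchronous (simulated with setTimeout).
function parsePortLater(input, cb) {
  setTimeout(parsePort, 10, input, cb);
}

// The call site is identical for both versions.
function report(err, port) {
  if (err) { return console.error(err.message); }
  console.log('listening on %d', port);
}

parsePort('8080', report);
parsePortLater('8080', report);
parsePort('eighty', report);
```

The trade-off is that callers cannot tell from the signature whether `report` runs before or after the call returns, so they should not rely on either ordering.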
Hell.

Continuations are meant to be strung together. Replicating the form of core callback
signatures essentially implements a consistent control flow protocol for an application,
allowing for composable, encapsulated asynchronous (and synchronous) logic.
However, it can lead to code readability and quality issues when used naively:
function findPetsForHuman(id, cb) {

  getPerson({query: id}, function (err, person) {
    if (err) { 
      cb(err); 
    } else {

      findPets({
        species: person.preference.species,
        breeds: person.preference.breeds
      }, function (err, pets) {
        if (err) { 
          cb(err); 
        } else {
          filterPets({
            criteria: person.profile, 
            pets: pets
          }, function (err, matches) {
            if (err) { 
              cb(err); 
            } else {
              if (matches.length > 10) {
                filterPets({
                  criteria: person.preferences.niceToHave,
                  pets: matches,
                  max: 10
                }, function (err, matches) {
                  if (err) { 
                    cb(err); 
                  } else {
                    checkAvailability(matches, function (err, availablePets) {
                      if (err) { 
                        cb(err); 
                      } else {
                        cb(null, availablePets);
                      }
                    });
                  }
                });
              } else {
                checkAvailability(matches, function (err, availablePets) {
                  if (err) { 
                    cb(err); 
                  } else {
                    cb(null, availablePets);
                  }
                  
                });
              }
            }
          });
        }

      });
    }

  });

}
The above example is relatively mild compared to some occurrences in the wild.
As requirements become more complex, heavy use of callbacks leads to rightward syntax creep,
otherwise known as the pyramid of doom or callback hell. However, callbacks per se are not
the source of this problem. It's fundamentally a code organization issue which is easily
fixed by... organizing the code.
function findPetsForHuman(id, cb) {
  var person; // shared with the hoisted steps below

  getPerson({query: id}, function petMatch(err, result) {
    if (err) { return cb(err); }
    person = result;

    findPets({
      species: person.preference.species,
      breeds: person.preference.breeds
    }, refine);
  });

  function refine (err, pets) {
    if (err) { return cb(err); }

    filterPets({
      criteria: person.profile, 
      pets: pets
    }, respond);

  }

  function respond(err, matches) {
    if (err) { return cb(err); }
    if (matches.length > 10) {
      return filterPets({
        criteria: person.preferences.niceToHave,
        pets: matches,
        max: 10
      }, function culledHandler(err, matches) {
        if (err) { return cb(err); }
        checkAvailability(matches, cb);
      });
    }

    checkAvailability(matches, cb);

  }

}
We were able to quickly tidy the code up by breaking some of
the callbacks out into function statements. Function statements
are hoisted, which allows us to lay out operational logic from
top to bottom.
Nesting is also reduced by not using else branches. Instead we
can create logical branches by simply returning early from the function
(and it doesn't matter what we return because the values are never used).
We can also pass cb directly to checkAvailability, because cb
is an errback and checkAvailability expects an errback. This is the
principal benefit of establishing a consistent callback contract.
An advantage of breaking out functions is that it forces us to name them,
which allows for easier debugging. Having a stack filled with anonymous
functions makes life difficult, so it's best practice to name all
functions. This is why we also named function expressions, not just
those elevated into statements.
See Develop debugging techniques for more about anonymous functions.
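A quick way to see the benefit is to capture a stack trace from inside a named function expression. In this sketch `run` and `namedHandler` are hypothetical names.

```javascript
function run(cb) { cb(); }

var namedStack;
run(function namedHandler() {
  namedStack = new Error('where am I?').stack;
});

// The function's name shows up as a frame in the stack trace.
console.log(namedStack.indexOf('namedHandler') !== -1); // true
console.log(namedStack.split('\n').slice(0, 3).join('\n'));
```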
Control Flow Patterns

The basic asynchronous unit (the callback) can be wrapped in higher level
control flow patterns to increase code organization and associate semantic
meaning with asynchronous logic. One library that has been particularly
successful in this area is async.
Our earlier example keeps querying for data based on refined
criteria (for the purpose of explanation, the example is not optimal,
IRL we would probably want to use SQL or MapReduce on the DB side).
The async.waterfall function is built for this particular case,
essentially allowing us to break up our logic into asynchronous steps:
var async = require('async');

function findPetsForHuman(id, cb) {
  var person; // captured in step 2 so that later steps can read it

  async.waterfall([
    function findPetsForHumanStep1(next) { 
      getPerson({query: id}, next);
    },
    function findPetsForHumanStep2(result, next) {
      person = result;
      findPets({
        species: person.preference.species,
        breeds: person.preference.breeds
      }, next);
    },
    function findPetsForHumanStep3(pets, next) {
      filterPets({
        criteria: person.profile, 
        pets: pets
      }, next);
    },
    function findPetsForHumanStep4(matches, next) {
      if (matches.length <= 10) { return next(null, matches); }
      filterPets({
        criteria: person.preferences.niceToHave,
        pets: matches,
        max: 10
      }, next);
    },
    checkAvailability //<-- step 5
  ], cb);

}
Notice we're still using the same errback idea, but we don't
have to handle an error parameter in every function, only the
second argument to async.waterfall (where we pass cb)
actually has an error parameter.
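To show why only the final callback needs an error parameter, here is a stripped-down waterfall written from scratch. It is a sketch of the idea, not the async library's implementation, and the step functions are made up for the demonstration.

```javascript
// A stripped-down waterfall, NOT the async library's implementation.
function waterfall(steps, done) {
  var index = 0;

  function next(err) {
    if (err) { return done(err); }        // short-circuit on the first error
    if (index === steps.length) {
      return done.apply(null, arguments); // (null, finalResults...)
    }
    var args = Array.prototype.slice.call(arguments, 1); // drop the err slot
    args.push(next);                      // each step receives next last
    steps[index++].apply(null, args);
  }

  next(null);
}

waterfall([
  function start(next) { next(null, 3); },
  function square(n, next) { next(null, n * n); },
  function fail(n, next) { next(new Error('stopped at ' + n)); },
  function neverRuns(next) { next(null, 'unreachable'); }
], function finished(err, result) {
  console.log(err.message); // stopped at 9
});
```

The wrapper strips the error argument before calling each step and routes any error straight to the final callback, which is why the steps themselves never see it.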
The async library is for heavy lifting, and that comes at a price
(abstraction overhead, additional state).
For small single purpose modules it tends not to be necessary unless
there's a lot of asynchronous activity. For application level code
it can be very useful, both client and server side.
Alternative Abstractions

There are other common forms of Continuation Passing Style,
all of which, at an atomic level, use callbacks. Some well
known ones are:

promises
event emitters
streams
generators

Promises allow us to treat logic as an object: we can pass around
a value we don't have yet. Since promises are part of the ECMAScript 2015
standard and are implemented in more recent versions of V8, we'll be
seeing a lot more of them.
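As a brief sketch of passing around a value we don't have yet, the example below uses the standard Promise constructor; `squareLater` is a hypothetical function modelled on the earlier square examples.

```javascript
function squareLater(n) {
  return new Promise(function (resolve, reject) {
    if (typeof n !== 'number' || Number.isNaN(n)) {
      return reject(new Error('n must be a number!'));
    }
    setTimeout(resolve, 10, n * n);
  });
}

var pending = squareLater(4); // a placeholder for a value we don't have yet
var results = [];

// The same promise object can be handed to several consumers.
pending.then(function (sq) { results.push('first consumer: ' + sq); });
pending.then(function (sq) { results.push('second consumer: ' + sq); });
pending.then(function () { console.log(results); });

// Errors travel through a separate channel rather than a first argument.
squareLater('four').catch(function (err) {
  results.push('error: ' + err.message);
});
```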
Event emitters are part of Node core. Unlike an errback or a promise,
event emitters tend to be for communicating multiple values according
to a namespace. This means they don't use errbacks; instead errors
are communicated by calling a function associated with an "error"
namespace:
ee.on('error', function (err) { /* deal with it */ });
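One detail worth knowing is that Node's EventEmitter treats the "error" namespace specially: emitting it with no listener attached throws. A minimal sketch:

```javascript
var EventEmitter = require('events').EventEmitter;

var ee = new EventEmitter();
var threw = false;
var handled = null;

// With no 'error' listener registered, emitting 'error' throws.
try {
  ee.emit('error', new Error('nobody listening'));
} catch (e) {
  threw = true;
}

// Once a listener exists, the error is delivered to it instead.
ee.on('error', function onError(err) { handled = err; });
ee.emit('error', new Error('somebody listening'));

console.log(threw);           // true
console.log(handled.message); // somebody listening
```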

We'll be talking about streams in the next 10 tips article.
Streams are built on event emitters so they handle errors in
the same way.
Generators are part of ECMAScript 2015. They allow the control
flow of a function to be managed externally by calling next
on an iterator object. The yield keyword is used inside
the generator function to determine step points. For instance:
function * g() {  //<-- notice the asterisk
  yield 1;
  yield 2;
  yield 3;
}

var i = g();
console.log(i.next()); // {value: 1, done: false}
console.log(i.next()); // {value: 2, done: false}
setTimeout(function () { 
  console.log(i.next()); // {value: 3, done: false}
  console.log(i.next()); // {value: undefined, done: true}
}, 100);
This isn't that exciting until we consider that since next
can be called at any point, it can be called within a callback.
Therefore it's possible to build a light abstraction around
generators to provide asynchronous flow control in a synchronous
style... and that's what co does.
For this example, imagine that all the asynchronous calls
return promises:
function findPetsForHuman(id, cb) {
  co(function* () {
    var person = yield getPerson({query: id});
    var pets = yield findPets({
      species: person.preference.species,
      breeds: person.preference.breeds
    });
    var matches = yield filterPets({
      criteria: person.profile, 
      pets: pets
    });
    if (matches.length > 10) {
      matches = yield filterPets({
        criteria: person.preferences.niceToHave,
        pets: matches,
        max: 10
      });
    }

    return yield checkAvailability(matches);  

  })
  .then(function (matches) {
    cb(null, matches);
  })
  .catch(cb);
}
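To see why calling next from inside a callback is enough, here is a minimal sketch of a co-style runner. It is not co's actual implementation; `run` and `delay` are hypothetical names, and the sketch assumes every yielded value is a promise (or a plain value).

```javascript
function run(genFn) {
  return new Promise(function (resolve, reject) {
    var iterator = genFn();

    function step(method, input) {
      var result;
      try {
        result = iterator[method](input); // resume the generator
      } catch (e) {
        return reject(e);                 // error not handled by the generator
      }
      if (result.done) { return resolve(result.value); }
      // Wait for the yielded promise, then resume with its outcome.
      Promise.resolve(result.value).then(
        function (value) { step('next', value); },
        function (err) { step('throw', err); }
      );
    }

    step('next');
  });
}

function delay(value) {
  return new Promise(function (resolve) { setTimeout(resolve, 10, value); });
}

run(function* () {
  var a = yield delay(2);
  var b = yield delay(3);
  return a * b;
}).then(function (product) {
  console.log(product); // 6
});
```

Rejections are fed back in with the iterator's throw method, which is what lets ordinary try/catch work around a yield.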
Generators work in Chrome and Firefox, can be enabled in
Node using the --harmony flag, and are enabled by default in io.js.
Generators with co are a really nice way to organize asynchronous
logic and control the flow, but there is overhead. Both promises
and generators spend a comparatively large amount of time on the CPU.
This may not be a problem, since the bottleneck will usually be the
asynchronous operation itself, but they will use more resources.
Combined Approach

There may be a temptation to simultaneously return one value
from a function and pass another through a callback. Whilst it's
a novel idea this is worse than passing multiple value arguments
to a callback because it requires developers to retrieve values
from two sources.
Dual APIs

One exception to avoiding the combined approach is to
support both callbacks and promises. The absence of a callback
could be used to signal a promise request instead:
function doAsyncThing(withVal, cb) {
  if (typeof cb === 'function') {
    return asyncOp(withVal, cb);
  }
  return new Promise(function (resolve, reject) {
    asyncOp(withVal, function (err, result) {
      if (err) { return reject(err); }
      resolve(result);
    });
  });
}
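From the consumer's side, the two styles could be used as in the sketch below. The `asyncOp` here is a hypothetical stand-in for real I/O, simulated with setTimeout, and doAsyncThing is repeated so the example runs on its own.

```javascript
// Hypothetical asynchronous operation, standing in for real I/O.
function asyncOp(withVal, cb) {
  setTimeout(function () {
    if (typeof withVal !== 'number') { return cb(new Error('not a number')); }
    cb(null, withVal * 2);
  }, 10);
}

function doAsyncThing(withVal, cb) {
  if (typeof cb === 'function') {
    return asyncOp(withVal, cb);
  }
  return new Promise(function (resolve, reject) {
    asyncOp(withVal, function (err, result) {
      if (err) { return reject(err); }
      resolve(result);
    });
  });
}

// Errback style: pass a callback.
doAsyncThing(21, function (err, result) {
  if (err) { return console.error(err); }
  console.log('callback got', result);
});

// Promise style: omit the callback.
doAsyncThing(21).then(function (result) {
  console.log('promise got', result);
});
```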
Conclusion

Ultimately, understanding the errback and using it as a simple
unit of asynchrony is an effective way to write JavaScript.
It's a core language construct and the convention is well known,
so using errbacks makes it easy for other developers to interact
with your APIs.
Using well known higher level abstractions is fine, but remember
there is a cost to doing so. There should be a strong reason
in the larger context for using a control flow library, or
generators, or event emitters (and oftentimes there is).
That's all for now, looking forward to seeing you again in
Part 5: Use Streams.