## cascadiajs-2014-continuation-local-storage.md

      
[fit] constantly losing stuff

lessons learned developing and using continuation-local-storage in Node

Forrest L Norvell, npm, Inc.

 othiym23 on npm / GitHub / Twitter

^ code-heavy talk that focuses on Node;
some interesting pieces for the front end towards the end

Why did I make this weird thing with the long name?

^ writing a monitoring tool for New Relic;
needed a way to have access to request-specific data that would work across the whole request;
took many, many tries to get it right

What is continuation-local storage?


- need variables scoped across concurrent requests
- threaded programming uses "thread-local" storage
- Node's JavaScript is threadless, uses callbacks instead
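Here is a minimal sketch of the problem, assuming two requests are in flight at once. `handleRequest`, the user names, and the delays are hypothetical, and `setTimeout` stands in for real I/O. A plain module-level variable is shared by every request, so whichever request writes last wins:

```javascript
// A module-level variable shared by every request in the process.
let currentUser = null;
const seen = [];

// Hypothetical request handler: record the user, then wait on fake "I/O".
function handleRequest(user, delayMs) {
  currentUser = user;
  setTimeout(function afterIO() {
    // By now the request that ran last has overwritten currentUser.
    seen.push(user + " sees " + currentUser);
  }, delayMs);
}

// Two requests in flight at once.
handleRequest("alice", 20);
handleRequest("bob", 10);
```

Alice's callback ends up reading Bob's value. Thread-local storage avoids this by giving each thread its own copy. Node has no per-request thread to hang the value on.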


Why are some callbacks called continuations?


- async functions get passed a callback
- the callback is how processing continues
- callbacks used this way are called continuations

^ this convention is called continuation-passing style
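Below is a minimal sketch of continuation-passing style using Node's error-first callback convention. `double` and `doubleSync` are made-up examples, and `setImmediate` stands in for real asynchronous work:

```javascript
// Direct style: the caller waits for a return value.
function doubleSync(x) {
  return x * 2;
}

// Continuation-passing style: the result is handed to `next`
// (error first, result second) once the async work finishes.
function double(x, next) {
  setImmediate(function () {
    if (typeof x !== "number") return next(new TypeError("expected a number"));
    next(null, doubleSync(x));
  });
}

const results = [];

// Each callback is the continuation: "what happens next" once the value exists.
double(5, function (error, ten) {
  if (error) return results.push(error.message);
  double(ten, function (error, twenty) {
    if (error) return results.push(error.message);
    results.push(twenty);
  });
});

double("oops", function (error) {
  results.push(error.message);
});
```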


Talking about music is like dancing about architecture.
-- Gandhi

^ CLS is actually easier to use than describe, soooo let's talk in terms of real Node apps;
quote from Martin Mull, a comedy / music artist from the 70s (probably)

So, you're writing an Express application.


You have a middleware that fetches a session:

```javascript
module.exports = function getSession(req, res, next) {
  var id = req.params['user_id']
  var store = require("path").resolve(process.cwd(), "./data/sessions.json")
  require("fs").readFile(store, function (error, data) {
    if (error) return next(error)
    var sessions
    try { sessions = JSON.parse(data) }
    catch (e) { return next(e) }
    if (sessions[id]) res.locals._mySession = sessions[id]
    next()
  })
}
```
^ next slide for emphasis on res.locals

You have a middleware that fetches a session:

```javascript
  // put the session on res.locals
  if (sessions[id]) res.locals._mySession = sessions[id]
```
^ default solution in Express;
slaps the data onto the ServerResponse object;
causes weirdness if data are weird:
a response local that is cyclical will crash loggers & debuggers

And you have a route handler:

```javascript
// ./routes/hello.js
module.exports = function hello(req, res, next) {
  var name = res.locals._mySession &&
             res.locals._mySession.name ||
             "world"
  res.render("hello", {name : name}, rendered)

  function rendered(error, partial) {
    if (error) return next(error)
    res.send(partial)
  }
}
```
^ next slide for emphasis on res.locals

And you have a route handler:

```javascript
  // grab session data off res.locals
  var name = res.locals._mySession &&
             res.locals._mySession.name ||
             "world"
```
^ res has to be in scope

[fit] That works OK, I guess.

(But it's kind of gross.)

^ brittle: res.locals needs to be visible to all of the code that requires this data;
modifies objects you didn't create;
can make it hard to recover from early errors

But now you want to send data elsewhere:

```javascript
// ./routes/checkout.js
var fulfill = require("../services/fulfill.js")
module.exports = function checkout(req, res, next) {
  fulfill(res.locals._mySession['cart'], next)
}
// ./services/fulfill.js
var Orders = require("../models/orders.js")
module.exports = function fulfill(cart, next) {
  Orders.save(cart, function (error) {
    // hmm
  })
}
```
^ how to send back success?
how to add request-specific logging?
what happens when some of the in-between pieces aren't modifiable by us?
workarounds exist, but they either bulk up the code or increase coupling
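One of those workarounds, sketched under assumptions, is to thread an explicit `context` object through every call. `checkout` and `fulfill` loosely mirror the slide above. `Orders` is a hypothetical in-memory stub, and the logger prefix is invented for illustration:

```javascript
// Hypothetical stand-in for ../models/orders.js that "saves" asynchronously.
const Orders = {
  saved: [],
  save(cart, callback) {
    setImmediate(() => {
      this.saved.push(cart);
      callback(null);
    });
  },
};

// Every layer takes `context` only so it can pass it along.
function fulfill(context, cart, next) {
  context.log("fulfilling " + cart.items.length + " item(s)");
  Orders.save(cart, function (error) {
    if (error) return next(error);
    context.log("order saved");
    next(null, { ok: true });
  });
}

// Hypothetical route-level code: build the context from request data.
function checkout(session, respond) {
  const lines = [];
  const context = {
    log: (message) => lines.push("[" + session.user + "] " + message),
  };
  fulfill(context, session.cart, function (error, result) {
    respond(error, result, lines);
  });
}

// Usage
checkout({ user: "alice", cart: { items: ["ham"] } }, function (error, result, lines) {
  if (error) throw error;
  console.log(result, lines);
});
```

It works, but every function between the route handler and the database now has to accept and forward `context`, including functions you may not own.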

[fit] Things are getting complicated pretty fast.

^ there are lots of moving parts to a modern server, and a lot of the code wasn't written by you;
many distinct but interdependent concerns;
can't presume that res is available;
must work with many requests in flight at once;

[fit] I had this problem too.

^ New Relic needed tracing;
couldn't break developers' apps;
needed to be lightweight and safe

So I* built continuation-local-storage.

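```javascript
// app.js
var cls = require("continuation-local-storage")
var ns = cls.createNamespace("mine")

// run() calls its function with the new context as its argument, so wrap
// next() rather than passing it directly (Express would treat that as an error)
function namespaced(req, res, next) { ns.run(function () { next() }) }
function setSession(req, res, next) {
  // loadSession stands in for your app's (synchronous) session lookup
  ns.set("session", loadSession(req.params["user_id"]))
  next()
}
app.use(namespaced)
app.use(setSession)

// db-elsewhere.js
var ns = require("continuation-local-storage").getNamespace("mine")
function myUserFetcher(cb) {
  var userId = ns.get("session")["user_id"] // etc.
}
```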
^ along with Tim Caswell, Trevor Norris, Isaac Schlueter, Jacob Groundwater;
namespaces are global to a process;
namespace creation & lookup are separate;
must start an async call chain with run() before using set & get

The name is kind of goofy for a reason.


- Domains have a terrible name.
- Continuation-local storage is like thread-local storage.
- Package names don't need to be terse.

Please use more descriptive names for packages, everybody!

^ nobody likes the "domains" name (aka hugs, guardians);
explain thread-local storage;
seriously, you don't type them that often;
remember: discoverability is npm's biggest problem!

How does CLS work?


CLS is simple

- ~150 lines of JavaScript
- because it doesn't do most of the work
- the work is done by the asyncListener API
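To show how small the state-propagation layer can be, here is a toy namespace with `run`, `set`, `get`, and `bind`. The names mirror the CLS API, but this is not the library's implementation. It only carries context through callbacks you explicitly pass to `bind`, which is the part asyncListener automates:

```javascript
// Toy namespace for illustration only; not continuation-local-storage itself.
function createNamespace() {
  let active = null; // the context of the code running right now, if any

  // Make `context` active for the duration of fn, then restore the old one.
  function within(context, fn) {
    const previous = active;
    active = context;
    try {
      return fn();
    } finally {
      active = previous;
    }
  }

  return {
    // Start a new context that inherits values from the current one.
    run(fn) {
      const context = Object.create(active);
      within(context, () => fn(context));
      return context;
    },
    set(key, value) {
      if (!active) throw new Error("No active context: call run() first.");
      active[key] = value;
      return value;
    },
    get(key) {
      return active ? active[key] : undefined;
    },
    // Capture the context active now; restore it whenever fn is called later.
    bind(fn) {
      const captured = active;
      return function (...args) {
        return within(captured, () => fn.apply(this, args));
      };
    },
  };
}

// Usage: two "requests", each in its own context, each doing async work.
const ns = createNamespace();
const seen = [];
["alice", "bob"].forEach(function (user) {
  ns.run(function () {
    ns.set("user", user);
    setTimeout(ns.bind(function () {
      seen.push(user + " sees " + ns.get("user"));
    }), 10);
  });
});
```

Each callback now reads its own request's value, but only because every callback went through `bind()` by hand.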


How does ~~CLS~~ asyncListener work?


~~CLS~~ asyncListener is simple


- it “just” monkeypatches everything asynchronous in Node core
- two versions: one in Node 0.11, one polyfill

^ not going to be exposed in 0.12;
I maintain the polyfill;
it doesn't track the latest version of Trevor's API
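Below is a toy sketch of what “monkeypatching everything asynchronous” means for a single function. `setTimeout` is replaced with a wrapper that remembers the context active when a callback is scheduled and restores it around the callback. `activeContext` and `runInContext` are invented for this sketch. The real asyncListener (and its polyfill) patches far more of Node core and exposes hooks that run before and after each callback instead of one global variable.

```javascript
// Toy context tracking, invented for this sketch.
let activeContext = null;

function runInContext(context, fn) {
  const previous = activeContext;
  activeContext = context;
  try {
    return fn();
  } finally {
    activeContext = previous;
  }
}

// The monkeypatch: keep the original, install a wrapper in its place.
const originalSetTimeout = globalThis.setTimeout;
globalThis.setTimeout = function patchedSetTimeout(callback, delay, ...args) {
  const captured = activeContext; // remember the scheduling context
  return originalSetTimeout(function () {
    // restore it just before the callback runs, reset it just after
    return runInContext(captured, () => callback(...args));
  }, delay);
};

// Usage: callbacks (and the callbacks they schedule) see their own context.
const seen = [];
runInContext({ user: "alice" }, function () {
  setTimeout(function () {
    seen.push("alice sees " + activeContext.user);
  }, 20);
});
runInContext({ user: "bob" }, function () {
  setTimeout(function () {
    setTimeout(function () {
      seen.push("bob sees " + activeContext.user);
    }, 0);
  }, 5);
});
```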

A simple illustration of asyncListener: stackup

Everybody loves a long stack trace. Running:
```javascript
require("stackup")
var empty
process.nextTick(function () { empty.nonexistent() })
```
^ common pattern: these modules modify runtime state;
example supposed to fail

yields
```
TypeError: Cannot call method 'nonexistent' of undefined
    at /Users/ogd/Documents/projects/async-listener-test/test.js:3:38
    at process._tickCallback (node.js:419:13)
    at Function.Module.runMain (module.js:499:11)
    at startup (node.js:119:16)
    at node.js:906:3
    ---- async ----
    at Object.<anonymous> (/Users/ogd/Documents/projects/async-listener-test/test.js:3:9)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:906:3
```

^ long traces are short traces stapled together;
asyncListener does the stapling;
asyncListener observes before and after the stack is reset
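Here is a toy version of the stapling, assuming only `process.nextTick` needs tracking. It captures a stack when the callback is scheduled, and if the callback throws, it appends that earlier stack below a `---- async ----` marker. `nextTickWithLongStack` is a made-up name. stackup itself gets this from asyncListener rather than from wrapping one function by hand.

```javascript
// Remember where the callback was scheduled; if it throws later,
// staple that earlier stack onto the error's own stack.
function nextTickWithLongStack(callback) {
  // Drop the first line ("Error") so only the frames remain.
  const scheduledAt = new Error().stack.split("\n").slice(1).join("\n");
  process.nextTick(function () {
    try {
      callback();
    } catch (error) {
      error.stack += "\n    ---- async ----\n" + scheduledAt;
      throw error;
    }
  });
}

// Supposed to fail, like the stackup example: the uncaught TypeError
// is reported with both the short trace and the scheduling trace.
let empty;
nextTickWithLongStack(function () {
  empty.nonexistent();
});
```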

[fit] Aaanyway…

^ full discussion of asyncListener is out of scope;
final design is under discussion;
watch discussion of tracing module post-0.12

[fit] Back to CLS!

gets and sets values as if they were global, but scoped to each chain of asynchronous function calls.

^ no control flow or error-handling, just state propagation

The API is dirt simple:

```javascript
var cls = require("continuation-local-storage")
var namespace = cls.createNamespace("my namespace name") // create
namespace = cls.getNamespace("my namespace name") // use
// wrap async chain
namespace.run(function () {
  namespace.set("property", "hi")
  doSomethingAsync(function () {
    var property = namespace.get("property")
  })
})
```
^ has operations to create and get namespaces;
use whatever namespace names you want;
the main thing is to avoid collisions;
also has operations to run code in CLS;
and get and set values on the namespace

(The API looks like Node's domains because it's based on them.)


What is CLS for?


- transaction tracing
- passing data through complicated webapps
- logging
- working with third-party async code

^ there are plugins for use with Express, Restify, and hapi;
people should write more!

Remember when I said it “just” monkeypatches everything asynchronous in Node core?

^ there are some edge cases;
the most troublesome things to support are connection pools and pipelines

Sometimes you can use shims:


- cls-q
- cls-redis
- ??? (should write more)

^ in particular, bluebird needs support, but it's suuuper scary

Sometimes you have to bind callbacks to namespaces explicitly:

```javascript
var cls = require("continuation-local-storage")
var namespace = cls.getNamespace("my namespace name")
myMongoDBMeatFinder({type : "ham"}, namespace.bind(next)) // <-- !!

function next(error, hams) {
  var logger = namespace.get("logger")
  if (error) return logger.yo(error)
  if (!hams.length) return logger.yo("ENOHAM")

  var res = namespace.get("res")
  res.send(hams)
}
```

Sometimes you have to bind callbacks to namespaces explicitly:

```javascript
myMongoDBMeatFinder({type : "ham"}, namespace.bind(next)) // <-- !!
```

EventEmitters are another special case:

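```javascript
// handler.js (runs inside ns.run(), e.g. via the namespaced middleware)
var ns = require("continuation-local-storage").getNamespace("mine")
var detect = require("./elsewhere.js")
module.exports = function handler(req, res, next) {
  ns.set("res", res)
  ns.bindEmitter(req) // bind before adding listeners
  req.on("data", detect)
}

// elsewhere.js
var ns = require("continuation-local-storage").getNamespace("mine")
module.exports = function detect(data) {
  if (data.toString("utf8") === ns.get("sentinel")) {
    ns.get("res").end()
  }
}
```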

EventEmitters are another special case:

```javascript
  ns.bindEmitter(req)
```
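Here is roughly what `bindEmitter` has to do, as a toy sketch: each listener is bound to the context that was active when it was added, not the one active when the event fires. `activeContext`, `runInContext`, and `bindEmitterToy` are invented for illustration. The sketch ignores details the real implementation has to handle, such as `removeListener`, `once`, and multiple namespaces.

```javascript
const { EventEmitter } = require("events");

// Toy context tracking, invented for this sketch.
let activeContext = null;

function runInContext(context, fn) {
  const previous = activeContext;
  activeContext = context;
  try {
    return fn();
  } finally {
    activeContext = previous;
  }
}

// Patch on()/addListener() so each listener runs in the context that was
// active when it was registered. (removeListener won't match the wrapper here.)
function bindEmitterToy(emitter) {
  const originalOn = emitter.on;
  emitter.on = emitter.addListener = function (event, listener) {
    const captured = activeContext;
    return originalOn.call(this, event, function (...args) {
      return runInContext(captured, () => listener.apply(this, args));
    });
  };
  return emitter;
}

// Usage: the listener is added during Alice's request...
const req = bindEmitterToy(new EventEmitter());
const seen = [];
runInContext({ user: "alice" }, function () {
  req.on("data", function (chunk) {
    seen.push(activeContext.user + ": " + chunk);
  });
});

// ...but the event is emitted from code running in Bob's context.
runInContext({ user: "bob" }, function () {
  req.emit("data", "hello");
});
```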

Why?

Because EventEmitters, connection pools, and pipelining break the asynchronous call chain.

^ The async chain can't be fixed up automatically. It needs hints from a human who knows where these gaps are: you!
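Below is a toy sketch of how a connection pool breaks the chain. A callback is queued by one request but invoked by whichever request happens to drain the queue. `createPool`, `query`, and `drain` are hypothetical, and the `bind` helper here only plays the role of `namespace.bind`:

```javascript
// Toy context tracking, invented for this sketch.
let activeContext = null;

function runInContext(context, fn) {
  const previous = activeContext;
  activeContext = context;
  try {
    return fn();
  } finally {
    activeContext = previous;
  }
}

// Plays the role of namespace.bind: capture now, restore at call time.
function bind(fn) {
  const captured = activeContext;
  return function (...args) {
    return runInContext(captured, () => fn.apply(this, args));
  };
}

// Hypothetical pool with one shared connection: queries wait in a queue.
function createPool() {
  const queue = [];
  return {
    query(sql, callback) {
      queue.push({ sql, callback });
    },
    // Runs queued callbacks from whatever code happens to call drain().
    drain() {
      while (queue.length > 0) {
        const { sql, callback } = queue.shift();
        callback(null, "rows for " + sql);
      }
    },
  };
}

const pool = createPool();
const seen = [];

runInContext({ user: "alice" }, function () {
  pool.query("q1", function () {
    seen.push("unbound: " + activeContext.user);
  });
  pool.query("q2", bind(function () {
    seen.push("bound: " + activeContext.user);
  }));
});

// Bob's request is the one that ends up flushing the pool.
runInContext({ user: "bob" }, function () {
  pool.drain();
});
```

The unbound callback reports Bob even though Alice queued it, while the bound one keeps Alice's context. That `bind` call is the hint a human has to supply.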

Let's get real.

^ built to solve production problems;
not without tradeoffs;
the core team didn't want it inside Node because of these tradeoffs

Is it safe?


- “some” CPU overhead
- more GC pressure & memory overhead
- doesn't change the behavior of your program or Node itself

^ CPU overhead can be relatively large, but vanishes below noise floor as soon as I/O is involved;
memory overhead is more significant, but really depends on app;
monkeypatching tested for safety

Who's using this thing?


- every user of New Relic for Node
- small but (mostly) satisfied community of other users


What else is out there?

^ hey browser people, you can wake up now!

If you like browsers, there's Angular's zone.js:

```javascript
zone.run(function () {
  zone.inTheZone = true;

  setTimeout(function () {
    console.log('in the zone: ' + !!zone.inTheZone);
  }, 0);
});

console.log('in the zone: ' + !!zone.inTheZone);
```
^ Brian Ford is pretty good at browser JavaScript!

If you only want error monitoring, there's domains:

```javascript
var domain = require("domain");

app.use(function (req, res, next) {
  var d = domain.create();
  d.on("error", next);
  d.run(next);
});
```

If you want an entire paradigm, there's StrongLoop Zones for Node:

```javascript
require("zone").enable();

function curl(url, cb) {
  zone.create(function () {
    var data = "";
    require("http").get(url, function (res) {
      res.setEncoding("utf8");

      res.on("data", function (s) { data += s;});
      res.on('end', function () { zone.return(data); });
    });
  }).setCallback(cb);
}

curl('http://www.google.com/', console.log);
```
^ automatically cleans up after errors
^ adds promise-like control flow
^ doesn't really do state management

In the future, there might be dynamic tracing (like DTrace) in Node with a JavaScript API:

```javascript
var tracing = require("tracing");
// ???
```
[watch this space]

What's next?


There's not much left to do except make it simpler.


- automatically load shims
- adopt the final (TBD) asyncListener API
- transparent support for more packages
- rewriting the documentation

…coming “soon!”
^ asyncListener API isn't done yet;
shims for Bluebird, MongoDB

Thanks, and good luck!