CascadiaJS 2014: Continuation Local Storage (CLS) slides

[fit] constantly losing stuff

lessons learned developing and using continuation-local-storage in Node

Forrest L Norvell, npm, Inc.

ponytime othiym23 on npm / GitHub / Twitter

^ code-heavy talk that focuses on Node; some interesting pieces for the front end towards the end


Why did I make this weird thing with the long name?

^ writing a monitoring tool for New Relic; needed a way to have access to request-specific data that would work across the whole request; took many, many tries to get it right


What is continuation-local storage?

  • need variables scoped across concurrent requests
  • threaded programming uses "thread-local" storage
  • Node's JavaScript is threadless, uses callbacks instead
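
To see why a plain global isn't enough, here's a minimal sketch. The `handle` function, the user names, and the delays are all made up for illustration. Two overlapping "requests" share one module-level variable, and the one that finishes last ends up reading the other's value.

```javascript
// One variable shared by every request in the process.
var currentUser

// Hypothetical request handler: remembers the user, then does async work.
function handle(user, delay, done) {
  currentUser = user
  setTimeout(function () {
    // By now another request may have overwritten currentUser.
    done(user + " saw " + currentUser)
  }, delay)
}

var seen = []
function record(line) { seen.push(line) }

handle("alice", 20, record) // starts first, finishes last
handle("bob", 10, record)   // overwrites currentUser before alice's timer fires

setTimeout(function () {
  console.log(seen) // alice's callback reports bob's name
}, 50)
```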

Why are some callbacks called continuations?

  • async functions get passed a callback
  • the callback is how processing continues
  • callbacks used this way are called continuations

^ this convention is called continuation-passing style
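
A small sketch of continuation-passing style, using only Node core. The `readConfig` name and the JSON payload are invented for the example. The function never returns its result; it hands it to the callback, which is where the program continues.

```javascript
// Continuation-passing style: the result goes to the callback, not the caller.
function readConfig(raw, callback) {
  // setImmediate defers the work, so the caller's stack has unwound
  // by the time the continuation runs.
  setImmediate(function () {
    var config
    try { config = JSON.parse(raw) }
    catch (e) { return callback(e) }
    callback(null, config)
  })
}

// The second argument is the continuation: processing picks up here.
readConfig('{"port": 8080}', function (error, config) {
  if (error) return console.error("bad config:", error.message)
  console.log("would listen on", config.port)
})
```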


Talking about music is like dancing about architecture. -- Gandhi

^ CLS is actually easier to use than describe, soooo let's talk in terms of real Node apps; quote from Martin Mull, a comedy / music artist from the 70s (probably)


So, you're writing an Express application.


You have a middleware that fetches a session:

module.exports = function getSession(req, res, next) {
  var id = req.params['user_id']
  var store = require("path").resolve(process.cwd(), "./data/sessions.json")
  require("fs").readFile(store, function (error, data) {
    if (error) return next(error)
    var sessions
    try { sessions = JSON.parse(data) }
    catch (e) { return next(e) }
    if (sessions[id]) res.locals._mySession = sessions[id]
    next()
  })
}

^ next slide for emphasis on res.locals


You have a middleware that fetches a session:

  // put the session on res.locals
  if (sessions[id]) res.locals._mySession = sessions[id]

^ default solution in Express; slaps the data onto the ServerResponse object; causes weirdness if data are weird: a response local that is cyclical will crash loggers & debuggers


And you have a route handler:

// ./routes/hello.js
module.exports = function hello(req, res, next) {
  var name = res.locals._mySession && 
             res.locals._mySession.name || 
             "world"
  res.render("hello", {name : name}, rendered)
  
  function rendered(error, partial) {
    if (error) return next(error)
    res.send(partial)
  }
}

^ next slide for emphasis on res.locals


And you have a route handler:

  // grab session data off res.locals
  var name = res.locals._mySession && 
             res.locals._mySession.name || 
             "world"

^ res has to be in scope


[fit] That works OK, I guess.

(But it's kind of gross.)

^ brittle: res.locals needs to be visible to all of the code that requires this data; modifies objects you didn't create; can make it hard to recover from early errors


But now you want to send data elsewhere:

// ./routes/checkout.js
var fulfill = require("../services/fulfill.js")
module.exports = function checkout(req, res, next) {
  fulfill(res.locals._mySession['cart'], next)
}
// ./services/fulfill.js
var Orders = require("../models/orders.js")
module.exports = function fulfill(cart, next) {
  Orders.save(cart, function (error) {
    // hmm
  })
}

^ how to send back success? how to add request-specific logging? what happens when some of the in-between pieces aren't modifiable by us? workarounds exist, but they either bulk up the code or increase coupling
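
One of those workarounds, sketched under assumptions: pass a context object by hand through every layer. The `Orders` stub, the `context` shape, and the fake `res` are hypothetical stand-ins so the example runs without Express. Note how `fulfill` now has to know about sessions and logging just to forward them.

```javascript
// Hypothetical stand-in for ../models/orders.js.
var Orders = {
  save: function (cart, callback) {
    setImmediate(function () { callback(null) })
  }
}

// Every layer grows a context parameter, whether it cares about it or not.
function fulfill(context, cart, next) {
  Orders.save(cart, function (error) {
    if (error) return next(error)
    context.log("order saved for " + context.session.name)
    next(null, {ok : true})
  })
}

function checkout(req, res, next) {
  var context = {
    session : res.locals._mySession,
    log : function (message) { res.locals.log.push(message) }
  }
  fulfill(context, context.session.cart, function (error, result) {
    if (error) return next(error)
    res.send(result)
  })
}

// Fake response object so the sketch runs on its own.
var res = {
  locals : {_mySession : {name : "ham fan", cart : ["ham"]}, log : []},
  send : function (body) { console.log(body, res.locals.log) }
}
checkout({}, res, console.error)
```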


[fit] Things are getting complicated pretty fast.

^ there are lots of moving parts to a modern server, and a lot of the code wasn't written by you; many distinct but interdependent concerns; can't presume that res is available; must work with many requests in flight at once


[fit] I had this problem too.

^ New Relic needed tracing; couldn't break developers' apps; needed to be lightweight and safe


So I* built continuation-local-storage.

// app.js
var cls = require("continuation-local-storage")
var ns = cls.createNamespace("mine")

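// run() passes the new context to its callback, so don't hand it next directly
function namespaced(req, res, next) { ns.run(function () { next() }) }
function setSession(req, res, next) {
  ns.set("session", loadSession(req.params["user_id"]))
  next()
}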
app.use(namespaced)
app.use(setSession)

// db-elsewhere.js
var ns = require("continuation-local-storage").getNamespace("mine")
function myUserFetcher(cb) {
  var userId = ns.get("session")["user_id"] //etc
}

^ along with Tim Caswell, Trevor Norris, Isaac Schlueter, Jacob Groundwater; namespaces global to a process; separate namespace creation & lookup; must run an async call chain before using set & get


The name is kind of goofy for a reason.

  • Domains have a terrible name.
  • Continuation-local storage is like thread-local storage.
  • Package names don't need to be terse.

Please use more descriptive names for packages, everybody!

^ nobody likes the "domains" name (aka hugs, guardians); explain thread-local storage; seriously, you don't type them that often; remember: discoverability is npm's biggest problem!


How does CLS work?

  • CLS is simple
  • ~150 lines of JavaScript
  • because it doesn't do most of the work
  • the work is done by the asyncListener API
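
The bookkeeping part is small enough to sketch. The toy below is not the library's source, just a miniature built on the assumption that a context is a plain object and that "entering" it means swapping a pointer. Carrying the context across real async boundaries is the part asyncListener supplies, so here it's done by hand with `bind`.

```javascript
// Toy namespace: illustrative only, not the continuation-local-storage source.
function ToyNamespace() {
  this.active = null // context for the code currently running
}

// Run fn inside a fresh context that inherits from the current one.
ToyNamespace.prototype.run = function (fn) {
  var previous = this.active
  this.active = Object.create(previous)
  try { fn() }
  finally { this.active = previous }
}

ToyNamespace.prototype.set = function (key, value) {
  if (!this.active) throw new Error("no active context; call run() first")
  this.active[key] = value
}

ToyNamespace.prototype.get = function (key) {
  return this.active ? this.active[key] : undefined
}

// Capture the current context and re-enter it whenever fn is called later.
ToyNamespace.prototype.bind = function (fn) {
  var self = this
  var captured = this.active
  return function () {
    var previous = self.active
    self.active = captured
    try { return fn.apply(this, arguments) }
    finally { self.active = previous }
  }
}

var toy = new ToyNamespace()
var later
toy.run(function () {
  toy.set("property", "hi")
  later = toy.bind(function () { return toy.get("property") })
})
console.log(toy.get("property")) // undefined: we're outside the run() call
console.log(later())             // "hi": bind restored the captured context
```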

How does asyncListener work?

  • asyncListener is simple
  • it “just” monkeypatches everything asynchronous in Node core
  • two versions: one in Node 0.11, one polyfill

^ not going to be exposed in 0.12; I maintain the polyfill; it doesn't track the latest version of Trevor's API
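
What that monkeypatching amounts to, as a rough sketch for a single function. The `patch` helper and the `active` variable are invented here, and the real polyfill covers far more of core than this. The patched function remembers the current context when it's called and restores it around the callback.

```javascript
var active = null // stands in for "the current context"

// Replace obj[name] with a version that remembers `active` at call time and
// restores it around the callback. Assumes the callback is the first argument.
function patch(obj, name) {
  var original = obj[name]
  obj[name] = function (callback) {
    var captured = active
    var args = Array.prototype.slice.call(arguments)
    args[0] = function () {
      var previous = active
      active = captured
      try { return callback.apply(this, arguments) }
      finally { active = previous }
    }
    return original.apply(this, args)
  }
  return function unpatch() { obj[name] = original }
}

var unpatch = patch(global, "setImmediate")
var order = []

active = {request : "A"}
setImmediate(function () { order.push(active.request) })
active = {request : "B"}
setImmediate(function () { order.push(active.request) })
active = null

unpatch() // already-scheduled callbacks keep their captured context
setTimeout(function () { console.log(order) }, 10) // [ 'A', 'B' ]
```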


A simple illustration of asyncListener: stackup

Everybody loves a long stack trace. Running:

require("stackup")
var empty
process.nextTick(function () { empty.nonexistent() })

^ common pattern: these modules modify runtime state; example supposed to fail


yields

TypeError: Cannot call method 'nonexistent' of undefined
    at /Users/ogd/Documents/projects/async-listener-test/test.js:3:38
    at process._tickCallback (node.js:419:13)
    at Function.Module.runMain (module.js:499:11)
    at startup (node.js:119:16)
    at node.js:906:3
    ---- async ----
    at Object.<anonymous> (/Users/ogd/Documents/projects/async-listener-test/test.js:3:9)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:906:3

^ long traces are short traces stapled together; asyncListener does the stapling; asyncListener observes before and after the stack is reset
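
The stapling can be sketched without asyncListener at all. This is an assumption about the general technique, not stackup's actual code, and `staple` is a made-up name: grab a stack when the callback is scheduled, then append it to any error the callback throws.

```javascript
// Wrap a callback so that any error it throws also carries the stack from
// the moment the callback was wrapped (that is, scheduled).
function staple(callback) {
  var scheduled = new Error("scheduled").stack
    .split("\n").slice(2).join("\n") // drop the message line and staple's own frame
  return function () {
    try { return callback.apply(this, arguments) }
    catch (error) {
      error.stack += "\n    ---- async ----\n" + scheduled
      throw error
    }
  }
}

var empty
var wrapped = staple(function () { empty.nonexistent() })

// A real shim would hand `wrapped` to process.nextTick; calling it directly
// keeps the example synchronous.
try { wrapped() }
catch (error) { console.log(error.stack) }
```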


[fit] Aaanyway…

^ full discussion of asyncListener is out of scope; final design is under discussion; watch discussion of tracing module post-0.12


[fit] Back to CLS!

CLS gets and sets values as if they were global, but scoped to each chain of asynchronous function calls.

^ no control flow or error-handling, just state propagation
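
A runnable way to see that scoping. CLS is a third-party package, so this sketch swaps in `AsyncLocalStorage` from Node core's `async_hooks` module, which postdates this talk but does the same job. The `handle` function and the request ids are hypothetical. Two overlapping chains each read back their own value, with nothing passed as an argument.

```javascript
// AsyncLocalStorage stands in for a CLS namespace so the example has no
// dependencies; it is not the continuation-local-storage API.
var AsyncLocalStorage = require("async_hooks").AsyncLocalStorage
var storage = new AsyncLocalStorage()

var reads = []

// Hypothetical request handler: stores an id, does async work, reads it back.
function handle(id, delay) {
  storage.run({id : id}, function () {
    setTimeout(function () {
      // The store comes from the chain this timer belongs to.
      reads.push("request " + id + " read " + storage.getStore().id)
    }, delay)
  })
}

handle("A", 20)
handle("B", 10)

setTimeout(function () { console.log(reads) }, 50)
```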


The API is dirt simple:

var cls = require("continuation-local-storage")
var namespace = cls.createNamespace("my namespace name") // create
namespace = cls.getNamespace("my namespace name") // use
// wrap async chain
namespace.run(function () {
  namespace.set("property", "hi")
  doSomethingAsync(function () {
    var property = namespace.get("property")
  })
})

^ has operations to create and get namespaces; use whatever namespace names you want; the main thing is to avoid collisions; also has operations to run code in CLS; and get and set values on the namespace


(The API looks like Node's domains because it's based on them.)


What is CLS for?

  • transaction tracing
  • passing data through complicated webapps
  • logging
  • working with third-party async code

^ there are plugins for use with Express, Restify, and hapi; people should write more!


Remember when I said, “patches everything asynchronous in Node core”?

^ there are some edge cases; the most troublesome things to support are connection pools and pipelines


Sometimes you can use shims:

  • cls-q
  • cls-redis
  • ??? (should write more)

^ in particular, bluebird needs support, but it's suuuper scary


Sometimes you have to bind callbacks to namespaces explicitly:

var cls = require("continuation-local-storage")
var namespace = cls.getNamespace("my namespace name")
myMongoDBMeatFinder({type : "ham"}, namespace.bind(next)) // <-- !!

function next(error, hams) {
  var logger = namespace.get("logger")
  if (!hams.length) return logger.yo("ENOHAM")

  var res = namespace.get("res")
  res.send(hams)
}

Sometimes you have to bind callbacks to namespaces explicitly:

myMongoDBMeatFinder({type : "ham"}, namespace.bind(next)); // <-- !!

EventEmitters are another special case:

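// handler.js
var ns = require("continuation-local-storage").getNamespace("mine")
var detect = require("./elsewhere.js")
module.exports = function handler(req, res, next) {
  ns.set("res", res)
  ns.bindEmitter(req) // bind first so listeners added afterwards are tracked
  req.on("data", detect)
}

// elsewhere.js
var ns = require("continuation-local-storage").getNamespace("mine")
module.exports = function detect(data) {
  if (data.toString("utf8") === ns.get("sentinel")) {
    ns.get("res").end()
  }
}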

EventEmitters are another special case:

  ns.bindEmitter(req)

Why?

Because EventEmitters, connection pools, and pipelining break the asynchronous call chain.

^ The async chain can't be fixed up automatically. It needs hints from a human who knows where these gaps are: you!
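
Here's a sketch of such a gap, again using core's `AsyncLocalStorage` as a stand-in for a CLS namespace. The pool (`acquire` / `release`) and the `bind` helper are hypothetical. A callback parked by one request gets invoked from another request's call chain, so it sees the wrong context unless it was bound first.

```javascript
var AsyncLocalStorage = require("async_hooks").AsyncLocalStorage
var pooled = new AsyncLocalStorage()

// Hypothetical connection pool: parks callbacks until a connection frees up.
var waiting = []
function acquire(callback) { waiting.push(callback) }
function release() { waiting.shift()() }

// Manual analogue of namespace.bind(): capture the store, re-enter it later.
function bind(callback) {
  var store = pooled.getStore()
  return function () {
    var self = this, args = arguments
    return pooled.run(store, function () { return callback.apply(self, args) })
  }
}

var contexts = []
function whoAmI() { contexts.push(pooled.getStore().id) }

pooled.run({id : "A"}, function () {
  acquire(whoAmI)       // unbound: runs in whatever context calls release()
  acquire(bind(whoAmI)) // bound: runs in request A's context
})
pooled.run({id : "B"}, function () {
  release()
  release()
})

console.log(contexts) // [ 'B', 'A' ]
```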


Let's get real.

^ built to solve production problems; not without tradeoffs; the core team didn't want it inside Node because of these tradeoffs


Is it safe?

  • “some” CPU overhead
  • more GC pressure & memory overhead
  • doesn't change the behavior of your program or Node itself

^ CPU overhead can be relatively large, but vanishes below noise floor as soon as I/O is involved; memory overhead is more significant, but really depends on app; monkeypatching tested for safety


Who's using this thing?

  • every user of New Relic for Node
  • small but (mostly) satisfied community of other users

What else is out there?

^ hey browser people, you can wake up now!


If you like browsers, there's Angular's zone.js:

zone.run(function () {
  zone.inTheZone = true;

  setTimeout(function () {
    console.log('in the zone: ' + !!zone.inTheZone);
  }, 0);
});

console.log('in the zone: ' + !!zone.inTheZone);

^ Bryant Fjord is pretty good at browser JavaScript!


If you only want error monitoring, there's domains:

var domain = require("domain");

app.use(function (req, res, next) {
  var d = domain.create();
  d.on("error", next);
  d.run(next);
});

If you want an entire paradigm, there's StrongLoop Zones for Node:

require("zone").enable();

function curl(url, cb) {
  zone.create(function () {
    var data = "";
    require("http").get(url, function (res) {
      res.setEncoding("utf8");

      res.on("data", function (s) { data += s;});
      res.on('end', function () { zone.return(data); });
    });
  }).setCallback(cb);
}

curl('http://www.google.com/', console.log);

^ automatically cleans up after errors; adds promise-like control flow; doesn't really do state management


In the future, there might be dynamic tracing (like DTrace) in Node with a JavaScript API:

var tracing = require("tracing");
// ???

[watch this space]


What's next?


There's not much left to do except make it simpler.

  • automatically load shims
  • adopt the final (TBD) asyncListener API
  • transparent support for more packages
  • rewriting the documentation

…coming “soon!”

^ asyncListener API isn't done yet; shims for Bluebird, MongoDB


Thanks, and good luck!

