
In my spare time, I've recently been working with a few codebases that are either written in node.js, or lean on enough code that is, to make me keen to have some kind of testing framework in place: the same kind of safety net I'm used to on Chef, Sinatra or Rails projects.

After losing a couple of weekends to finding an approach to BDD-style development with node, and to getting my head around asynchronous coding concepts, I think I've settled on something that feels enough like rspec to be comfortable using for future server-side js development.

It's way too much for a single post, so I'll be sharing it as a three-part series, to help other Ruby developers used to synchronous development with rspec adjust to asynchronous development using the closest thing I can find to rspec right now: mocha.

I'll cover how Mocha syntax compares to rspec, then how to implement the code to pass these mocha specs, and then I'll add a post on keeping asynchronous code halfway manageable in node.

How we do it in Ruby

I'm going to use a side project I've been hacking on for a few months to show how I'd add a new class that wraps its calls to a persistence layer, providing a degree of encapsulation and abstracting the database technology away from the class's external interface.

Let's say all I want from a User class here is a method that finds me a user, stored as a hash in Redis, keyed on their machine's MAC address.

The tests I'd write might look a bit like this pseudocode here:

describe 'User' do

  before(:each) do 
    redis = Redis.new
    redis.hmset("99:aa:44:33:01:3r", {
      :username    => "mrchrisadams",
    :name        => "Chris",
      :email       => "wave@chrisadams.me.uk",
      :mac_address => "99:aa:44:33:01:3r"
    })
  end

  it 'fetches the user object' do 
    u = User.new
    c = u.find_by_mac('99:aa:44:33:01:3r')
    c['username'].should eq('mrchrisadams')
  end

end

I use the before block to store a hash in redis with a few values set on it, and then later, in the it 'fetches the user object' block, I instantiate my User class and call the find_by_mac method to fetch the hash I just stored in Redis.

The implementation code in Ruby might look like this:

class User

  def initialize
    @db = Redis.new
  end

  def find_by_mac(mac)
    @db.hgetall(mac) 
  end

end

So far so good - this is synchronous code - it feels comfortable, and is easy enough to work with.

Doing this in node

Now, let's try to take the same approach in node, to see how different this looks, but also to see what we need to be aware of when learning to think in asynchronous terms. The completed mocha test code is here on github, and the completed implementation code is here too.

So, let's be good developers and try to write our test code first, in Mocha, the javascript-flavoured take on rspec. I'll paste the lot, then go through the interesting bits piece by piece.
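One thing worth noting before the code: these excerpts assume db and the should assertions have already been set up near the top of the test file, roughly like this (a sketch, assuming node-redis and the should library, with the User module under test required from wherever it lives in your project):

// assumed setup for the test excerpts: a redis client and should-style assertions
var redis  = require('redis');
var should = require('should');
var db     = redis.createClient();
// var User = require('...'); // the module under test, path depending on your project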

describe('User', function() {
  describe('#findByDevice', function() {

    beforeEach(function(done) {

      db.hmset("mrchrisadams", {
        name: "Chris Adams",
        username: "mrchrisadams",
        devices: ["00:1e:c2:a4:d3:5e"],
        email_address: "wave@chrisadams.me.uk"
      }, done);

    })


    it('should fetch the user for that mac', function(done) {
      var user = new User();
      user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
        if (err) {
          console.log(err)
        } else {
          // console.log(res)
          res.username.should.be.ok
          res.username.should.equal('mrchrisadams')
          done()
        }
      }); // find by device
    }) // should fetch the user for that mac
  })
})

So first of all, much of the syntax is somewhat familiar. We have nested describe and it blocks, and even the assertion syntax is reassuringly familiar, with nice, readable shoulds, bes and equals around.

However, there are a few important additions here that we need to allow for the asynchronous nature of node. First of all, let's look at the beforeEach function.

beforeEach(function(done) {

  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  }, done);
})

In this case, we're making a call to node-redis, a popular Redis library for node that is completely asynchronous. We could have tried using the bog-standard beforeEach function like this when working with an asynchronous library (note the lack of done):

  beforeEach(function() {
    db.hmset("mrchrisadams", {
      name: "Chris Adams",
      // object vars edited out for brevity
    });
  })

Had we done this, node would have zipped through the beforeEach function, started it, then returned straight away, racing ahead to run the tests below it without waiting for our Redis setup steps to finish. Now, Redis is fast, but you can't rely on that to make sure your tests are set up before you run them; this code would have given us, at best, unpredictable results, and more likely failures across the board.
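To make the race concrete, here's a contrived sketch (using setTimeout instead of Redis) of what happens when an asynchronous setup step isn't waited for:

// contrived sketch: without done, the hook returns immediately, so Mocha
// starts running the tests while the "setup" work is still pending
beforeEach(function() {
  setTimeout(function() {
    // this only fires ~50ms later, by which time the tests may already have run
    console.log('setup finished');
  }, 50);
});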

done to the rescue

Here's how we do it when working with an async library:

beforeEach(function(done) {

  db.hmset("mrchrisadams", {
    name: "Chris Adams",
    // object vars edited out for brevity
  }, done);
})

The difference this time round is that we're passing in done, a callback Mocha gives us: Mocha waits for done() to be called before running the tests, so nothing runs until the Redis setup steps have finished and we're in a fit state for testing.

We have to take the same approach for the test itself, in our it function:

it('should fetch the user for that mac', function(done) {
  var user = new User();
  user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
    if (err) {
      // do something to recover
    } else {
      res.username.should.be.ok
      res.username.should.equal('mrchrisadams')
      done()
    }
  }); // close findByDevice
}) // close should fetch the user for that mac

Here, we're doing something very different to the ruby approach of storing the values returned from methods in variables, then testing those values.

Our first real exposure to "continuation-passing style"

Look at this line in particular:

user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res)

Understanding this line was, for me, the key to getting my head around this initially very alien syntax. If you're having trouble with the shift from sync to async, the closest thing in typical sync ruby code might be something like this, where you set the variable res, then run your assertions against it:

res = findByDevice('00:1e:c2:a4:d3:5e')
res.should be_okay

Now there's two important things here to remember when working with node:

  1. Because we're working asynchronously, we only want to run our assertions once we know we have the values back from the call we just made.
  2. When we're working with javascript, we can pass functions around, to execute the code inside them at a later time (see the sketch just after this list).
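Point 2 is worth dwelling on if you're coming from Ruby. Here's a trivial sketch of a function being treated as a value: we pass it around, and something else decides when to call it:

// a generic sketch: functions are just values we can pass around and call later
var sayHello = function(name) {
  console.log('Hello, ' + name);
};

var runLater = function(fn) {
  // we were handed a function; we choose when (and with what) to call it
  fn('Chris');
};

runLater(sayHello); // => "Hello, Chris"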

So, our solution to the asynchronous problem, is to pass in a function with the assertions we care about inside it, as a parameter to our findByDevice call on the user object.

So, what we're saying here is, "go fetch the results of findByDevice, with the parameter 00:1e:c2:a4:d3:5e, and here's the function I'd like you to execute when you're done, please":

  user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
    if (err) {
      // do something to recover
    } else {
      res.username.should.be.ok
      res.username.should.equal('mrchrisadams')
      done()
    }
  })

You might be confused by the two parameters, err and res. Passing a function which itself takes the parameters error and result (or some variation on those names) as the last argument to a method call is now a generally accepted convention in node: it makes it possible to pass the result of one call back to another, and is often referred to as the continuation-passing style. It's crucial to understand it, because you won't get far without it.
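As a rough sketch of the convention (findAnswer here is made up purely for illustration), an asynchronous function takes a callback as its last argument and calls it with the error first and the result second:

// a hypothetical async function following the error-first callback convention
function findAnswer(question, callback) {
  if (question !== 'meaning of life') {
    callback(new Error('unknown question')); // something went wrong: pass the error along
  } else {
    callback(null, 42); // no error, so pass null and the result
  }
}

findAnswer('meaning of life', function(err, res) {
  if (err) { console.log(err); }
  else { console.log(res); } // => 42
});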

Where does done() fit into this?

You might notice a call to done() on the last line of the function we're passing in, after the assertions. When you pass done into a testing block, Mocha doesn't know when the test has finished, so it waits for done() to be called before deciding whether that particular it block has passed or failed.
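One tidy-up worth knowing about: done also accepts an error argument, so instead of just logging in the err branch (and leaving Mocha to time out), you can hand the error straight back and the test fails immediately with the reason. A sketch of the same test using that:

it('should fetch the user for that mac', function(done) {
  var user = new User();
  user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
    if (err) { return done(err); } // tell Mocha the test failed, and why
    res.username.should.equal('mrchrisadams')
    done()
  });
})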

Now onto the implementation in node

So we've now run through how Mocha works and how it compares to Rspec, and we've seen how we rely on anonymous functions to run our assertions on the results of asynchronous functions (anonymous functions being functions with no name, just the keyword, the parameters and the code to execute, like function(err, res) { // do stuff }).

It's worth re-reading the above, until you're really comfortable with the concepts, as the next section is unfortunately pretty messy.

This is the second of the three part series covering how to migrate from developing synchronously with Rails and Rspec, to asynchronously with Node, and Mocha. It picks up from the previous post, introducing Mocha syntax, and asynchronous testing.

We covered how we'd implement a class with instance methods in Ruby in the previous post. Here's the simplified pseudo-code, for comparison:

class User

  def initialize
    @db = Redis.new
  end

  def find_by_mac(mac)
    @db.hgetall(mac) 
  end

end

It looks a bit different when working with asynchronous node code.

What implementation in node looks like

Because javascript doesn't have a class system, if we want something that acts a bit like a class, the idiomatic approach is to use functions, with a bit of boilerplate code to make it easier to identify the function in stacktraces or logging when developing.

In javascript, instead of defining class methods or instance methods like we do with Ruby, the idiomatic approach is to use prototype to inject new methods into the User function, so they're available to every instance of User in the system.
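If prototype is new to you, here's a tiny generic sketch of the idea, unrelated to our User class: define a method once on the prototype and every instance can use it:

// a generic sketch of prototype methods
function Dog(name) {
  this.name = name;
}

Dog.prototype.bark = function() {
  return this.name + ' says woof';
};

var rex = new Dog('Rex');
console.log(rex.bark()); // => "Rex says woof"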

This chunk of code below is roughly analogous to declaring a User class in ruby, mixing in methods from an EventEmitter module and giving it a toString method (the equivalent of ruby's to_s), so there's a readable string returned when you try to log or print the class:

// modules this boilerplate relies on
var events = require('events');
var sys = require('sys'); // on more recent node, require('util') provides inherits

function User() {
  if(false === (this instanceof User)) {
    return new User();
  }
  events.EventEmitter.call(this);
}
sys.inherits(User, events.EventEmitter);

User.prototype.toString = function() {
  return "User"
}

One thing to note: User.prototype.toString is synchronous, because we can rely on it returning a value instantaneously, without thinking about callbacks.
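So calling it feels just like calling a plain Ruby method, with the value available straight away:

// a quick sketch: no callback needed, the value is there immediately
var user = new User();
console.log('working with: ' + user); // => "working with: User"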

Writing asynchronous functions

However, when we're working with asynchronous functions, we need to know how to define them as well as call them.

Here's a simplified version of an asynchronous function in use in the User function. We define a function on the prototype of User, accepting two parameters: device_mac, a String we use as our key when fetching a hash from Redis, and callback, the function we want to pass into findByDevice for later execution once Redis gives us our hash. In line with convention, callback itself takes two parameters, err and res: in our case res is the hash given to us by Redis if all is well, and err is what we get back if something goes wrong while Redis is fetching our hash for us.

User.prototype.findByDevice = function(device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    callback(err, res);
  })
}

Checking this against our test code

It might be helpful to show these side by side: the implemented function on User, next to the function our test passes into findByDevice:

Our implemented function:

User.prototype.findByDevice = function(device_mac, callback) {
  db.hgetall(device_mac, function (err, res) {
    callback(err, res);
  })
}

The test:

user.findByDevice('00:1e:c2:a4:d3:5e', function(err, res) {
  if (err) {
    // do something to recover
  } else {
    res.username.should.be.ok
    res.username.should.equal('mrchrisadams')
    done()
  }
})

When we have the value from Redis, callback(err, res) executes the function we passed in from the test, with our res.username.should.be.ok style assertions.

When you need to call async functions from async functions

Once you've got your head around passing functions into asynchronous code, you'll often find yourself working with multiple asynchronous functions whose order you need to control, so that data is passed from one to the other to give you the result you want. Here's the actual implementation of findByDevice I ended up using in the project I'm working on. We still pass the callback function in as our final parameter, but in order to return the value we want, we end up nesting a second call to db.hgetall inside the anonymous function we pass to the first call of db.hgetall, then using callback(err, user) to execute the function passed into findByDevice and pass the results along to the code that originally called user.findByDevice.

  User.prototype.findByDevice = function(device_mac, callback) {

    db.hgetall(device_mac, function (err, device) {
      if (err) { console.log(err) }
      else {
        if (device.hasOwnProperty('mac')) { 
          db.hgetall(device.owner, function (err, user) {
            if (err) { console.log(err)}
            else{
              callback(err, user)
            }
          });
        }
      }
    });
  }

Avoiding callback hell

Even with just two asynchronous function calls, this isn't very readable, and it seems that nearly every second developer on the planet playing with node has written their own callback handling library to make this easier to read and more maintainable.

In fact, there's a bewildering number of libraries out there that claim to make this problem much easier to understand. In the next post, I'll introduce async.js, a well-documented library I've found fairly straightforward to work with, to help mitigate callback hell.

In the last post, I implemented an asynchronous function that wrapped a call to Redis, using an existing node library, node-redis. The final implementation introduced nested asynchronous method calls, and the code ended up looking a bit like this, even after simplifying somewhat:

  User.prototype.findByDevice = function(device_mac, callback) {

    db.hgetall(device_mac, function (err, device) {
      if (err) { console.log(err) }
      else {
        if (device.hasOwnProperty('mac')) { 
          db.hgetall(device.owner, function (err, user) {
            if (err) { console.log(err)}
            else{
              callback(err, user)
            }
          });
        }
      }
    });
  }

Now, the code took this form because we relied on the results of one asynchronous call to make the second one. If you take a second to imagine how this would read at four, five or six levels of nesting, you'll quickly understand why so many developers are writing their own callback management libraries to make this easier to work with.

The one I've found most promising so far is async.js, a fairly comprehensive utility module that provides a number of ways to ensure that asynchronous functions are either called in a specific order, or run in parallel with their results aggregated before the code is allowed to continue, and so on.
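If you want to follow along, async comes straight from npm, and the snippets below assume it has been required at the top of the file:

// install with: npm install async
var async = require('async');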

In this case, I'll be focusing on the use of waterfall, a function in the async module that lets you pass in an array of functions to be called in order, passing the results of one to the next, until a final callback passes the end result back to the code that called the function waterfall was used within.

  User.prototype.findByDevice = function(device_mac, callback) {

    async.waterfall([

        // fetch our device first
        function(cb){
          db.hgetall(device_mac, function (err, res) {
            cb(err, res);
          })
        },

        // now we have our device, fetch the user
        function(device, cb){
          db.hgetall(device.owner, function (err, res) {
            cb(err, res);
          })
        }

        // return our user object
      ], function (err, user) {
        callback(err, user)
      });

  }

In our case, we have our findByDevice function on User, and we've passed an array containing our two asynchronous functions as the first argument to async.waterfall, followed by a final anonymous function to return our user object.

To be more specific, just like the code above, we take our mac address string as the first parameter to findByDevice, and the function to execute as our second parameter, callback. We then make an asynchronous call to Redis to fetch a device object, passing in cb as the function to execute once Redis has given us our hash, which hands it on to the next function in the array.

We then use the owner property of the device object passed into the second function, to make another call to fetch our user, again passing in cb, to execute once Redis has given us a user object, to pass to the final function.

Once we have the user object, we can pass it on to the code that called findByDevice with callback(err, user), completing the asynchronous callback chain.

More than just waterfalls

Of course, just because we now know how to execute asynchronous functions in a set order, one after the other like we're used to, doesn't mean we should always do so.

One of the advantages of node's asynchronous style is that it allows the parallel execution of code, so the same operation can be applied to an array of values at the same time, getting around bottlenecks, with the results only passed on once all the operations have completed.
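For instance, here's a sketch using async.map (assuming db is the same node-redis client as before, with the keys made up for illustration): each lookup is kicked off in parallel, and the final callback only runs once every one of them has come back:

// a sketch: look up several hashes in parallel, then continue with all the results
var keys = ['mrchrisadams', 'another-user'];

async.map(keys, function(key, cb) {
  db.hgetall(key, cb); // each lookup fires without waiting for the others
}, function(err, users) {
  // called once *all* the lookups have completed (or as soon as one errors)
  console.log(users);
});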

Alternatively, this allows us to pop values onto queues with a set number of workers to work through them, without needing a dedicated worker process like you would in Rails when using delayed_job or resque.
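async.queue is one way to get that shape: you give it a worker function and a concurrency, and push work onto it from anywhere in your code. A rough sketch (the task shape here is invented for illustration):

// a sketch of async.queue: two workers drain the queue concurrently
var queue = async.queue(function(task, cb) {
  db.hgetall(task.key, function(err, res) {
    // do whatever work the task needs here
    cb(err);
  });
}, 2); // concurrency of 2

queue.push({ key: 'mrchrisadams' });
queue.push({ key: 'another-user' });

// called when the last item has been processed
queue.drain = function() {
  console.log('all items have been processed');
};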

Doesn't all this seem like a lot of work though? The ruby you showed me first was much shorter and easier to read

In a word, yes.

Node isn't a magic bullet, and although it's popular, if you're building a basic CRUD app there are often very good reasons to choose Rails or Django over Node and Express.

That being said, it pays to understand your options when choosing a particular technology to solve the problem in front of you, and, if you're a fan of behaviour driven development, to know that such an approach is possible. Also, once you've got your head around async programming, it's good to know that there are some well-developed tools to help you apply these techniques to both server- and client-side javascript.

If anything's not clear in this series, please let me know - I've sunk a good few hours into these posts now, to make it easier to understand async node development if you're used to sync ruby development, and I'd really like to know where I can improve these for future visitors.
