
streamline vs. callbacks bench

benchCallbacks.js
"use strict";
var fs = require('fs');
var cache = {}, hit = 0, missed = 0;
 
function load(name, cb) {
var res = cache[name];
if (res) {
process.nextTick(function() {
hit++;
cb(null, res);
});
} else {
fs.readFile(name, function(err, data) {
missed++;
cb(null, cache[name] = data);
});
}
}
 
var count = 1000000;
 
function bench(cb) {
var total = 0;
function loop(i) {
if (i === count) {
cb(null, total);
} else {
load(__dirname + '/benchCallbacks.js', function(err, data) {
if (err) return cb(err);
total += data.length;
loop(i + 1);
})
}
}
loop(0);
}
 
 
var t0 = Date.now();
bench(function(err, result) {
if (err) throw err;
console.log('hit=' + hit + ', missed=' + missed + ', result=' + result);
console.log('elapsed: ' + (Date.now() - t0));
});
benchStreamline._js
"use strict";
var fs = require('fs');
var cache = {}, hit = 0, missed = 0;
 
function load(name, _) {
var res = cache[name];
if (res) {
hit++;
return res;
} else {
missed++;
return cache[name] = fs.readFile(name, _);
}
}
 
var count = 1000000;
 
function bench(_) {
var total = 0;
for (var i = 0; i < count; i++) {
var res = load(__dirname + '/benchCallbacks.js', _);
total += res.length;
}
return total;
}
 
var t0 = Date.now();
var result = bench(_);
console.log('hit=' + hit + ', missed=' + missed + ', result=' + result);
console.log('elapsed: ' + (Date.now() - t0));

Results (MacBook Pro / Lion):

$ _node --fibers benchStreamline
hit=999999, missed=1, result=798000000
elapsed: 325
$ _node benchStreamline
hit=999999, missed=1, result=798000000
elapsed: 1705
$ node benchCallbacks
hit=999999, missed=1, result=798000000
elapsed: 3252

Why process.nextTick in the plain JS example? If you write it correctly, it should work without it, too.

if (err) cb(err); in plain JS looks wrong to me - why no return?

Try it without process.nextTick !!!

The other one is a typo but it won't make any difference. I'm fixing it.

Ahaha, a stack overflow :D
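To see why removing the nextTick blows the stack, here is a minimal sketch (the names are mine, not from the gist): once the file is cached, load() invokes its callback synchronously, so every iteration of the recursive loop adds stack frames that never unwind.

```javascript
// Synchronous completion, like a cache hit without process.nextTick.
function loadSync(cb) { cb(null, 'cached data'); }

function loop(i, done) {
  if (i === 1000000) return done(null);
  loadSync(function () {
    loop(i + 1, done); // recursion on a stack that never pops
  });
}

var overflowed = false;
try {
  loop(0, function () {});
} catch (e) {
  overflowed = e instanceof RangeError; // "Maximum call stack size exceeded"
}
console.log('overflowed: ' + overflowed);
```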

Well done Bruno! Though the process.nextTick can obviously be worked around; streamline somehow does in its generated JS. ;) I guess the point is that sometimes C compilers write better assembly than humans.

Hmm, I just noticed you're not using the same bench function. That's kinda cheating. It's the recursive nature of the first bench function that requires using process.nextTick there.

A bit of explanation:

  • Streamline in callback mode: The loop is transformed into a recursive form by the streamline compiler. So I'm not cheating here. But streamline does not use nextTick to avoid the stack overflow. It uses a trampoline instead, and that turns out to be faster. Early versions of streamline used nextTick but this was not consistent with the semantics I wanted to have because it broke the principle that all yielding points remain explicitly marked with an _. This is why I went with a trampoline.

  • Streamline in fibers mode: the loop is not transformed. The first call to the load function takes a performance hit because it creates a fiber, yields, etc., but the 999,999 subsequent calls execute as direct JS calls. This explains why the fibers mode wins by 10 to 1. But I would not call this cheating, because the whole point of fibers is that they do not force you to transform the JS flow control constructs (loops, if/else, try/catch).

Great explanation Bruno, but then wouldn't it be more fair to use a trampoline in the bench function for the pure JS version? Though the code would be uglier and probably not what I would have written by hand.

(Btw, I had never heard this term. Now that I looked it up I have a word for a technique I used sometimes but didn't know how to explain)

Well, the claim that so many people make is that hand-written code can only be faster. So, I wanted to compare streamline with callback code that people would normally write. I am sure that if I had posted the callback version above and asked if this was the right way to write a caching function and a loop, nobody would have complained that it should have used a trampoline instead because it is faster.

Also, streamline cannot use the standard while (typeof fn === "function") fn = fn(); trampoline because it has to deal with non-streamline functions that don't trampoline. So it has to use a more complex construct, which is a bit tricky to understand. I really doubt that anyone would feel like writing this construct every time he writes a loop.
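For reference, here is the standard trampoline mentioned above in its simplest form (countDown is my own toy example, not streamline's generated code): each step returns a thunk for the next step instead of recursing, and a flat while loop drives the computation, so the stack stays one frame deep.

```javascript
function trampoline(fn) {
  while (typeof fn === 'function') fn = fn();
  return fn; // the first non-function value is the final result
}

function countDown(n) {
  if (n === 0) return 'done';
  return function () { return countDown(n - 1); }; // a thunk, not a call
}

console.log(trampoline(countDown(1000000))); // no stack overflow at a million steps
```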

But of course, I could have taken the code generated by streamline, removed a bit of extra fluff (like the little frame object that I use to generate long stack traces) and I would have obtained a faster version. But that's not the way people hand-write their callback code.

Ok, it took me a while to figure out, but I found a way to not need the nextTick using a trampoline style trick:

var count = 1000000;

function bench(cb) {
  var total = 0;
  function loop(i) {
    var async;
    while (async !== true) {
      async = undefined;
      if (i === count) {
        cb(null, total);
      } else {
        load(__dirname + '/benchCallbacks.js', function(err, data) {
          if (err) return cb(err);
          total += data.length;
          if (async) {
            loop(i + 1);
          } else {
            async = false;
            i++;
          }
        });
      }
      if (async !== false) {
        async = true;
      }
    }
  }
  loop(0);
}

With this I get speeds close to the streamline transform version.

While the speed differences rarely matter in any real-world application, the semantic difference between the trampoline, nextTick, and fibers is real and does matter.
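One way to see that semantic difference (the names here are illustrative, not from the gist): a callback fired synchronously, like a cache hit without nextTick, runs before the statement that follows the call, while a deferred one runs after it.

```javascript
function maybeSync(hit, cb) {
  if (hit) cb();              // synchronous completion, like a cache hit
  else process.nextTick(cb);  // deferred completion, like real I/O
}

var order = [];
maybeSync(true, function () { order.push('callback'); });
order.push('after call');
console.log(order.join(', ')); // order flips when hit is false
```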

FWIW, here are the speed numbers I'm getting on my machine with my new trampoline style version as benchCallbacks2.js.

tim ~/gist-2362015 $ uname -a
Linux touchsmart 3.2.13-1-ARCH #1 SMP PREEMPT Sat Mar 24 09:10:39 CET 2012 x86_64 Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz GenuineIntel GNU/Linux
tim ~/gist-2362015 $ node benchCallbacks.js 
hit=999999, missed=1, result=805000000
elapsed: 1297
tim ~/gist-2362015 $ _node benchStreamline
hit=999999, missed=1, result=805000000
elapsed: 1001
tim ~/gist-2362015 $ _node --fibers benchStreamline
hit=999999, missed=1, result=805000000
elapsed: 180
tim ~/gist-2362015 $ node benchCallbacks2.js 
hit=999999, missed=1, result=805000000
elapsed: 308

It would have been a bit faster to cut and paste the streamline code into the online demo:

http://sage.github.com/streamlinejs/examples/streamlineMe/streamlineMe.html

My pattern has two loops. There is another one inside the callback. I think that this was necessary because I sometimes end up trampolining from the callback, but this may also be something not very clever about my code generation (which is a bit constrained by other factors).

When was the last time you had to load the same thing a million times in a row in a tight loop?

I can't figure out what program this benchmark is supposed to represent.

What Bruno is doing is important because one should not write callback-style code in every case. Business logic, for example: write it callback-style and you are setting yourself up for a maintenance nightmare. Library code is fine because that's what the hard-core Node guys are accustomed to. :-)

I currently work on a (non-Node) project where the son of the boss knows a little programming. This is very helpful for the project because it makes communication a lot easier: he functions as a relay between the company and software development. But I cannot imagine that he could write or understand business logic in callback style. I would definitely use Streamline in such a case!

The million times in a row is just one of the corner cases: JavaScript does not do tail-call optimization. creationix already mentioned that he used a trampoline, which means the hard-core Node guys already know a lot about these corner cases.

I have a plea to the Node community: Please pay more respect to Bruno! I think what he does is saving Node in the long term from fading away into a niche for really clever programmers.

As a side note, could it be that the fiber version runs faster because async calls are so much more expensive than sync ones?

See http://jsperf.com/asynch-cost

Surprised? Well... now you understand my "node anxiety" issue better -- http://news.ycombinator.com/item?id=2371152
