@obengwilliam
Last active August 29, 2015 14:20

If you haven't read Netflix's Node.js in Flames blog post, you should. It is a great deep dive into debugging a Node performance problem, and it includes useful tips that can help you solve similar problems.

That said...

My feedback from the perspective of a framework developer is quite different. I found the tone and attitude towards express.js to be concerning and somewhat offensive. Here were developers blaming the framework they chose for poor architecture when they never bothered to learn how the framework works in the first place. Everything that followed originated from this basic lack of understanding.

Express uses very simple router logic which is at the core of how express works, so let’s examine that first (my knowledge of express is somewhat dated but I think the principle is still the same). Express keeps a hash of the various HTTP methods (GET, POST, etc.) and for each method, an array of middlewares. Each middleware is just a function with the signature function(req, res, next) and a regular expression used to decide when to execute the middleware based on the incoming request path.

The router logic is pretty simple. When a request comes in, the routing table is looked up using the HTTP method, which returns an array of middlewares. The array is then iterated over by matching the request path against the regular expression assigned to each middleware. When you add generic middleware to express (that is, not path-specific), it is added to the same array but without any path requirements. When a middleware is invoked, it can end the workflow by not calling the next() callback (or by calling it with an error). If next() is called, the next matching middleware is executed. This is how all your app.use() middlewares are called before and after the route handler.
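To make the flow above concrete, here is a minimal sketch of that kind of router. It is not express source code: createRouter, add, and handle are hypothetical names chosen for illustration, and the request and response objects are plain stand-ins.

```javascript
// Hypothetical miniature router; none of these names are Express APIs.
function createRouter() {
  const table = {}; // HTTP method -> ordered array of { pattern, fn }

  function add(method, pattern, fn) {
    // A null pattern marks generic middleware: it matches every path.
    (table[method] = table[method] || []).push({ pattern, fn });
  }

  function handle(req, res, done) {
    const stack = table[req.method] || [];
    let index = 0;

    function next(err) {
      if (err) return done(err); // next(err) ends the workflow
      while (index < stack.length) {
        const layer = stack[index++];
        if (layer.pattern === null || layer.pattern.test(req.path)) {
          // If the middleware never calls next(), the chain stops here.
          return layer.fn(req, res, next);
        }
      }
      done(); // no more matching middleware
    }

    next();
  }

  return { add, handle };
}

const router = createRouter();
const calls = [];

router.add('GET', null, (req, res, next) => { calls.push('logger'); next(); });
router.add('GET', /^\/users\/\d+$/, (req, res, next) => {
  calls.push('user');
  res.body = 'user'; // responds and does not call next()
});
router.add('GET', null, (req, res, next) => { calls.push('never reached'); next(); });

const res = {};
router.handle({ method: 'GET', path: '/users/42' }, res, () => calls.push('done'));
console.log(calls); // [ 'logger', 'user' ]
```

The third middleware never runs because the route handler ends the workflow by not calling next(), which is the behavior described above.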

While this is not a design pattern I would choose, it is simple and elegant. It keeps the entire framework codebase minimal and consistent with its architecture. It is really all about middleware, some with filters (e.g. path handlers).

The post criticizes express for storing the routing table in an array vs a hash or tree. As it turns out, the complexity tradeoff between iterating over an array (which tends to be short given most application routing requirements) and walking a tree makes arrays a much better choice. I’ll also point out that Restify, the alternative framework Netflix listed as their new choice, does the same thing (though instead of recursive calls, it uses a for loop). The only router I know that uses a fancy tree is director, and that design significantly handicapped its feature set and usefulness.

Matching routes to requests is tricky because developers like to use everything for defining their routes. This includes simple strings, regular expressions, wildcards, and path parameters. You can’t store these in a hash because you cannot look up a regular expression match based on a string. You have to iterate somewhere. A hash will only work if you limit routes to static string values (and if that’s the case, why use a router at all).
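A small sketch of that limitation, using made-up routes: exact strings can be keyed in a hash (a Map here), while regular expressions have no lookup key and must be tested one by one.

```javascript
// Static routes can live in a hash keyed by the exact path.
const staticRoutes = new Map([
  ['/about', 'about page'],
  ['/contact', 'contact page'],
]);

// Pattern routes have no usable key: each one has to be tested in turn.
const patternRoutes = [
  { pattern: /^\/users\/(\d+)$/, name: 'user by id' },
  { pattern: /^\/files\/.*$/, name: 'file wildcard' },
];

function lookup(path) {
  if (staticRoutes.has(path)) return staticRoutes.get(path); // exact-match lookup
  for (const route of patternRoutes) {                        // linear scan
    if (route.pattern.test(path)) return route.name;
  }
  return null;
}

console.log(lookup('/about'));         // about page
console.log(lookup('/users/42'));      // user by id
console.log(lookup('/files/a/b.txt')); // file wildcard
console.log(lookup('/missing'));       // null
```

Even with the hash as a fast path, the pattern routes still require iteration somewhere.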

The criticism about allowing routes to repeat in the express array shows that even after doing all this work, the Netflix team still doesn’t understand how express works. I am only pointing this out because they made a public statement putting express down without acknowledging the reasons. The middleware architecture requires repeating the same path multiple times in the array because that’s how the matching works. It also allows powerful chaining of small actions on a route without having to collect them all in a single function (e.g. pre and post route processors).
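As a rough illustration of why repeated paths are useful, the sketch below registers the same path three times: a pre-processor, the route handler, and a post-processor. The stack layout and the run function are hypothetical and only mimic the middleware idea, not express internals.

```javascript
// Hypothetical layer stack: the same path appears three times on purpose.
const stack = [
  // Pre-processor: attaches data for later layers.
  { path: '/report', fn: (req, res, next) => { req.user = 'alice'; next(); } },
  // Route handler: builds the response.
  { path: '/report', fn: (req, res, next) => { res.body = `report for ${req.user}`; next(); } },
  // Post-processor: records something about the response.
  { path: '/report', fn: (req, res, next) => { res.log = `sent ${res.body.length} chars`; next(); } },
];

function run(req, res) {
  let i = 0;
  const next = () => {
    while (i < stack.length) {
      const layer = stack[i++];
      if (layer.path === req.path) return layer.fn(req, res, next);
    }
  };
  next();
  return res;
}

const result = run({ path: '/report' }, {});
console.log(result.body); // report for alice
console.log(result.log);  // sent 16 chars
```

Removing the duplicates would force all three steps into a single function, which is exactly what the middleware design is meant to avoid.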

The express design, and for that matter every other framework I am familiar with except my own, allows adding conflicting routes. This is not a bug but an outcome of their extremely flexible route matching support, which allows regular expressions and wildcards. You cannot compare two regular expressions to decide if and how much they overlap, and in which order they should be compared. This is a limitation coming directly from the feature set. It is a simple tradeoff.

In hapi, we limit the types of routes you can add so that we can enforce strict limits on conflicts. We also worked hard to ensure that the order in which routes are added doesn’t matter. Routes are matched based on a deterministic algorithm that sorts the table on every addition. This was a very important feature for us working in a large team where people might not be aware of the routes added by others, or where the order in which plugins are loaded can cause unexpected production issues. These are all decisions we made based on months of hands-on experience building applications on top of express and director.
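The idea of an order-independent table can be sketched as follows. This is not hapi's actual algorithm: the {param} syntax handling, the fingerprint used to detect conflicts, and the rule that literal segments sort before parameters are all assumptions made for the example.

```javascript
// Hypothetical route table: restricted route syntax, conflict checks,
// and a re-sort on every addition so insertion order does not matter.
function createTable() {
  const routes = [];

  const segments = (path) => path.split('/').filter(Boolean);
  const isParam = (seg) => seg.startsWith('{') && seg.endsWith('}');
  // Two routes conflict when they differ only in parameter names.
  const fingerprint = (path) =>
    '/' + segments(path).map((s) => (isParam(s) ? '{}' : s)).join('/');

  function compare(a, b) {
    const sa = segments(a.path);
    const sb = segments(b.path);
    for (let i = 0; i < Math.min(sa.length, sb.length); i++) {
      const pa = isParam(sa[i]);
      const pb = isParam(sb[i]);
      if (pa !== pb) return pa ? 1 : -1; // literal segments before parameters
      if (!pa && sa[i] !== sb[i]) return sa[i] < sb[i] ? -1 : 1;
    }
    return sa.length - sb.length; // shorter paths first when one is a prefix
  }

  function add(path, handler) {
    const key = fingerprint(path);
    if (routes.some((r) => r.key === key)) {
      throw new Error(`Route conflict: ${path}`);
    }
    routes.push({ path, key, handler });
    routes.sort(compare); // deterministic order, recomputed on every addition
  }

  return { add, paths: () => routes.map((r) => r.path) };
}

const noop = () => {};
const first = createTable();
['/users/{id}', '/users/me', '/about'].forEach((p) => first.add(p, noop));

const second = createTable();
['/about', '/users/me', '/users/{id}'].forEach((p) => second.add(p, noop));

console.log(first.paths());  // [ '/about', '/users/me', '/users/{id}' ]
console.log(second.paths()); // same order, despite different insertion order
```

Adding '/users/{userId}' to either table throws, because it differs from '/users/{id}' only in the parameter name. Restricting what a route can look like is what makes both the conflict check and the stable ordering possible.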

The Netflix post does take responsibility for failing to understand how the framework they chose works. But that admission does not excuse criticizing the framework as inept. The express architecture has worked well for many people. It has known limitations, which is why there are so many other frameworks to choose from. One isn’t superior to another without the context of your use case and requirements. There is no “best”.
