Reactor-based framework versus Node.js streaming

I've been hacking away recently at a JVM framework for doing asynchronous, non-blocking applications using a variation of the venerable Reactor pattern. The core of the framework is currently in Java. I started with Scala then went with Java and am now considering Scala again for the core. What can I say: I'm a grass-is-greener waffler! :) But it understands how to invoke Groovy Closures, Scala anonymous functions, and Clojure functions, so you can use the framework directly without needing wrappers.

I've been continually micro-benchmarking this framework because I feel that the JVM is a better foundation on which to build highly-concurrent, highly-scalable, C100K applications than V8 or Ruby. The problem, so far, has been that no good tools exist for JVM developers to leverage the excellent performance and manageability of the JVM. This yet-to-be-publicly-released framework is an effort to give Java, Groovy, Scala, [X JVM language] developers access to an easy-to-use programming model that removes the need to use synchronization and worry about concurrency issues, while making it easy to respond to events in multiple threads if it's more efficient for your application to do so (unlike the strict single-threadedness of Node.js, this framework gives you a choice of single-threaded efficiency or multi-threaded parallelism).

The benchmark below is of this reactor-based framework that uses the old-school Java NIO FileChannel.transferTo (sendfile) method to stream data from the filesystem to the client. The Node.js application uses streaming to pipe a file directly to the client.
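For readers who haven't used it, `FileChannel.transferTo` asks the OS to copy bytes from a file directly into another channel, which on Linux can be served by `sendfile`. The following is a minimal Java sketch of that technique, not the framework's code; `TransferToSketch` and `sendFile` are hypothetical names, and a `ByteArrayOutputStream` stands in for the client socket so the sketch runs without a network.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToSketch {

    /** Streams the whole file at {@code path} into {@code target} using FileChannel.transferTo. */
    public static long sendFile(Path path, WritableByteChannel target) throws IOException {
        try (FileChannel file = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = file.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long sent = file.transferTo(position, size - position, target);
                if (sent == 0) {
                    // Nothing moved: a full non-blocking socket buffer or a truncated file.
                    // A real server would wait for OP_WRITE and resume from `position`.
                    break;
                }
                position += sent;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sendfile", ".bin");
        Files.write(tmp, new byte[1024 * 1024]); // 1 MB, the size used in the benchmark
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = sendFile(tmp, Channels.newChannel(out));
        System.out.println(n + " bytes transferred");
        Files.delete(tmp);
    }
}
```

With a real socket, the zero-progress case is where an evented server would park the transfer and resume it when the selector reports the socket writable.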

The Groovy code looks like this:

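```groovy
import java.nio.file.Files
import java.nio.file.Paths
// HttpServer, HttpMessage and StandardHttpResponses come from the unreleased framework.

// Assumed values, mirroring the Node.js version: document root and MIME type.
def resources = "static"
def contentType = "application/octet-stream"

def server = new HttpServer(3000)
  .on("/lib/{resource}**", { HttpMessage request ->
    def file = request.pathParam("resource")
    def path = Paths.get(resources, file)
    if (Files.exists(path)) {
      request.respond(StandardHttpResponses.ok(contentType, path))
    } else {
      request.respond(StandardHttpResponses.notFound(request.uri().path))
    }
  })
  .start()
```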

A similar application could be built with pure Java using annotations. The POJO delegate would look something like:

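```java
// Assumes `resources` (document root) and `contentType` are fields of this POJO,
// and that request.uri() returns a java.net.URI.
@On("/lib/{resource}**") @Get
public void serveStatic(HttpMessage request) { // "static" is a reserved word in Java
  String file = request.pathParam("resource");
  Path path = Paths.get(resources, file);
  if (Files.exists(path)) {
    request.respond(StandardHttpResponses.ok(contentType, path));
  } else {
    request.respond(StandardHttpResponses.notFound(request.uri().getPath()));
  }
}
```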

The Node.js code looks like this:

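```javascript
var http = require("http"),
    fs = require("fs"),
    path = require("path");

http.createServer(
  function (req, res) {
    // Note: req.url is not sanitized, so "../" segments can escape "static".
    var pth = path.join("static", req.url);
    var rs = fs.createReadStream(pth);
    rs.on("error", function () {
      res.writeHead(404);
      res.end();
    });
    // fs.ReadStream emits "open" (with the file descriptor) once the file is opened.
    rs.once("open", function () {
      res.writeHead(200, {'Content-Type': 'application/octet-stream'});
    });
    rs.pipe(res);
  }
).listen(3001, "127.0.0.1");
```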

When I ran Apache Bench against these servers (100 concurrent users downloading a 1MB file), the JVM framework handily outperformed the Node.js version. By handily, I mean it was more than twice as fast (a mean total time of 121 ms per request versus 254 ms) and supported twice the throughput:

Node.js:

```
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    3   1.0      3       4
Processing:   246  252   2.3    252     254
Waiting:       21   32   6.3     34      41
Total:        248  254   2.4    255     257

Percentage of the requests served within a certain time (ms)
  50%    255
  66%    256
  75%    256
  80%    256
  90%    257
  95%    257
  98%    257
  99%    257
 100%    257 (longest request)
```

JVM framework:

```
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    3   0.9      3       4
Processing:    64  118  17.1    123     135
Waiting:        4   16   9.2     13      43
Total:         67  121  17.2    126     138

Percentage of the requests served within a certain time (ms)
  50%    126
  66%    132
  75%    135
  80%    135
  90%    136
  95%    138
  98%    138
  99%    138
 100%    138 (longest request)
```

From the gallery: "No fair! You're using multiple threads!"

Since setting up 4 Node.js processes and configuring the load balancing was more than I wanted to take on just for a simple microbenchmark, I dropped the concurrent users to 1 and re-ran the tests:

Node.js:

```
Requests per second:    270.12 [#/sec] (mean)
Time per request:       3.702 [ms] (mean)
Time per request:       3.702 [ms] (mean, across all concurrent requests)
Transfer rate:          276612.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       4
Processing:     3    4   2.3      3      32
Waiting:        0    1   1.0      0      25
Total:          3    4   2.3      3      32
WARNING: The median and mean for the waiting time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%      3
  66%      3
  75%      4
  80%      4
  90%      4
  95%      4
  98%      6
  99%     15
 100%     32 (longest request)
```

JVM framework:

```
Requests per second:    228.89 [#/sec] (mean)
Time per request:       4.369 [ms] (mean)
Time per request:       4.369 [ms] (mean, across all concurrent requests)
Transfer rate:          234428.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     2    4   0.5      4       8
Waiting:        0    0   0.1      0       3
Total:          2    4   0.5      4       8

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      5
  80%      5
  90%      5
  95%      5
  98%      5
  99%      6
 100%      8 (longest request)
```

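With only a single concurrent user, the JVM delivered more consistent response times (a standard deviation of 0.5 ms versus 2.3 ms and a longest request of 8 ms versus 32 ms, with both mean total times rounding to 4 ms), while Node.js had a slightly lower mean time per request (3.7 ms versus 4.4 ms) and about 18% greater throughput (270 versus 229 requests per second).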

JVM haters will never be convinced

The point of this is not to bash Node.js. It's a great platform for some applications that can benefit from the things it does well. It's also fine for dogmatic JVM-haters to dismiss any of these tests as flawed or irrelevant. They'll never be open-minded enough to take an honest look at what the JVM can do for this new class of applications that need C100K capabilities but would benefit from the decades of engineering that's gone into the JVM and the plethora of management tools and operational experience with the platform.

Combining the JVM with a better framework for writing non-blocking, evented applications is the goal. These tests just confirm for me that the goal is achievable and that there is benefit to be had from such a framework. They also tell me that the weaknesses of the JVM for C100K applications have more to do with the programming model than with the JVM itself. If I were to tune the JVM running these tests, rather than use the default settings, I could likely get even better numbers. The JVM has a boatload of knobs to turn that I simply didn't take the time to tweak.

It would be very helpful if you also included the Node.js version.

This is interesting, indeed. What is the fully-qualified name of the HttpServer class you use in the Groovy implementation? I like the DSL used there to specify request handlers!

From the gallery: "No fair! You're using multiple threads!"

Since setting up 4 Node.js processes and configuring the load balancing was more than I wanted to take on just for a simple microbenchmark, I dropped the concurrent users to 1 and re-ran the tests:

Depending on the version of Node.js you're using, you could just use Node's cluster to help you out with this.

Interesting. Deft Server does something similar and has some favourable benchmarks as well.

BTW, your annotations look very similar to those for JAX-RS / Jersey. You could swap to support those.

As regards concurrency, I think the JVM struggles to handle it easily. You need synchronization, copy-on-write support, or fast serialization/deserialization. And you also need a software designer who understands each one perfectly and when to use it, and software that will never change from needing one type to another. Good luck with that!

Does your java stuff do some kind of in-RAM caching?

Great post... you made one error though.

Your JVM code blocks in the event loop. It will have to stat() the file which will block.

It may be that the underlying OS has cached the inode for this file in your test, but on a production system the VFS might evict the inode and your performance will dive.

Also, ALL the first requests are blocked until it can be cached.

Also, I think BOTH tests are unfair.

Couldn't you just drop the number of threads Java is using to 1, then re-run it with concurrent users? That should mean Java is running on one core, just like Node.js.

Why should he have to limit one language to make up for the failings of another? While he certainly shouldn't purposefully optimize one, it is only fair to let it do whatever it wants on its own.

@whitewater ... You should benchmark against how it would normally be written, in the optimal form for that language. Otherwise, why not just handicap the results until you get what you want from the benchmark?

Blocking in the I/O thread is not a good idea, and in Java the right way to do this would be to use an async style by using an executor.
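A minimal Java sketch of the executor-based approach suggested above, assuming a hypothetical dedicated pool for blocking filesystem calls so that the `stat()` behind `Files.exists` never runs on the I/O thread. `OffloadStatSketch`, `BLOCKING_IO`, and `existsAsync` are illustrative names, not part of the framework, and the pool size is an arbitrary choice for the sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadStatSketch {

    // Hypothetical pool reserved for blocking filesystem calls. Daemon threads
    // keep the pool from holding the JVM open after main returns.
    static final ExecutorService BLOCKING_IO = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            runnable -> {
                Thread t = new Thread(runnable, "blocking-io");
                t.setDaemon(true);
                return t;
            });

    /** Runs the blocking existence check on BLOCKING_IO and completes asynchronously. */
    public static CompletableFuture<Boolean> existsAsync(Path path) {
        return CompletableFuture.supplyAsync(() -> Files.exists(path), BLOCKING_IO);
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : "static/missing.bin");
        // thenAccept runs on the pool thread here; a framework would instead hand
        // the result back to the thread that owns this request before responding.
        existsAsync(path)
                .thenAccept(found -> System.out.println(found ? "200 OK" : "404 Not Found"))
                .join();
    }
}
```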

@burtonator I wasn't trying to say that it shouldn't be optimized, I guess I miswrote what I was trying to say... More of a "don't specially optimize for serving one megabyte files to one hundred clients". I completely agree with your argument, though.

Thanks for all the discussion and suggestions! :) I'm not so much interested in a horse race between Node.js and the JVM. My entire effort with this framework is to give Java and JVM developers a way to write non-blocking, potentially asynchronous applications, without worrying about concurrency issues and synchronization. Those concerns don't apply to the vast majority of this framework because the code is, as far as possible, executed by the same thread that started the whole task.

A secondary (or even tertiary) concern is how this JVM framework compares to other non-blocking frameworks. Node.js is probably the most popular and most-bandied about at the moment, so it makes sense to see how the JVM compares to V8. The point of these tests is that it compares quite favorably and that the handicap is not the technical limitations of the foundation, but the application frameworks and programming models that are available to take advantage of all this performance and engineering.

This framework is (hopefully) soon to be publicly admitted to. We'll decide on a name (likely some variation of "Something Reactor" to reflect its origins as an implementation of the Reactor pattern for the JVM). Multiple annotation models will be supported, including things like JAX-RS, Spring MVC, these custom annotations, etc. The strategy used to map methods to events is pluggable, so different implementations can do things differently. Currently, the scheme I demonstrated is the only one I've had time to implement. :) It also supports Groovy, Scala, and Clojure natively (i.e. no wrappers required: simply assign a Closure or whatever as an event handler and the framework knows how to invoke it).
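To illustrate what a pluggable method-to-event mapping strategy might look like, here is a small reflection-based Java sketch. The `@On` annotation, the `HandlerMapper` interface, and `OnAnnotationMapper` are hypothetical stand-ins defined inside the sketch, not the framework's types; under this design, a JAX-RS or Spring MVC strategy would simply be another `HandlerMapper` that reads different annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class AnnotationMappingSketch {

    /** Hypothetical stand-in for the framework's @On annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface On {
        String value();
    }

    /** The pluggable part: each strategy decides how a POJO's methods map to event keys. */
    interface HandlerMapper {
        Map<String, List<Consumer<Object>>> map(Object pojo);
    }

    /** One strategy: every single-argument method annotated with @On handles that key. */
    static class OnAnnotationMapper implements HandlerMapper {
        @Override
        public Map<String, List<Consumer<Object>>> map(Object pojo) {
            Map<String, List<Consumer<Object>>> handlers = new HashMap<>();
            for (Method method : pojo.getClass().getDeclaredMethods()) {
                On on = method.getAnnotation(On.class);
                if (on == null || method.getParameterCount() != 1) {
                    continue; // not a handler under this strategy
                }
                method.setAccessible(true);
                handlers.computeIfAbsent(on.value(), key -> new ArrayList<>())
                        .add(payload -> invoke(method, pojo, payload));
            }
            return handlers;
        }

        private static void invoke(Method method, Object target, Object payload) {
            try {
                method.invoke(target, payload);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("handler failed: " + method.getName(), e);
            }
        }
    }

    /** Example POJO delegate with one annotated handler method. */
    static class Greeter {
        final List<String> seen = new ArrayList<>();

        @On("greet")
        void greet(String name) {
            seen.add("hello " + name);
        }
    }

    public static void main(String[] args) {
        Greeter greeter = new Greeter();
        Map<String, List<Consumer<Object>>> handlers = new OnAnnotationMapper().map(greeter);
        handlers.getOrDefault("greet", Collections.emptyList()).forEach(h -> h.accept("world"));
        System.out.println(greeter.seen); // [hello world]
    }
}
```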

@KitD You actually don't need any of that stuff because almost all of your asynchronous work can actually be scheduled to be run on the originating thread. You just need a competent framework to do that sort of thing. I'm using HawtDispatch at the moment, but we're talking about maybe using a Disruptor RingBuffer or some other similar abstraction.

I'm at the Basho offices in San Francisco at the moment and we were just having a discussion about this very topic not 15 minutes ago. :)

If you are looking to "give Java and JVM developers a way to write non-blocking, potentially asynchronous applications, without worrying about concurrency issues and synchronization", I'm surprised that you have not mentioned Actor libraries, especially since your Scala experience means that you would have come across Akka. Akka would look miserably complicated for this simple example, but for more complex examples it does a great job of both hiding the complexity of concurrency and giving the best of both the evented and threaded worlds.

The library I'm writing actually is, in a sense, an Actor library. But a purely Actor model isn't, IMO, flexible enough. You still have two different "kinds" of programming models when writing an application that uses Actors. By that, I mean there is the very imperative use of an actor (actorRef ! msg), and when you're doing asynchronous messaging, there is a different model (subscriber.publish(...)). My goal is to combine both and make it easy for a programmer to respond to external events (coming from a message broker, maybe) and internal events (coming from another thread) using a common API, since the two tasks do not, by necessity, have to be done differently (reactor.on("event", handler) and reactor.emit("event", payload) cover both situations and are self-documenting).
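A minimal Java sketch of the single `on`/`emit` API described above. `MiniReactor` is a hypothetical name, handlers receive a plain `Object` payload for simplicity, and the `Executor` passed to the constructor decides which thread runs each handler, so the same API can serve events from a broker or from another thread.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical minimal reactor: one API for both external and internal events. */
public class MiniReactor {
    private final Map<String, List<Consumer<Object>>> handlers = new ConcurrentHashMap<>();
    private final Executor dispatcher;

    public MiniReactor(Executor dispatcher) {
        this.dispatcher = dispatcher;
    }

    /** Registers a handler for an event key; returns this for chaining. */
    public MiniReactor on(String key, Consumer<Object> handler) {
        handlers.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(handler);
        return this;
    }

    /** Delivers the payload to every handler registered for the key, via the dispatcher. */
    public void emit(String key, Object payload) {
        for (Consumer<Object> handler : handlers.getOrDefault(key, Collections.emptyList())) {
            dispatcher.execute(() -> handler.accept(payload));
        }
    }

    public static void main(String[] args) {
        // Runnable::run dispatches on the emitting thread, so the output is deterministic.
        MiniReactor reactor = new MiniReactor(Runnable::run);
        reactor.on("order.created", payload -> System.out.println("got " + payload));
        reactor.emit("order.created", 42);  // prints "got 42"
        reactor.emit("unknown", "ignored"); // no handlers registered, nothing happens
    }
}
```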

In a Scala version of this code, I actually defined a bang method (!). It didn't offer anything more (syntactically) than reactor.emit("event", payload), though; the emit version is just more characters to type, of course. :) But Java can't define such a method, so it doesn't matter anyway.

@burtonator This code is not based on an event loop. It is multi-threaded but tasks are ordered so there isn't any concurrent access (unless you intentionally do that). I've found that it's more performant to do certain blocking operations, particularly when it comes to file IO. It's not always easy to determine where the line is between "I need to do this in another thread" and "it's okay to do that operation in this thread". I certainly wouldn't try and issue an HTTP request like this. But using the traditional, blocking FileChannel from another thread (which is what's happening under the covers), is orders of magnitude faster than using the JDK 7 AsynchronousFileChannel. The same holds true for sockets. The traditional synchronous sockets are more performant than the asynchronous ones.
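One common way to get ordered execution on a multi-threaded pool is a serial executor: tasks submitted to it run one at a time, in submission order, though each may run on any pool thread. The sketch below follows the `SerialExecutor` example from the `java.util.concurrent.Executor` Javadoc; it illustrates the idea and is not the framework's actual scheduler.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Runs tasks one at a time, in submission order, on top of a shared thread pool. */
public class SerialExecutor implements Executor {
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final Executor pool;
    private Runnable active;

    public SerialExecutor(Executor pool) {
        this.pool = pool;
    }

    @Override
    public synchronized void execute(Runnable task) {
        // Wrap the task so that finishing it schedules the next one.
        tasks.add(() -> {
            try {
                task.run();
            } finally {
                scheduleNext();
            }
        });
        if (active == null) {
            scheduleNext();
        }
    }

    private synchronized void scheduleNext() {
        if ((active = tasks.poll()) != null) {
            pool.execute(active);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        SerialExecutor ordered = new SerialExecutor(pool);
        int[] counter = {0}; // plain, unsynchronized state confined to `ordered`
        for (int i = 0; i < 10_000; i++) {
            ordered.execute(() -> counter[0]++);
        }
        CountDownLatch done = new CountDownLatch(1);
        ordered.execute(done::countDown); // runs after every increment
        done.await();
        System.out.println(counter[0]); // 10000
        pool.shutdown();
    }
}
```

Because each task is handed to the pool only after the previous one finishes, writes made by one task are visible to the next without extra locking, which is why the counter can stay a plain `int`.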

The gist of this (no pun intended) is that arbitrarily enforcing a "never, ever block on anything" policy is certainly a more "pure" approach, but it is not as performant as the mixed, blocking (but in another thread) and non-blocking (which is also in another thread, you just don't see it directly :) pragmatic approach I've taken with this framework.

At some point, it's impossible to be purely non-blocking: there are only so many threads (one per processor in my case), a thread actually has to perform work, and context switching is relatively expensive for some operations. My benchmarks have shown me that proper use of blocking I/O can give you significant performance gains without introducing concurrency artifacts.

Looks great -- would love to take it for a spin if you release the source.

Jon,

How does your "new framework" differ from http://purplefox.github.com/vert.x/ ?

To be honest, the farther along I get (which isn't saying much since I put up that gist), the less I'm interested in the polyglot aspect. Going the annotations and flexible-handler route with Java makes it super easy to write evented applications. It turns out injection and the Spring container are a perfect fit for wiring events, and there are already ways to do things in Spring that make working with Java a lot easier than the traditional anonymous-class style of callback programming that's been the norm. This flexible POJO approach makes writing in Java almost as easy as writing in Groovy. Since I do all my testing in Groovy, I'm focused primarily on how well it works in Java and Groovy.

There are certainly some philosophical differences between vert.x and what I'm working on. I'm trying to keep it a framework and not a runtime so it embeds into any Spring application easily. It's actually built on Spring core so it leverages things like the ConversionService for converting handler parameters, etc... I'm "borrowing" the methodology Rossen is using in the latest Spring MVC code to invoke arbitrary POJO handlers using a pluggable strategy (so one could support a JAX-RS handler, for instance, by configuring a component that wires the POJO which responds to JAX-RS annotations). It does use a Reactor style of evented programming which makes it easy to get ordered execution of tasks (it also leverages the TaskExecutor abstractions for invoking handlers) but it also doesn't prevent you from using shared state. It has an assumption that parts of the code will need ordered execution (a term I use that means close to the same thing as being single-threaded but has different implications) but that parts of the code need to share resources in order to minimize memory usage and maximize reusability.

It is primarily an event-driven application framework. It's not really a TCP server that has other uses, if you see what I mean. The TCP portion is raw NIO code that I've developed by looking into the various frameworks (Netty, of course, but MINA and HawtDispatch as well) and trying to distill the best of all of them. It gets extremely high performance that way, as evidenced by the tests, but it makes other things more difficult (like supporting SSL and websockets, which Netty can do out of the box). As with everything, there are trade-offs. For the HTTP portion, I'm assuming the standard path of application development will first go through a buffer-and-convert handler which will leverage the Spring 3 HttpMessageConverters. In some cases, I'll want to stream the content (like if I send big blobs/attachments, etc...) but in general those are the minority of cases. By and large, I'll be sending and receiving small bits of content that will fit into memory and can be passed to my event handler as a mapped POJO. Many of these configuration decisions can be inferred from the handler that's passed to be invoked for that event.
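For context on what raw NIO code for a Reactor involves, here is a minimal single-threaded Java sketch of the pattern: one `Selector` waits for readiness events and dispatches each ready key to the handler attached at registration time. It is not the framework's TCP layer; the `Handler` interface and class name are hypothetical, and a `Pipe` stands in for a TCP socket so the sketch runs without a network.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorReactorSketch {

    /** Hypothetical callback type invoked when a registered channel is ready. */
    public interface Handler {
        void handle(SelectionKey key) throws IOException;
    }

    private final Selector selector;

    public SelectorReactorSketch() throws IOException {
        this.selector = Selector.open();
    }

    /** Registers a channel in non-blocking mode, attaching the handler to its key. */
    public void register(SelectableChannel channel, int ops, Handler handler) throws IOException {
        channel.configureBlocking(false);
        channel.register(selector, ops, handler);
    }

    /** One turn of the event loop: wait up to timeoutMillis, then dispatch every ready key. */
    public int runOnce(long timeoutMillis) throws IOException {
        int ready = selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selector never clears the selected-key set itself
            if (key.isValid()) {
                ((Handler) key.attachment()).handle(key);
            }
        }
        return ready;
    }

    public static void main(String[] args) throws IOException {
        SelectorReactorSketch reactor = new SelectorReactorSketch();
        Pipe pipe = Pipe.open(); // stands in for a client socket
        StringBuilder received = new StringBuilder();
        reactor.register(pipe.source(), SelectionKey.OP_READ, key -> {
            ByteBuffer buf = ByteBuffer.allocate(64);
            int n = ((Pipe.SourceChannel) key.channel()).read(buf);
            if (n < 0) {
                key.cancel(); // peer closed: stop watching this channel
                return;
            }
            buf.flip();
            received.append(StandardCharsets.UTF_8.decode(buf));
        });
        pipe.sink().write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
        while (received.length() < 5) {
            reactor.runOnce(1000);
        }
        System.out.println(received); // hello
    }
}
```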

Another thing I'm keenly interested in and wanting to build in at the core is a Dynamo implementation. I think having riak_core for the JVM would be really useful. I've played with the Ring and Node abstractions and what an AMQP-backed SPI would look like for doing inter-node communication. Redis and AMQP seem to be logical first components to use to back a Ring abstraction. Truth be told, the Redis-backed one is far easier to write. :) Either way, the engineering is already done in the shape of riak_core. It is a killer implementation of Dynamo and using it as the inspiration for a JVM version seemed to be a no-brainer.
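A Dynamo-style Ring abstraction usually rests on consistent hashing. The Java sketch below is a generic consistent-hashing ring with virtual nodes that uses SHA-1 for ring positions; it is a simplification for illustration, not riak_core's partition-claim algorithm, and `HashRingSketch`, `addNode`, `removeNode`, and `nodeFor` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical Ring abstraction: consistent hashing with virtual nodes. */
public class HashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public HashRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Places `virtualNodes` positions for this node around the ring. */
    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** The owner is the first ring position at or after the key's hash, wrapping around. */
    public String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Uses the first 8 bytes of a SHA-1 digest as a ring position. */
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch(64);
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        String before = ring.nodeFor("user:42");
        ring.removeNode(before.equals("node-a") ? "node-b" : "node-a");
        // Removing a different node never changes this key's owner.
        System.out.println(before.equals(ring.nodeFor("user:42"))); // true
    }
}
```

The property the last lines check is the reason for consistent hashing: when a node leaves, only the keys it owned move, and they move to the next position clockwise.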

Thanks!

Jon Brisbin
SpringSource (a division of VMware)

Twitter: @j_brisbin


Reply to this email directly or view it on GitHub:
https://gist.github.com/1444077

I have based a workable GWT middle tier (https://github.com/jnorthrup/RelaxFactory ) on a 5-class Java NIO API supporting this abstraction of a closure: https://github.com/jnorthrup/1xio/blob/master/src/main/java/one/xio/AsioVisitor.java

I have a fully asynchronous Couch backend, have reviewed vert.x and like its interfaces, and have considered wrapping them as well.
