Last weekend I spent a few hours implementing the CouchDB document storage model on top of levelup in node.js.
Today I had a few spare hours in my Sunday and decided to implement the HTTP interface and compare the performance with Apach CouchDB. The benchmarks can be found here https://github.com/mikeal/couchup/tree/master/tests/benchmark
node compare-new-writes.js Apache CouchDB 67140 writes couchup 80826 writes
This test runs for one minute and writes many tiny new documents. The test uses 10 concurrent http clients (maxSockets of 10 and a loop size of 11 to make sure pool is always saturated).
On both CouchDB and couchup these writes will incur a read in order to check if the document exists and what rev it's at. While couchup is a little faster I would expect this gap to be closed once levelup can insure ordering (more on this below).
node compare-new-writes.js Apache CouchDB 18108 writes couchup 15006 writes
This was the edit performance I had Sunday afternoon. couchup has an in-memory mutex for editing a document (a requirement since we don't have an append-only file we're writing to) and I had already included an LRU cache for the revs. In this test every rev check should be pulled out of the LRU so this was surprisingly bad performance for couchup.
I identified a few bottlenecks. The biggest one was that the array of revisions being stored as meta information was growing larger with every write. When I thought of ways to remove the need for it I realized that I could elimate all the cached metadata writes and go from 4 writes and a delete on every edit to just 2 writes. This is easily accomplished by including the sequence in the complex key and using peek for all the reads (metadata and documents).
Most importantly this new approach avoids overwrites and deletes and only writes new keys.
This slows down only two operations that I can think of:
- GET /db/doc?rev=blah because the complex key for documents sorts the sequence before the rev and we'll have to do a range over all the sequences in the database in order to find the specific revision. This actually doesn't happen much, very rarely do people request documents at a specific rev, this is usually used more internally but we've obviated the need for it in couchup.
- _changes feed will include sequences for revs that have been removed until compaction and if we don't want to return them in the feed we'll have to do a reverse filter over the set. If the _changes case ends up being an issue we could just eat a delete on write since we have the previous sequence. I'm not convinced it's necessary yet but it's good to know it won't be hard.
But, the impact these new changes had on write performance was incredibly positive, the biggest impact we see is in the edit performance. CouchDB is considerably slower editing than new writes but couchup is actually faster because of the LRU.
Apache CouchDB 17631 writes couchup 106331 writes
couchup is by no means a complete implementation of CouchDB. I believe I have the entire document storage pipeline, so I think a comparison here is fair but there are certainly bugs and at the moment you don't even get a 404 when a document is not found, it's a 500 :) But, the point of doing a comparison this early was to tackle some of the skepticism about level's ability to perform the higher level storage functions required by CouchDB's data model compared to Apache CouchDB's custom append-only btree implementation which I have addressed with greater success than I anticipated.
Another thing to consider is the fact that operations sent to leveldb aren't insured to return in the order they are sent. Until there is a way to insure this there just isn't a way to fully guarantee that at a particularly massive scale of writes that blow out the LRU cache you can't write to a document with a slightly out of date rev. I belive @rvagg and @dominictarr are working on a way to do this and once they do I expect a small performance hit to the numbers I've demostrated above but we should be as safe and consistent at that point as Apache CouchDB.
This project is a week old, it certainly isn't as stable or safe as CouchDB and there are likely many other issues beyond the ones I've pointed out that will need to be fixed before I or anyone else should consider switching from Apache CouchDB but this early work does prove that the project is more promising than some skeptics have made it out to be and the level of investment happening in leveldb combined with the momentum of the new NodeBase movement contrasted with the declining momentum of the Apache CouchDB project are reason enough for myself and hopefully others to invest in stabilizing couchup.
It should also be noted that during both tests my laptop got much hotter running the couchup test than the Apache CouchDB test.
Rather than just pool all the writes until
nextTick() I buffer all outgoing writes while waiting on a single batch. When it returns all the buffered writes are written. This showed only minor changes to peformance but my laptop stopped heating up so much when running my tests.
After some discussions with level people, @rvagg in particular, I decided the best thing to do would be to cycle through the reads the same way I had pooled and buffered the writes. If we run all reads, then all writes, then all reads, and on and on, then we can make a lot more consistency guarantees at a higher level. Batching all the writes is actually great for performance but for reads we just run them all at onces and buffer the responses so that we can return the response callbacks in order (otherwise we woudn't be insuring that the first attempted writer with the right rev would be the one who gets to write to it).
I implemented this all as an abstract mutex for level called level-mutex and then I rewrote most of
couchup to use it. Now, to my knowledge, I don't have the previous issue I sighted where under load we might have consistency problems. In theory, we have the same consistency guarantees as CouchDB for the revision update semantics and 10x the performance when updating hot keys because of the LRU. Here's the latest numbers
$ node compare-edit-writes.js Apache CouchDB 18402 writes couchup 112481 writes
% node compare-new-writes.js Apache CouchDB 68053 writes couchup 77331 writes
The next thing I'm doing is implementing a replicator. I've decided to diverge from CouchDB on some of the rev meta storage semantics, basically I'll be stemming during replication and compaction. I wrote up the reasoning in the
Also, after having a quick discussion with @dominictarr I'll be pulling in a bloom filter for new writes. Hopefully this will result in the same kind of performance gains the LRU showed.
Implemented the bloom filter which improved the new write performance noticably but not as dramatically as the rev lru did for updates.
% node tests/benchmark/compare-new-writes.js Apache CouchDB 61553 writes couchup 98509 writes