@tav
Created June 4, 2010 12:24

Some of you might wonder why I didn't just use one of the existing NoSQL datastores, so I've elaborated below on why they don't suit my needs. This is not to say that they won't be highly suited in other contexts -- especially since they're all quite impressive in their own ways.

Cassandra is promising, but:

  • It has no native transactions or secondary indexes.
  • It currently requires a cluster-wide restart/migration for schema updates.
  • It is geared towards eventual consistency, and configuring it to be strongly consistent (as opposed to weakly consistent) is a sure way to see a serious performance drop.
  • I find its API non-intuitive -- columns, super columns, etc.
  • It's Java -- after having lost the bulk of a quarter million dollars on a Java-based project 10 years ago, I have this unfortunate aversion to everything Java.
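
The consistency point above can be made concrete with the usual quorum rule for Dynamo-style replicated stores: with N replicas, a read that waits for R replicas is guaranteed to see the latest write acknowledged by W replicas only when R + W > N. The sketch below is a toy model of that rule, not Cassandra's API; the function names `quorum` and `read_sees_latest_write` are made up for illustration.

```python
def quorum(n):
    """Smallest majority of n replicas (hypothetical helper)."""
    return n // 2 + 1


def read_sees_latest_write(n, r, w):
    """True when every set of r replicas overlaps every set of w replicas.

    This is the R + W > N condition: at least one replica contacted by the
    read must also have acknowledged the most recent write.
    """
    return r + w > n


# With 3 replicas, reading and writing a single replica is fast but the
# read may miss the latest write.
print(read_sees_latest_write(3, 1, 1))
# Majority reads and writes overlap, at the cost of waiting for two
# replicas on every operation.
print(read_sees_latest_write(3, quorum(3), quorum(3)))
```

Raising R and W is what buys the stronger guarantee, and it is also where the performance drop comes from: every operation has to wait for more replicas to respond.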

CouchDB is very cool and has a super sexy API, but:

  • It's very much a document-oriented datastore and is not well suited to data warehousing purposes.
  • You have to create view indexes manually, and creating temporary views is quite a slow process with large datasets.

HBase:

  • Will eat your system resources.
  • Offers very little beyond a BigTable clone, and for that benefit you get to deal with half a dozen moving parts.

Hypertable:

  • Lost all my data the last time I played with it.
  • Offers little beyond a BigTable clone -- HQL comes nowhere near what the App Engine datastore offers.

MemcacheDB:

  • Offers very little beyond a key-value store.

MongoDB is like CouchDB's less mature, bigger brother. It's very impressive on the speed front, but achieves that by skimping on things that I want:

  • Atomic operations are limited to single documents only.
  • Auto-sharding is still in development, and the replication mechanism doesn't make any guarantees about the consistency of your writes.
  • And, oh, repairing a large MongoDB using db.repairDatabase() can take a painfully long time -- during which the database is also blocked!

Neo4j is a nice embedded graph database, but it's not one for horizontal scaling yet, e.g.

  • No auto-sharding -- although it's being worked on as part of someone's master's thesis.

Riak is also nice, but:

  • It's geared towards eventual consistency.
  • It's mainly just a key-value store and its mapreduce functionality doesn't provide the kind of querying support that I want.

Scalaris is perhaps the most technically impressive of the lot, but:

  • It's a pain to configure and deploy.
  • It doesn't provide persistent storage.
  • It doesn't offer much beyond a key-value store.

Tokyo Cabinet/Tyrant is fun, but:

  • It has no auto-sharding.
  • It's just a key-value store.

TyphoonAE Redis Datastore even supports the App Engine datastore API, but:

  • It can scale up to only a single server.

VoltDB:

  • I left SQL in 2001 and I'm never going back, thanks.

Voldemort is a nicely designed key-value store, but:

  • It's not much more than a key-value store -- you have to create and manage your own indexes for querying.
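
To show what "create and manage your own indexes" means in practice, here is a minimal sketch of a secondary index maintained by hand on top of a plain key-value store. It uses in-memory dicts as a stand-in for the store; the `IndexedKV` class and its methods are hypothetical and do not correspond to Voldemort's API or any other product's. It also assumes field values are hashable.

```python
class IndexedKV:
    """Toy key-value store with a hand-maintained secondary index."""

    def __init__(self):
        self.data = {}   # primary storage: key -> record (a dict)
        self.index = {}  # secondary index: (field, value) -> set of keys

    def put(self, key, record):
        # An overwrite must first remove the old record's index entries,
        # otherwise queries would return stale matches.
        old = self.data.get(key)
        if old is not None:
            self._unindex(key, old)
        self.data[key] = dict(record)
        for field, value in record.items():
            self.index.setdefault((field, value), set()).add(key)

    def delete(self, key):
        old = self.data.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def _unindex(self, key, record):
        for field, value in record.items():
            keys = self.index.get((field, value))
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.index[(field, value)]

    def query(self, field, value):
        """Return the sorted keys of records whose field equals value."""
        return sorted(self.index.get((field, value), ()))


store = IndexedKV()
store.put("u1", {"city": "London"})
store.put("u2", {"city": "London"})
store.put("u2", {"city": "Paris"})   # overwrite moves u2 between index entries
print(store.query("city", "London"))
```

Even in this toy form, every write touches both the record and its index entries. In a distributed store those updates also have to be kept consistent with each other, which is the bookkeeping a built-in query layer would otherwise handle.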