tav/why-not-existing-nosql.rst

## why-not-existing-nosql.rst

      
Some of you might wonder why I didn't just use one of the existing NoSQL
datastores, so I've explained below why each of them doesn't suit my needs.
This is not to say that they won't be well suited to other contexts --
especially since they're all quite impressive in their own ways.

Cassandra is promising, but:

* It has no native transactions or secondary indexes.
* It currently requires a cluster-wide restart/migration for schema updates.
* It is geared towards eventual consistency, and configuring it to be
  strongly (as opposed to weakly) consistent is a sure way to see a serious
  performance drop.
* I find its API non-intuitive -- columns, super columns, etc.
* It's Java -- after having lost the bulk of a quarter million dollars on a
  Java-based project 10 years ago, I have this unfortunate aversion to
  everything Java.
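
To make the consistency point concrete, here is a minimal sketch of the usual quorum rule for Dynamo-style replication, which is the model behind Cassandra's tunable consistency: with N replicas, a read that waits for R replicas is guaranteed to see the latest write acknowledged by W replicas only when R + W > N. The function name and the example numbers are mine, purely for illustration; this is not Cassandra's API.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum overlaps every write quorum.

    n: number of replicas holding each piece of data
    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    # Overlap is guaranteed only if the two quorums cannot be disjoint.
    return r + w > n


# Fast but only eventually consistent: one replica per operation.
print(is_strongly_consistent(n=3, r=1, w=1))  # False

# Strongly consistent: every operation waits on a majority of replicas.
print(is_strongly_consistent(n=3, r=2, w=2))  # True
```

The second configuration is where the cost comes from: each read and write now has to wait for a majority of replicas rather than a single one.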

CouchDB is very cool and has a super sexy API,
but:

* It's very much a document-oriented datastore and is not well suited to
  data warehousing purposes.
* You have to create view indexes manually, and creating temporary views is
  quite a slow process with large datasets.

HBase:

* It will eat your system resources.
* It offers very little beyond a BigTable clone, and for that benefit you
  get to deal with half a dozen moving parts.

Hypertable:

* It lost all my data the last time I played with it.
* It offers little beyond a BigTable clone -- HQL comes nowhere near what the
  App Engine datastore offers.

MemcacheDB:

* It offers very little beyond a key-value store.

MongoDB is like CouchDB's less mature, bigger brother. It's very impressive
on the speed front, but achieves that by skimping on things that I want:

* Atomic operations are limited to single documents.
* Auto-sharding is still in development and the replication mechanism doesn't
  make any guarantees about the consistency of your writes.
* And, oh, repairing a large MongoDB database using db.repairDatabase() can
  take a painfully long time -- during which the database is also blocked!

Neo4j is a nice embedded graph database, but it isn't built for horizontal
scaling yet:

* It has no auto-sharding -- although it's being worked on as part of
  someone's master's thesis.

Riak is also nice, but:

* It's geared towards eventual consistency.
* It's mainly just a key-value store and its mapreduce functionality doesn't
  provide the kind of querying support that I want.

Scalaris is perhaps the most
technically impressive of the lot, but:

* It's a pain to configure and deploy.
* It doesn't provide persistent storage.
* It doesn't offer much beyond a key-value store.

Tokyo Cabinet/Tyrant is fun, but:

* It has no auto-sharding.
* It's just a key-value store.

TyphoonAE Redis Datastore even supports the
App Engine datastore API, but:

* It can only scale up to a single server.

VoltDB:

* I left SQL in 2001 and I'm never going back, thanks.

Voldemort is a nicely designed key-value
store, but:

* It's not much more than a key-value store -- you have to create and manage
  your own indexes for querying.
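
To show what "create and manage your own indexes" means in practice, here is a rough sketch of a secondary index maintained by hand on top of a plain key-value store. A Python dict stands in for the store, and the class and method names (`IndexedKV`, `put`, `find`) are hypothetical; nothing here is Voldemort's API. It also assumes field values are hashable.

```python
from collections import defaultdict


class IndexedKV:
    """A key-value store with a hand-maintained secondary index."""

    def __init__(self):
        self.store = {}                 # key -> record (the "real" KV store)
        self.index = defaultdict(set)   # (field, value) -> set of keys

    def _unindex(self, key, record):
        # Remove every index entry that points at the old record.
        for field, value in record.items():
            keys = self.index.get((field, value))
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.index[(field, value)]

    def put(self, key, record):
        # An overwrite must clean up stale index entries first.
        old = self.store.get(key)
        if old is not None:
            self._unindex(key, old)
        self.store[key] = dict(record)
        for field, value in record.items():
            self.index[(field, value)].add(key)

    def delete(self, key):
        old = self.store.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def find(self, field, value):
        # Query by field value via the index instead of scanning every key.
        return sorted(self.index.get((field, value), ()))


db = IndexedKV()
db.put("u1", {"city": "London"})
db.put("u2", {"city": "London"})
db.put("u1", {"city": "Paris"})   # update: the old index entry must go
print(db.find("city", "London"))  # ['u2']
```

Even in this toy form, every write path has to keep the index in step with the data. On a distributed store, the record and its index entries live under separate keys, so you also have to handle a failure between the two writes yourself.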