@abp
Forked from weavejester/gist:1982807
Created March 7, 2012 10:22
Initial thoughts on Datomic

Rich Hickey (of Clojure fame) has released a cloud-based database called Datomic that has some interesting properties.

Datomic is a log of assertions and retractions of "facts", much as a DVCS like Git is a log of code diffs. The state of the database at any one time is the sum of all the assertions and retractions up to that date.
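To make the "sum of the log" idea concrete, here is a minimal Python sketch. It illustrates the concept only and is not Datomic's API: the tuple layout, the sample data, and the `state_as_of` name are all assumptions made up for this example.

```python
# Hypothetical log layout: (tx, op, entity, attribute, value), ordered by tx.
LOG = [
    (1, "assert", "alice", "email", "alice@example.com"),
    (2, "assert", "alice", "city", "London"),
    (3, "retract", "alice", "city", "London"),
    (3, "assert", "alice", "city", "Paris"),
]

def state_as_of(log, tx):
    """Fold every assertion and retraction up to and including tx into a set of facts."""
    facts = set()
    for entry_tx, op, entity, attribute, value in log:
        if entry_tx > tx:
            break  # the log is ordered, so nothing later can apply
        if op == "assert":
            facts.add((entity, attribute, value))
        else:
            facts.discard((entity, attribute, value))
    return facts
```

Calling `state_as_of(LOG, 2)` sees Alice in London, while `state_as_of(LOG, 3)` sees her in Paris. Nothing in the log is overwritten, so any earlier state can be rebuilt the same way.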

Unlike Git, you cannot remove an assertion or retraction from the database. The log of changes is persistent, and always accessible. Because historical data cannot currently be removed from Datomic, using it in jurisdictions with strong privacy laws (like the EU) might be problematic. The only way around this at present is to instruct an attribute to throw away all history, via the :db/noHistory schema option.

Each assertion and retraction is bound to a transaction, and transactions are applied to the database via a transactor, a server that ensures transactions are atomic and do not conflict. This means you don't get the consistency problems of many NoSQL databases, but it does mean all writes must pass through a single machine.
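A toy version of the single-writer idea might look like the following Python sketch. This is an assumption-laden illustration, not how Datomic's transactor is implemented: the `Transactor` class, its `transact` method, and the op layout are hypothetical names for this example.

```python
import threading

class Transactor:
    """Toy single writer: every transaction passes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_tx = 1
        self.log = []  # entries: (tx, op, entity, attribute, value)

    def transact(self, ops):
        """Append all ops under one tx id, or none of them if any op is malformed."""
        ops = list(ops)
        for op in ops:
            if len(op) != 4 or op[0] not in ("assert", "retract"):
                raise ValueError(f"malformed op: {op!r}")
        with self._lock:
            # Only one thread at a time can take a tx id and extend the log.
            tx = self._next_tx
            self._next_tx += 1
            self.log.extend((tx, *op) for op in ops)
            return tx
```

Because validation happens before anything is appended, a bad transaction leaves the log untouched, and the lock gives every transaction a distinct, increasing id. The cost is the one noted above: every write funnels through this single object.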

If writes are somewhat bottlenecked, reads are anything but. Data is stored in the cloud in Amazon's DynamoDB, but is also aggressively cached by Datomic clients, known as Peers. This has the interesting result that Peers can run queries against Datomic entirely within their local memory.

So Datomic has atomic writes (with all their associated advantages and disadvantages), and what looks like super-fast cached reads.

Datomic's query language is Datalog, which should feel familiar to many Clojure users: it is a logic-programming language in the same spirit as Clojure's core.logic library, and Cascalog uses a Datalog-inspired syntax for writing Hadoop queries. I won't say more about Datalog, as there's a wealth of information on it online.

The queries themselves are constructed as simple data structures, which makes much more sense than parsing a string, as in SQL databases. It is also the approach taken by several NoSQL databases, such as MongoDB.
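To show why queries-as-data are pleasant to work with, here is a small Python sketch of a Datalog-flavoured pattern matcher over (entity, attribute, value) tuples. It is not Datomic's query engine or syntax; the `query`, `match`, and `is_var` helpers and the `?name` variable convention are assumptions chosen for this example.

```python
def is_var(term):
    """Variables are strings that start with '?', e.g. '?e'."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Unify one (e, a, v) pattern with one fact; return extended bindings or None."""
    bindings = dict(bindings)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if bindings.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return bindings

def query(find, where, facts):
    """Evaluate a conjunction of patterns and project the `find` variables."""
    results = [{}]
    for pattern in where:
        results = [b2 for b in results for fact in facts
                   if (b2 := match(pattern, fact, b)) is not None]
    return {tuple(b[var] for var in find) for b in results}

FACTS = {
    ("alice", "city", "Paris"),
    ("alice", "email", "alice@example.com"),
    ("bob", "city", "Paris"),
    ("bob", "email", "bob@example.com"),
    ("carol", "city", "Oslo"),
}

# The query is plain data: it can be built, inspected, or composed like any other value.
EMAILS_IN_PARIS = {
    "find": ["?email"],
    "where": [("?e", "city", "Paris"), ("?e", "email", "?email")],
}
```

Running `query(EMAILS_IN_PARIS["find"], EMAILS_IN_PARIS["where"], FACTS)` returns the emails of Alice and Bob. Since the query is an ordinary dictionary, adding another clause is a list append rather than string concatenation.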

Datomic is not schemaless, which sets it apart from many NoSQL databases, but nor does it group entities into fixed tables, as in SQL. An entity consists of one or more attributes, and each attribute must be defined in the database schema. In this sense, schema attributes in Datomic have more in common with type definitions in a statically typed programming language.
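The "attributes are declared, entities are not" idea can be sketched in a few lines of Python. This is only an analogy and not Datomic's schema format: the `SCHEMA` mapping, the attribute names, and the `validate` function are hypothetical.

```python
# Hypothetical schema: each attribute is declared once, with a value type.
SCHEMA = {
    "person/name": str,
    "person/age": int,
    "order/total": float,
}

def validate(entity, schema=SCHEMA):
    """Accept any mix of declared attributes; reject undeclared or mistyped ones."""
    for attribute, value in entity.items():
        if attribute not in schema:
            raise KeyError(f"undeclared attribute: {attribute}")
        if not isinstance(value, schema[attribute]):
            raise TypeError(f"{attribute} expects {schema[attribute].__name__}")
    return entity
```

Note that `validate({"person/name": "Ann", "order/total": 9.5})` passes even though it mixes "person" and "order" attributes: there is no table dictating which attributes belong together, only per-attribute definitions.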

Like many databases, Datomic also supports indexes and uniqueness constraints via the schema. Since these indexes will be coming from either DynamoDB or local memory, querying data in Datomic seems like it should be very quick indeed.

Finally, Datomic has partitions, which are a way of grouping entities in the database. These differ from tables in SQL or collections in MongoDB in that Datomic partitions appear to affect only performance. Queries act on all partitions, but work faster across entities stored in the same partition.

So do I like it?

Actually, I really do. I suspect the use of a single transactor server to guarantee atomicity is going to take a lot of flak, but it seems like a reasonable compromise to me. Because all it's doing is managing transactions, rather than persisting data, the transactor should be more performant than a single-server database, and because it doesn't store any data, it'll matter less if it goes down. It's probably the optimum solution for maintaining atomicity.

I like the idea of a persistent transaction log (even though we'll definitely need a way to 'forget' data in future), and being able to retrieve snapshots of the database at any point in time. Querying using Datalog seems extremely powerful, and, like the relational model, it is based on first-order logic, which is a solid theoretical base. I also really like the idea of running queries against an in-memory cache.
