weavejester/gist:1982807

# Initial thoughts on Datomic

Rich Hickey (of Clojure fame) has released a cloud-based database
called Datomic that has some interesting properties.

Datomic is a log of assertions and retractions of "facts", much as
a DVCS like Git is a log of code diffs. The state of the database
at any point in time is the sum of all the assertions and retractions
up to that point.

Unlike Git, you cannot remove an assertion or retraction from
the database. The log of changes is persistent, and always accessible.
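
As a rough illustration of the "state is the sum of the log" idea, here
is a toy model in Python. The `Fact` tuple and the `state_as_of`
function are names invented for this sketch and are not Datomic's API;
the only assumption is that each log entry records whether it is an
assertion or a retraction, and which transaction it belongs to. Nothing
is ever deleted from `log`, so asking for an earlier transaction number
gives back the database as it was then.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: object
    tx: int       # transaction number, increasing over time
    added: bool   # True = assertion, False = retraction


def state_as_of(log, tx):
    """Fold every log entry up to and including `tx` into a set of current facts."""
    facts = set()
    for entry in log:
        if entry.tx > tx:
            break  # the log is ordered by transaction number
        fact = (entry.entity, entry.attribute, entry.value)
        if entry.added:
            facts.add(fact)
        else:
            facts.discard(fact)
    return facts


# The log only ever grows: changing an email is a retraction plus an assertion.
log = [
    Fact("alice", "email", "alice@example.com", 1, True),
    Fact("alice", "email", "alice@example.com", 2, False),
    Fact("alice", "email", "alice@example.org", 2, True),
]

print(state_as_of(log, 1))  # {('alice', 'email', 'alice@example.com')}
print(state_as_of(log, 2))  # {('alice', 'email', 'alice@example.org')}
```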

Because historical data cannot currently be removed from Datomic,
using it in jurisdictions with strong privacy laws (like the EU) might
be problematic. The only way around this at present is to tell an
attribute not to keep any history, via the :db/noHistory schema
option.

Each assertion and retraction is bound to a transaction, and
transactions are applied to the database via a transactor, a server
that ensures transactions are atomic and do not conflict. This means
you don't get the consistency problems of many NoSQL databases, but it
does mean all writes must pass through a single machine.
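
To make the trade-off concrete, here is a minimal sketch of a single
writer in Python. The `Transactor` class below is hypothetical and says
nothing about how Datomic's transactor is actually built; it only
assumes that one process applies transactions one at a time and that a
transaction is recorded in full or not at all. The lock is what makes
writes atomic, and it is also the bottleneck: every writer queues
behind it.

```python
import threading


class Transactor:
    """Hypothetical single writer: applies one transaction at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_tx = 1
        self.log = []  # append-only rows: (tx, entity, attribute, value, added)

    def transact(self, changes):
        """Append all `changes` under one transaction number, or none of them."""
        changes = list(changes)
        with self._lock:  # writers are serialised here
            # Validate everything first so a bad change rejects the whole batch.
            for change in changes:
                if len(change) != 4:
                    raise ValueError(f"malformed change: {change!r}")
            tx = self._next_tx
            self.log.extend((tx, *change) for change in changes)
            self._next_tx += 1
            return tx


transactor = Transactor()
first = transactor.transact([("alice", "email", "alice@example.com", True)])
second = transactor.transact([
    ("alice", "email", "alice@example.com", False),
    ("alice", "email", "alice@example.org", True),
])
print(first, second)  # 1 2
```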

If writes are somewhat bottlenecked, reads are anything but. Data
is stored in the cloud in Amazon's DynamoDB, but is also
aggressively cached by Datomic clients, known as Peers. This has the
interesting result that Peers can run queries against Datomic entirely
within their local memory.

So Datomic has atomic writes (with all their associated advantages and
disadvantages), and what looks like super-fast cached reads.
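
The read side can be pictured as a read-through cache. The sketch below
is a guess at the general shape, not Datomic's implementation: `Storage`
and `Peer` are made-up classes, and the string keys are placeholders.
It also assumes that stored data is never modified in place, which is
what would let a client keep a copy without ever invalidating it.

```python
class Storage:
    """Stand-in for the remote storage service; counts round trips."""

    def __init__(self, segments):
        self._segments = segments
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return self._segments[key]


class Peer:
    """Hypothetical client that keeps everything it has read in local memory."""

    def __init__(self, storage):
        self._storage = storage
        self._cache = {}

    def read(self, key):
        if key not in self._cache:
            # Only the first read of a key goes over the network.
            self._cache[key] = self._storage.fetch(key)
        return self._cache[key]


storage = Storage({"segment-1": ("alice", "email", "alice@example.com")})
peer = Peer(storage)
peer.read("segment-1")
peer.read("segment-1")
print(storage.fetches)  # 1
```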

Datomic's query language is Datalog, which should feel familiar to
many Clojure users: Clojure's core.logic library offers a similar
logic-programming style, and Cascalog uses Datalog-like queries for
Hadoop. I won't say more about Datalog, as there's a wealth of
information on it online.

The queries themselves are constructed as simple data structures,
which makes much more sense than parsing a string, as in SQL
databases. It is also the approach taken by several NoSQL databases,
such as MongoDB.
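
To show what "queries as data" buys you, here is a toy pattern matcher
in Python. It is not Datomic's query engine or its syntax: the `query`
function, the `?name` convention for variables, and the sample facts
are all assumptions made for this sketch. The point is that the query
is an ordinary dictionary of lists and tuples, so a program can build,
inspect, or combine queries without any string concatenation.

```python
def is_var(term):
    """Variables are strings starting with '?', by convention in this sketch."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Unify one (entity, attribute, value) pattern with one fact.

    Returns the extended bindings, or None if the fact does not fit.
    """
    bindings = dict(bindings)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if bindings.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return bindings


def query(find, where, facts):
    """Return the set of `find` tuples that satisfy every clause in `where`."""
    results = [{}]
    for pattern in where:
        extended = []
        for bindings in results:
            for fact in facts:
                new_bindings = match(pattern, fact, bindings)
                if new_bindings is not None:
                    extended.append(new_bindings)
        results = extended
    return {tuple(b[var] for var in find) for b in results}


facts = [
    ("alice", "likes", "pizza"),
    ("bob", "likes", "pizza"),
    ("bob", "age", 30),
]

# The query is plain data: who likes pizza and also has a recorded age?
q = {"find": ["?who"],
     "where": [("?who", "likes", "pizza"), ("?who", "age", "?age")]}

print(query(q["find"], q["where"], facts))  # {('bob',)}
```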

Datomic is not schemaless, which sets it apart from many NoSQL
databases, but nor does it group entities into fixed tables, as in
SQL. An entity consists of one or more attributes, and each
attribute must be defined in the database schema. In this sense,
schema attributes in Datomic have more in common with type definitions
in a statically typed programming language.
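
A small Python sketch may help separate "attributes are defined" from
"entities live in tables". The `SCHEMA` dictionary, the attribute names,
and the `validate` function are hypothetical and much simpler than a
real Datomic schema; the sketch only assumes that each attribute has a
declared type. Note that no entity shape is declared anywhere: any
combination of known attributes is accepted.

```python
# Hypothetical schema: attributes are declared once, with a type.
SCHEMA = {
    "person/name": str,
    "person/age": int,
    "company/name": str,
}


def validate(entity, schema=SCHEMA):
    """Check every attribute of `entity` against the schema and return it."""
    for attribute, value in entity.items():
        if attribute not in schema:
            raise KeyError(f"unknown attribute: {attribute}")
        expected = schema[attribute]
        if not isinstance(value, expected):
            raise TypeError(f"{attribute} expects {expected.__name__}")
    return entity


# An entity may mix attributes freely; there is no table to fit into.
validate({"person/name": "Alice", "person/age": 30})
validate({"person/name": "Bob", "company/name": "Acme"})
```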

Like many databases, Datomic also supports indexes and uniqueness
constraints via the schema. Since these indexes will be coming from
either DynamoDB or local memory, querying data in Datomic seems like
it should be very quick indeed.

Finally, Datomic has partitions, which are a way of grouping entities
in the database. These differ from tables in SQL or collections in
MongoDB in that Datomic partitions appear to affect only performance.
Queries act on all partitions, but work faster across entities stored
in the same partition.

So do I like it?

Actually, I really do. I suspect the use of a single transactor server
to guarantee atomicity is going to take a lot of flak, but it seems
like a reasonable compromise to me. Because all it's doing is managing
transactions, rather than persisting data, the transactor should be
more performant than a single-server database, and because it doesn't
store any data, it'll matter less if it goes down. It's probably the
optimum solution for maintaining atomicity.

I like the idea of a persistent transaction log (even though we'll
definitely need a way to 'forget' data in future), and being able to
retrieve snapshots of the database at any point in time. Querying
using Datalog seems extremely powerful and, like the relational model,
it rests on first-order logic, which is a solid theoretical base. I
also really like the idea of running queries against an in-memory
cache.