@mikeal
Last active December 19, 2015 10:09
Linear sequence revision history.

Apache CouchDB's data model stores a "revision tree". This tree branches on two events:

  • the same revision is updated on two nodes and then those nodes replicate, creating a conflict.
  • a document is deleted by writing a new revision with _deleted, then the document is created again, which creates a new tree beginning with a revision starting in 1-.

Because CouchDB also implements a "most writes wins" policy for picking the default winner during conflicts, the client must read the entire rev tree during replication, back to the point where the last matching revision exists.
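To make the policy concrete, here is a minimal sketch of picking a default winner among the leaf revisions of a tree. The `Leaf` type and `pick_winner` function are names made up for this example. Preferring non-deleted leaves and breaking ties on the digest are my assumptions about how CouchDB behaves; the text above only says that the most writes win.

```python
from typing import NamedTuple


class Leaf(NamedTuple):
    """One branch head of a revision tree (hypothetical type for this sketch)."""
    rev: str              # e.g. "3-abc": write count, dash, digest
    deleted: bool = False


def parse_rev(rev: str) -> tuple[int, str]:
    """Split "3-abc" into (3, "abc")."""
    pos, _, digest = rev.partition("-")
    return int(pos), digest


def pick_winner(leaves: list[Leaf]) -> Leaf:
    """Pick the leaf with the most writes.

    Assumed ordering: live leaves beat deleted ones, then the higher
    write count wins, then the digest breaks ties deterministically.
    """
    return max(leaves, key=lambda leaf: (not leaf.deleted, *parse_rev(leaf.rev)))
```

Because the key depends only on the leaves themselves, every node that holds the same set of leaves picks the same winner without coordinating.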

In practice, very few CouchDB applications actually resolve conflicts using the revision. Instead, the vast majority of applications just stick with the "most writes wins" policy which works well enough for most workloads.

So, for couchup, I've decided to rely on the "most writes wins" policy and not write or hold on to the entire revision tree. Since I'm not storing the entire rev tree, I do need an alternative policy for writing revisions that remains mostly compatible with CouchDB during replication.

What I'm proposing instead is that all writes to a document, including deletions and re-creations, increment the revision sequence and that revisions are written as a sparse sequence similar to _changes.

This preserves the "most writes wins" semantics even while continuously stemming the rev tree. It reduces the number of reads necessary for replication and the amount of data written during every update.
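A rough sketch of the proposal, not couchup's actual code: the `LinearRevStore` class, its in-memory dict, and the md5 digest are all stand-ins chosen for illustration. It leaves out the usual check that the caller supplies the current _rev. The point is only that a delete followed by a re-create keeps counting upward instead of starting over at 1-.

```python
import hashlib
import json


class LinearRevStore:
    """Keeps only the latest revision of each document (hypothetical sketch)."""

    def __init__(self):
        self.docs = {}  # doc id -> latest stored revision, nothing older

    def _next_rev(self, doc_id, body, deleted):
        current = self.docs.get(doc_id)
        # Continue from the stored sequence even if the current rev is a deletion.
        last_seq = int(current["_rev"].split("-", 1)[0]) if current else 0
        payload = json.dumps([deleted, body], sort_keys=True).encode()
        # The digest scheme here is an assumption, picked only to get a stable suffix.
        return f"{last_seq + 1}-{hashlib.md5(payload).hexdigest()}"

    def put(self, doc_id, body):
        rev = self._next_rev(doc_id, body, False)
        self.docs[doc_id] = {**body, "_id": doc_id, "_rev": rev}
        return rev

    def delete(self, doc_id):
        rev = self._next_rev(doc_id, {}, True)
        self.docs[doc_id] = {"_id": doc_id, "_rev": rev, "_deleted": True}
        return rev
```

Creating a document, deleting it, and creating it again yields revisions starting with 1-, 2-, and 3-, and the store still holds a single entry for that document.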

One bit of difficulty is in replicating with existing CouchDB databases that delete and then re-create a document. This can probably be solved by:

  • pulling changes in style=all_docs
  • when the first revision in the changes array has a lower sequence, pull the list of revs and check whether the rev I have is in the most recent rev branch on the remote. If it is not, sum up the sequences of all the branch heads and use that sequence when writing to couchup.
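The second step above could be sketched as follows. The function names are hypothetical. I'm assuming the branch heads come from the changes array of a style=all_docs row (winner first) and that the winner's ancestry comes from the _revisions field of a ?revs=true fetch. The sketch leaves the "has a lower sequence" trigger to the caller, since the note doesn't pin down what it is compared against.

```python
def rev_pos(rev: str) -> int:
    """Return the numeric sequence in front of the dash, e.g. 4 for "4-abc"."""
    return int(rev.split("-", 1)[0])


def expand_revisions(revisions: dict) -> list[str]:
    """Turn a _revisions object ({"start": N, "ids": [...]}) into full rev strings."""
    start = revisions["start"]
    return [f"{start - i}-{digest}" for i, digest in enumerate(revisions["ids"])]


def sequence_for_pull(local_rev: str, head_revs: list[str], winner_ancestry: list[str]) -> int:
    """Choose the sequence to write locally for a pulled document.

    head_revs:       all branch heads on the remote, winner first
    winner_ancestry: revs on the remote's winning branch, newest first
    """
    if local_rev in winner_ancestry:
        # The local rev is on the winning branch, so the remote sequence can be used as-is.
        return rev_pos(head_revs[0])
    # Otherwise count the writes on every branch so the local sequence keeps growing.
    return sum(rev_pos(rev) for rev in head_revs)
```

For example, if the local rev is 3-old and the remote holds a deleted head 4-del plus a re-created head 1-new, the function returns 5, which is higher than anything either side has seen for that document.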

Push replication is as simple as a new_edits=false PUT of the current doc, but push/pull from CouchDB will result in improper revisions winning.
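As an illustration of the push side, here is a minimal sketch that builds (but does not send) such a request with the standard library. `build_push_request`, the server URL, and the database name are placeholders for this example. I'm assuming the document already carries the _rev that couchup assigned.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_push_request(server: str, db: str, doc: dict) -> Request:
    """Build a PUT that asks the target to store doc["_rev"] as given.

    With new_edits=false the target does not generate a new revision,
    so the body must already contain the _rev to store.
    """
    doc_id = quote(doc["_id"], safe="")  # escape slashes and other reserved characters
    url = f"{server}/{db}/{doc_id}?new_edits=false"
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})
```

Passing the returned object to urllib.request.urlopen would perform the push. Nothing here guards against the improper-winner problem mentioned above.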

Replication between couchup nodes is incredibly simple as well.
