SLEEP - syncable.org

Your API does REST, but can it SLEEP?

SLEEP (Syncable Lightweight Event Emitting Persistence) is an emerging standard for distributed data sync using HTTP and JSON. A generalized version of CouchDB's much-lauded built-in replication, SLEEP extends the REST architecture to define how databases can offer syncable JSON APIs, fostering open data innovation by allowing developers to replicate entire databases over the net.


SLEEP comes from the Apache CouchDB project, which is now widely known for its multi-master streaming HTTP + JSON replication. This is possible in part because of the CouchDB _changes feed, an API that lets you see whether any changes have been made to the database since the last time you synchronized. CouchDB can efficiently implement the _changes feed because of one subtle difference between it and most other databases: it stores a history of all changes that happen to the database, including deletes.
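To make the idea concrete, here is a minimal Python sketch (assuming Python 3.9+) of a store that keeps a change history alongside its live data. The class `ChangeLoggingStore`, the `Change` record, and the integer `seq` numbering are hypothetical illustrations, not CouchDB's actual implementation. The point is only that a delete leaves a tombstone in the log, so it can still be reported to clients that synced before it happened.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Change:
    seq: int                         # position in the append-only log (1-based)
    doc_id: str
    deleted: bool                    # True for a tombstone left by a delete
    doc: Optional[dict[str, Any]]    # snapshot of the document, None if deleted


class ChangeLoggingStore:
    """Toy key/value store (hypothetical) that records every write, including deletes."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}
        self._log: list[Change] = []

    def _append(self, doc_id: str, doc: Optional[dict[str, Any]]) -> int:
        seq = len(self._log) + 1
        self._log.append(Change(seq, doc_id, doc is None, doc))
        return seq

    def put(self, doc_id: str, doc: dict[str, Any]) -> int:
        self._docs[doc_id] = doc
        return self._append(doc_id, dict(doc))

    def delete(self, doc_id: str) -> int:
        # The live row disappears, but the log keeps a tombstone so that
        # clients who synced earlier can still learn about the delete.
        self._docs.pop(doc_id, None)
        return self._append(doc_id, None)

    def get(self, doc_id: str) -> Optional[dict[str, Any]]:
        return self._docs.get(doc_id)

    def changes(self, since: int = 0) -> list[Change]:
        # seq == list index + 1, so entries newer than `since` start at index `since`.
        return self._log[since:]
```

After `put("rex", ...)`, `put("fido", ...)` and `delete("rex")`, calling `changes(since=1)` returns the `fido` write (seq 2) and the `rex` tombstone (seq 3), even though `rex` no longer exists in the live data.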

If you synchronize data from a remote source and then the remote source deletes a bunch of data, how do you find out what to sync when you want to synchronize again? The simplest solution is to keep enough history stored in your database so that you can produce an append-only transaction log of all data operations that remote sync clients can use to check for new changes.
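A sync client only needs to remember how far into that log it has read. The following Python sketch (Python 3.9+) assumes a hypothetical feed format in which each entry is a dict with an integer `seq`, a document `id`, and either a `doc` body or `"deleted": True`; the function name `apply_changes` is likewise illustrative. Entries at or below the saved checkpoint are skipped, so re-running a sync is harmless, and deletes are applied by removing the local copy.

```python
from typing import Any, Iterable


def apply_changes(
    replica: dict[str, dict[str, Any]],
    changes: Iterable[dict[str, Any]],
    checkpoint: int,
) -> int:
    """Apply feed entries newer than `checkpoint` to `replica`; return the new checkpoint."""
    for change in sorted(changes, key=lambda c: c["seq"]):
        if change["seq"] <= checkpoint:
            continue  # already applied during an earlier sync
        if change.get("deleted"):
            replica.pop(change["id"], None)  # propagate the remote delete
        else:
            replica[change["id"]] = change["doc"]
        checkpoint = change["seq"]  # remember how far we have read
    return checkpoint
```

For example, syncing a two-entry feed from checkpoint 0 returns checkpoint 2; if the remote side then appends `{"seq": 3, "id": "rex", "deleted": True}`, syncing again from 2 removes `rex` locally without re-fetching anything else.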

In essence, this append-only _changes feed is what SLEEP describes. All you have to do to make your API SLEEPy is a) store a record of every CRUD operation (especially deletes) and b) expose your changes feed as JSON over HTTP. There are a number of middleware plugins being developed for major MVC frameworks (Rails, Django and Drupal) to make SLEEP a plug-and-play enhancement.
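Step b) can be sketched with nothing but Python's standard library. This is a minimal, hypothetical endpoint: the `/_changes` path and `since` parameter echo CouchDB, and the `results`/`last_seq` response shape is loosely modeled on CouchDB's `_changes` output, but the in-memory `CHANGES` list and the helper names `changes_since`, `ChangesHandler` and `serve` are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory change log; a real app would read this from its database.
CHANGES = [
    {"seq": 1, "id": "rex", "deleted": False},
    {"seq": 2, "id": "fido", "deleted": False},
    {"seq": 3, "id": "rex", "deleted": True},
]


def changes_since(since: int) -> dict:
    """Return every change after `since`, plus the cursor to use next time."""
    results = [c for c in CHANGES if c["seq"] > since]
    last_seq = results[-1]["seq"] if results else since
    return {"results": results, "last_seq": last_seq}


class ChangesHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/_changes":
            self.send_error(404)
            return
        try:
            since = int(parse_qs(url.query).get("since", ["0"])[0])
        except ValueError:
            self.send_error(400, "since must be an integer")
            return
        body = json.dumps(changes_since(since)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving the changes feed on localhost."""
    ThreadingHTTPServer(("127.0.0.1", port), ChangesHandler).serve_forever()
```

After calling `serve()`, a request for `http://127.0.0.1:8000/_changes?since=2` returns only the tombstone for `rex` together with `"last_seq": 3`, which the client stores as its `since` value for the next sync.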

@natevw
natevw commented Oct 19, 2011

In case the description above does not explain it well enough, there's a really helpful diagram at http://syncable.org/

@stephenjudkins

I agree 100% that this is a necessary enhancement to many RESTful APIs.

I'd argue that some sort of rationale would help people understand why this is necessary. Here's my shot at it:

Trying to keep a database synced with another across a traditional REST API means you have to pick at least one of the following three undesirable properties:

  • One database "lags" behind the other as a large, bandwidth-intensive, "full" sync is performed to guarantee both databases are kept up-to-date.
  • Databases are limited to impractically small sizes that can easily be synced in less than a second.
  • Labor-intensive, bug-prone ad-hoc synchronization processes where some resources are lazily checked as being up-to-date as necessary. There's no guarantee that these types of processes offer correct or consistent results.

Also, I would not characterize CouchDB's property of storing a complete history of all changes as a "subtle" difference from other databases. I'd call it "fundamental" instead.

@donpdonp

This idea is present to some degree in PostgreSQL 9's streaming replication. The write-ahead log (WAL) is a transaction log (not sure if it's append-only) that can be used to sync another database.

Defining a standard way to document changes to a RESTful resource, along with semantics for ordering the records and doing merges, would be fantastic. CouchDB has led the way with its HTTP changes feed. Something like /dogs/changes?since=tx_id
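A client consuming a resource-level feed such as `/dogs/changes?since=tx_id` can page through it by repeatedly passing the last `tx_id` it saw. In this Python sketch (Python 3.9+) the `limit` parameter, the `tx_id` field on each record, and the injected `fetch` callable (which would perform the HTTP GET and return the decoded JSON list) are all assumptions. It also relies on the server returning records in ascending `tx_id` order, which is exactly the kind of ordering guarantee a standard would need to spell out.

```python
from typing import Any, Callable
from urllib.parse import urlencode

# Hypothetical transport: takes a URL, returns the decoded JSON list of records.
Fetch = Callable[[str], list[dict[str, Any]]]


def changes_url(base: str, since: int, limit: int) -> str:
    # e.g. changes_url("/dogs", 41, 100) -> "/dogs/changes?since=41&limit=100"
    return f"{base}/changes?{urlencode({'since': since, 'limit': limit})}"


def pull_all(
    base: str, fetch: Fetch, since: int = 0, limit: int = 100
) -> tuple[list[dict[str, Any]], int]:
    """Page through the feed until an empty batch; return records and the final cursor."""
    collected: list[dict[str, Any]] = []
    while True:
        batch = fetch(changes_url(base, since, limit))
        if not batch:
            return collected, since
        collected.extend(batch)
        next_since = batch[-1]["tx_id"]  # resume after the last record we saw
        if next_since <= since:
            # Guard against looping forever if the server ignores `since`.
            raise ValueError("feed did not advance; expected ascending tx_id order")
        since = next_since
```

Injecting `fetch` keeps the paging logic independent of any particular HTTP library and makes it easy to test against an in-memory fake that filters a list by `since` and truncates it to `limit`.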

@maxogden
Owner

@donpdonp From what I understand, Postgres replicates its data at a layer at least one step lower than a human-readable format, so instead of passing serialized table rows between servers it actually replicates the underlying data blocks. Is this still true in Postgres 9?

@donpdonp

I looked at the WAL Internals page http://www.postgresql.org/docs/current/static/wal-internals.html and it does imply that the unit of storage is a 'data page'. I think it can still pass as a 'journal', since changes are written progressively in response to any sort of change, but it differs in that the pages sent over the wire contain the relevant change rather than, for instance, the SQL statement that caused it. That's an educated guess, anyway.

@nichtich

How does SLEEP relate to ResourceSync?

@benoitc
benoitc commented Apr 16, 2014
 If you synchronize data from a remote source and then the remote source deletes a bunch of data, how do you find out what to sync when you want to synchronize again? The simplest solution is to keep enough history stored in your database so that you can produce an append-only transaction log of all data operations that remote sync clients can use to check for new changes.

Maybe, but what is history?
