@bajtos
Last active September 27, 2016 22:06
LoopBack Replication & Offline sync

Offline data access & synchronization is built from three components:

  1. Change tracking
  2. Replication of changes
  3. Browser version of LoopBack

Setup

Enable change tracking

For each model that you would like to access in offline mode, you need to enable change tracking:

  1. Set the option "trackChanges" to true.
  2. Change the id property to an auto-generated GUID.
  3. Enable the "strict" and "persistUndefinedAsNull" flags.

Example - common/models/todo.json

{
  "name": "Todo",
  "base": "PersistedModel",
  "strict": "throw",
  "persistUndefinedAsNull": true,
  "trackChanges": true,
  "id": {
    "id": true,
    "type": "string",
    "defaultFn": "guid"
  },
  "title": {
    "type": "string",
    "required": true
  },
  "description": "string"
}

For each change-tracked model, a new model (and database table) is created to keep the change-tracking records. In the example above, a Todo-Change model will be created. The change model is attached to the same data source as the model being tracked.

Create a client app

Create another LoopBack app that will be used by the clients. For each replicated model, create a client-only subclass that uses local storage to persist the changes offline. The original model is connected to the server and used as the target for replication.

Example - client/models/local-todo.json:

{
  "name": "LocalTodo",
  "base": "Todo"
}

Example - client/model-config.json:

{
  "_meta": {
    "sources": ["../../common/models", "./models"]
  },
  "Todo": {
    "dataSource": "remote"
  },
  "LocalTodo": {
    "dataSource": "local"
  }
}

Configuration of client datasources - client/datasources.json:

{
  "remote": {
    "connector": "remote",
    "url": "/api"
  },
  "local": {
    "connector": "memory",
    "localStorage": "todo-db"
  }
}

Now that you have all models in place, you can set up bi-directional replication between LocalTodo and Todo, for example in a boot script.

// client/boot/replication.js
module.exports = function(client) {
  var LocalTodo = client.models.LocalTodo;
  var RemoteTodo = client.models.Todo;

  var since = { push: -1, pull: -1 };

  function sync() {
    // It is important to push local changes first,
    // that way any conflicts are resolved at the client
    LocalTodo.replicate(
      RemoteTodo,
      since.push,
      function pushed(err, conflicts, cps) {
        // handle err and conflicts
        since.push = cps;

        RemoteTodo.replicate(
          LocalTodo,
          since.pull,
          function pulled(err, conflicts, cps) {
            // handle err and conflicts
            since.pull = cps;
          });
      });
  }

  LocalTodo.observe('after save', function(ctx, next) {
    next();
    sync(); // in background
  });

  LocalTodo.observe('after delete', function(ctx, next) {
    next();
    sync(); // in background
  });
};
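The "handle err and conflicts" placeholders above are where conflict resolution belongs. A minimal, hypothetical helper that resolves every conflict in favor of the local copy might look like this (resolveAllLocally is not part of LoopBack; it only assumes each conflict exposes a resolve(cb) method, as LoopBack's Conflict objects do):

```javascript
// Hypothetical helper (not part of LoopBack): resolve every conflict
// in favor of the source (local) copy. Assumes each conflict object
// exposes a resolve(cb) method, as LoopBack Conflict instances do.
function resolveAllLocally(conflicts, done) {
  var pending = conflicts ? conflicts.length : 0;
  if (!pending) return done();
  conflicts.forEach(function(conflict) {
    conflict.resolve(function(err) {
      if (err) return done(err);
      if (--pending === 0) done();
    });
  });
}
```

You would call a helper like this inside the pushed and pulled callbacks before advancing the checkpoints, or implement a smarter policy (e.g. asking the user) in its place.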

Run the client app in the browser

The module loopback-boot provides a build tool for adding all application metadata and model files to a browserify bundle. Browserify is a tool that packages Node.js scripts into a single file (bundle) that runs in a browser.

Below is a simplified example packaging the client application into a browser "module" that can be loaded via require('lbclient'). Consult build.js in loopback-example-full-stack for a full implementation that includes source-maps and error handling.

Example - client/build.js

var fs = require('fs');
var path = require('path');
var browserify = require('browserify');
var boot = require('loopback-boot');

var b = browserify({ basedir: __dirname });
b.require('./client.js', { expose: 'lbclient' });

boot.compileToBrowserify({ appRootDir: __dirname }, b);

var bundlePath = path.resolve(__dirname, 'browser.bundle.js');
b.bundle().pipe(fs.createWriteStream(bundlePath));

Understanding the replication

(This is based on https://gist.github.com/BerkeleyTrue/f960946ba72814006651)

Change model

As explained above, a new change model is created for each change-tracked model, e.g. Todo-Change. This model can be accessed using the method getChangeModel, e.g. Todo.getChangeModel().

The change model has several properties:

  • modelId links a change instance (record) with a tracked model instance
  • prev and rev are hash values computed from the data of the tracked model instance. The rev property stands for the current revision, while prev holds the hash of the previous revision. When a model instance is deleted, the value null is used instead of a hash.
  • checkpoint associates a change record with a Checkpoint, more on this later.

Additionally, there is a method type() that can be used to determine the kind of change being made: Change.CREATE, Change.UPDATE, Change.DELETE or Change.UNKNOWN.

The current implementation of the change tracking algorithm keeps only one change record for each model instance - the last change made.
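Based on the description above, the type of a change can be derived from its rev and prev hashes. The following sketch illustrates the logic (changeType is a hypothetical name, not LoopBack's actual implementation):

```javascript
// Hypothetical sketch: derive a change's type from its rev/prev
// hashes, per the description above. A missing rev means the
// instance was deleted; a missing prev means it was just created.
function changeType(change) {
  if (change.rev && !change.prev) return 'create';
  if (!change.rev && change.prev) return 'delete';
  if (change.rev && change.prev) return 'update';
  return 'unknown';
}
```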

Checkpoints

A checkpoint represents a point in time which can be used to filter the changes to only those made after the checkpoint was created. A checkpoint is typically created whenever a replication is performed; this allows subsequent replication runs to ignore changes that were already replicated.

While in theory the replication algorithm should work without checkpoints, in practice it's important to use correct checkpoint values because the current implementation keeps the last change only.

If you don't pass correct values in the since argument of the replicate method, then:

  • You may get false conflicts if the "since" value is omitted or points to an older, already replicated checkpoint.

  • You may incorrectly override newer changes with old data if the "since" value points to a future checkpoint that was not replicated yet.
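Because correct since values matter, it is a good idea to persist the checkpoints between application restarts rather than keeping them only in memory, as the simplified boot script above does. A minimal sketch, assuming browser localStorage and an illustrative storage key:

```javascript
// Sketch only: persist the replication checkpoints so they survive
// page reloads. The key 'todo-sync-checkpoints' is illustrative.
function loadCheckpoints() {
  var raw = localStorage.getItem('todo-sync-checkpoints');
  return raw ? JSON.parse(raw) : { push: -1, pull: -1 };
}

function saveCheckpoints(since) {
  localStorage.setItem('todo-sync-checkpoints', JSON.stringify(since));
}
```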

Replication algorithm

A single iteration of the replication algorithm consists of the following steps:

  1. Create new checkpoints (both source and target)
  2. Get list of changes made at the source since the given source checkpoint
  3. Find the differences between source and target changes since the given target checkpoint and detect any conflicts
  4. Create a set of instructions - what to change at target
  5. Perform a "bulk update" operation using these instructions
  6. Return the new checkpoints to the callback

It is important to create the new checkpoints as the first step of the replication algorithm. Otherwise any changes made while the replication is in progress would be associated with the checkpoint being replicated, and thus they would not be picked up by the next replication run.

The consequence is that the "bulk update" operation will associate replicated changes with the new checkpoint, and thus these changes will be considered during the next replication run, which may cause false conflicts.

In order to prevent this problem, the replicate method runs several iterations of the replication algorithm, until either there is nothing left to replicate or a maximum number of iterations is reached.
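This outer loop can be sketched as follows (replicateUntilDone and its arguments are hypothetical names, not LoopBack's internal API):

```javascript
// Hypothetical sketch of the outer loop: repeat single replication
// runs until one reports no outstanding changes, or give up after
// a maximum number of iterations.
function replicateUntilDone(runOnce, maxIterations, done) {
  var iteration = 0;
  (function next() {
    if (iteration++ >= maxIterations)
      return done(new Error('replication did not settle'));
    runOnce(function(err, hasMoreChanges) {
      if (err) return done(err);
      if (!hasMoreChanges) return done();
      next();
    });
  })();
}
```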

Conflict detection

Conflicts are detected in the third step. The list of source changes is sent to the target model, which compares them to changes made to target model instances. Whenever both the source and the target modified the same model instance (the same model id), the algorithm checks the current and previous revisions on both sides to decide whether there is a conflict.

A conflict is reported when both of these conditions are met:

  • The current revisions are different, i.e. the model instances have different property values.

  • The current target revision is different from the previous source revision. In other words, if the source change is in sequence after the target change, then there is no conflict.
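The two conditions above can be expressed as a small predicate (isConflict is a hypothetical name; the actual check lives inside LoopBack's Change model):

```javascript
// Sketch of the conflict rule described above: report a conflict only
// when the current revisions differ AND the source change does not
// follow directly after the target's current revision.
function isConflict(sourceChange, targetChange) {
  // same current revision: both sides hold identical data
  if (sourceChange.rev === targetChange.rev) return false;
  // the source change is a direct successor of the target's revision
  if (sourceChange.prev === targetChange.rev) return false;
  return true;
}
```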

Bulk update

The bulk update operation expects a list of instructions - changes to perform. Each instruction contains a Change instance describing the change, the change type, and the model data to use.

In order to prevent race conditions when third parties are modifying the replicated instances while the replication is in progress, the bulkUpdate function implements robust checks to ensure it modifies only those model instances that have the expected revision.

The "diff" step returns the current target revision of each model instance that needs an update; this revision is stored as the change.rev property.

The "bulkUpdate" method loads the model instance from the database, verifies that the current revision matches the expected revision in the instruction, and then performs a conditional update/delete, specifying all model properties as the condition.

// Pseudocode: apply an update to an existing instance
var current = findById(data.id);
if (revisionOf(current) !== expectedRev)
  return conflict(); // somebody else modified the instance in the meantime
// conditional update: the current properties serve as the match condition
var count = Model.updateAll(current, data);
if (count !== 1) conflict(); // the instance changed between the read and the update

Known issues

  • The size of the browser bundle is over 1.4MB, which is too large for mobile clients.

    strongloop/loopback#989

  • It's not possible to set a model property to undefined via replication. When a property is undefined at the source but defined at the target, "bulk update" will not undefine it at the target. This can be mitigated by using strict models and storing null for unset properties.

    strongloop/loopback#1215

  • The browser's localStorage limits the size of stored data to about 5MB (depending on the browser). If your application needs to store more data in offline mode, you need to use IndexedDB instead of localStorage. LoopBack does not provide an IndexedDB connector yet.

    strongloop/loopback#858

  • Not all connectors have been updated yet to report the number of rows affected by updateAll and deleteAll, which "bulkUpdate" requires. As a result, replication fails when the target model is persisted using one of these unsupported connectors.

    TODO: provide a list of supported connectors

@ritch

ritch commented Mar 24, 2015

Read through this and couldn't find any glaring issues.

I think we need to bold some important parts to draw attention (this is a lot of material).

@superkhau

The documentation needs a few data-flow diagrams showing the direction of synchronization. Source and target are pretty generic terms unless you write the whole article from one perspective (i.e., state up front that the source is always the client and the target is always the server) or explain which perspective each explanation is from (i.e., from the perspective of the client, the source/client sends the data to the server/target, etc.).

We also need more clarification on things like since, cps, delta, the structure of the conflicts array, etc.

Other than some details and more in depth descriptions, it looks good overall.
