Skip to content

Instantly share code, notes, and snippets.

@rnewman
Created December 5, 2015 01:58
Show Gist options
  • Save rnewman/82c17460161047db38d5 to your computer and use it in GitHub Desktop.
Save rnewman/82c17460161047db38d5 to your computer and use it in GitHub Desktop.
Bookmark Sync 2

Implementing bookmarks sync… again

Since our first installment we've implemented batched downloading for bookmarks: Bug 1201108.

This incrementally populates a local snapshot of the server (initially termed the ‘mirror’, now more accurately renamed the ‘buffer’), validating each record and fitting the contents into a relational schema. As it processes folders it also fills a table with the top-down structure of the downloaded records. We can be confident that when the downloader finishes, we'll have everything the server currently contains, minus any records that aren't well-formed.

What's next?

We're not done with schema changes: Bug 1201110. The buffer is a blind and forgetful one: it truly represents the current state of the server. We need more than that to sync — in order to find out what changed on each end since we last synced we need some history, a shared parent from which to extract changes.

When we built passwords sync, we used two tables: ‘local’ and ‘mirror’. The mirror was only updated when a local change was successfully applied to the server or a remote change was successfully merged locally. That's OK: password records are independent, so at the point of downloading a change we have the old server record (mirror), the current local record, and the new server record — all three parts of a three-way merge.

The incremental downloader we just built for bookmarks must evolve to a similar point before it can be used for bidirectional syncing: we need a way to separate out batches of changes that were just downloaded and haven't been applied, bookmarks that we've previously downloaded and applied and haven't changed since, and bookmarks that are new or changed locally.

(And of course we need these structures to be queryable and to display them in the UI.)

We need to do three-way merging (primarily structural, but also for fields like title and URL), so we need to introduce some concept of a shared parent that's separate from the download batches.

Let's do that.

We define three pairs of tables:

  • mirror: one row per bookmark or folder; fields like URL, title, icon.

  • mirrorStructure: one row for each child of a folder.

  • buffer: records downloaded since the last complete merge.

  • bufferStructure: one row for each child of each folder downloaded since the last complete merge.

  • local: records locally changed or added since the last complete merge.

  • localStructure: one row for each child of any folder modified or added since the last complete merge.

(Henceforth I'll refer to these pairs as mirror, buffer, local, explicitly referring to the named tables as mirror/mirrorStructure etc.)

These three pairs form two ‘views’: overlaying the buffer on the mirror gives us the current state of the server, and overlaying local on the mirror gives us the user's perspective of their bookmarks on this device. Records transition from one of buffer or local into mirror when created. Records transition from mirror into local when modified, and back again when the change is merged and uploaded.

We can define some invariants and rules:

  1. A new client has empty mirror, empty buffer, populated local.
  2. Immediately prior to the merge phase of an initial sync, buffer reflects the entire contents of the server, and mirror is empty.
  3. If overlay(buffer, mirror) is consistent, immediately after a full successful sync both buffer and local are empty, and mirror reflects the current contents of the server to the best of the client's knowledge.
  4. Downloading a folder record will add or update a row in buffer and 0 or more rows in bufferStructure.
  5. Modifying a local folder or changing its child array (reordering, adding, or removing children) results in a CHANGED row in local for the folder and 0 or more rows in localStructure. Any modifications to a record that exists in the mirror but hasn't already been modified results in that record being cloned to both local and, in the case of folders, localStructure.
  6. Deleting a folder requires its children to be moved or deleted, recursively.
@garvankeeley
Copy link

garvankeeley commented Aug 1, 2017

Not clear on the structure paired table terminology. The non-structure table stores bookmarks and folders, one per-row. And these items can have a parent (in fact most will), and thus children of a folder. Yet, structure tables store children of a folder, one per row. That reads the as the same thing to me.

I can intuit that structure is in fact a structural representation of the hierarchy, but the non-structure tables also have the hierarchy info on each row (I am assuming that a item in a row has a field for 'parent').

@garvankeeley
Copy link

I am having trouble groking why the buffer isn't a temporary thing, as records arrive there, then need to go into the mirror. However, temporary or not, I see why the duration of time data is in that table is largely irrelevant to code that uses this API --it should not assume the buffer is reconciled with the mirror.

@rnewman
Copy link
Author

rnewman commented Aug 1, 2017

Re buffer:

The buffer is a temporary thing, if merging is enabled — records arrive there, and as soon as the buffer is internally consistent, we walk all three trees to produce a new final state, updating the server and our mirror appropriately, and clearing both buffer and local.

The buffer is right now not all that temporary, because merging is disabled, because desktop and Android aren't confident in being able to maintain consistent server data.

@rnewman
Copy link
Author

rnewman commented Aug 1, 2017

Re structure:

There are two ways that Sync bookmark records report structure: folders know their children, in order, and all records know their parent (but not their position in the list).

We store both of those things, for several reasons:

  • It allows us to find the ordered sequence of children of a folder very cheaply.
  • It allows us to represent (and detect) inconsistent server states in which the parent and the child disagree.
  • It allows us to represent empty folders more explicitly: if you locally delete all of the contents of a folder, you'll have a locally overridden copy of the folder in the value table and it will have no child rows in the structure table.
  • It allows us to more efficiently alter the children of a folder compared to approaches that put an idx column on each record: we only need to touch the structure table, not the main rows. (Consider how you'd represent the deletion of a single child without overriding all siblings, and detect that only two records need to be uploaded to the server.)

We use the structure table as the source of truth for bookmark structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment