Since our first installment we've implemented batched downloading for bookmarks: Bug 1201108.
This incrementally populates a local snapshot of the server (initially termed the ‘mirror’, now more accurately renamed the ‘buffer’), validating each record and fitting the contents into a relational schema. As it processes folders it also fills a table with the top-down structure of the downloaded records. We can be confident that when the downloader finishes, we'll have everything the server currently contains, minus any records that aren't well-formed.
What's next?
We're not done with schema changes: Bug 1201110. The buffer is a blind and forgetful one: it truly represents the current state of the server. We need more than that to sync — in order to find out what changed on each end since we last synced we need some history, a shared parent from which to extract changes.
When we built passwords sync, we used two tables: ‘local’ and ‘mirror’. The mirror was only updated when a local change was successfully applied to the server or a remote change was successfully merged locally. That's OK: password records are independent, so at the point of downloading a change we have the old server record (mirror), the current local record, and the new server record — all three parts of a three-way merge.
The incremental downloader we just built for bookmarks must evolve to a similar point before it can be used for bidirectional syncing: we need a way to separate out batches of changes that were just downloaded and haven't been applied, bookmarks that we've previously downloaded and applied and haven't changed since, and bookmarks that are new or changed locally.
(And of course we need these structures to be queryable and to display them in the UI.)
We need to do three-way merging (primarily structural, but also for fields like title and URL), so we need to introduce some concept of a shared parent that's separate from the download batches.
Let's do that.
We define three pairs of tables:
-
mirror
: one row per bookmark or folder; fields like URL, title, icon. -
mirrorStructure
: one row for each child of a folder. -
buffer
: records downloaded since the last complete merge. -
bufferStructure
: one row for each child of each folder downloaded since the last complete merge. -
local
: records locally changed or added since the last complete merge. -
localStructure
: one row for each child of any folder modified or added since the last complete merge.
(Henceforth I'll refer to these pairs as mirror, buffer, local, explicitly referring to the named tables as mirror
/mirrorStructure
etc.)
These three pairs form two ‘views’: overlaying the buffer on the mirror gives us the current state of the server, and overlaying local on the mirror gives us the user's perspective of their bookmarks on this device. Records transition from one of buffer or local into mirror when created. Records transition from mirror into local when modified, and back again when the change is merged and uploaded.
We can define some invariants and rules:
- A new client has empty mirror, empty buffer, populated local.
- Immediately prior to the merge phase of an initial sync, buffer reflects the entire contents of the server, and mirror is empty.
- If
overlay(buffer, mirror)
is consistent, immediately after a full successful sync both buffer and local are empty, and mirror reflects the current contents of the server to the best of the client's knowledge. - Downloading a folder record will add or update a row in
buffer
and 0 or more rows inbufferStructure
. - Modifying a local folder or changing its child array (reordering, adding, or removing children) results in a
CHANGED
row inlocal
for the folder and 0 or more rows inlocalStructure
. Any modifications to a record that exists in the mirror but hasn't already been modified results in that record being cloned to bothlocal
and, in the case of folders,localStructure
. - Deleting a folder requires its children to be moved or deleted, recursively.
Not clear on the
structure
paired table terminology. The non-structure table stores bookmarks and folders, one per-row. And these items can have a parent (in fact most will), and thus children of a folder. Yet,structure
tables store children of a folder, one per row. That reads the as the same thing to me.I can intuit that structure is in fact a structural representation of the hierarchy, but the non-structure tables also have the hierarchy info on each row (I am assuming that a item in a row has a field for 'parent').