Designing bookmark sync… again
For the second time, I'm building an implementation of bookmark sync on a new platform. The last one, in early 2012, was Firefox for Android. That turned out relatively well, considering the circumstances, but it didn't go far enough in avoiding some ancient errors baked into the desktop codebase.
For Firefox on iOS we have a new set of platform constraints and a strong desire to avoid some of the known pitfalls. This post sets out some of the considerations. A future post will set out the proposed solution.
Our goal: keep a tree of bookmarks in sync across multiple devices.
Syncs occur to and from a server. Devices don't communicate directly.
The server and protocol are record-oriented. All records are encrypted with a key known only to the clients.
Bookmark records are broadly of two types: folders and bookmarks. Each has a GUID for stable identification. Folders have a list of children (defining order), and bookmarks have a parent pointer. Both have other fields that you might expect — titles, URLs, etc. Pretty straightforward, right?
Here are our constraints.
Changes can occur on multiple devices between each sync. Those changes can conflict. Changes run the gamut from renaming a single bookmark through to deleting everything and reimporting an old backup, touching every record on the server.
Changes can involve: additions or removals of folders and bookmarks anywhere in the tree; additions of roots; ordering changes within folders; content changes; and movements of items. Multiple changes can occur to records at once: a bookmark can be moved and renamed, for example, resulting in a single updated record with two changed fields.
Neither the records themselves, nor the server, offer any kind of history/snapshots/versioning.
The server doesn't support transactions: both partial writes (a client uploads only some changed records) and interrupted reads (a client downloads only some of the changes) can occur. Even if the server did support transactions, and even if the client aborts on an interrupted read, it's possible (due to a client error or naive batching) for an inconsistent server state to exist for some period of time. The server doesn't (can't!) ensure that it only represents consistent states, because it can't decrypt the records to do so.
What's consistency here?
In short, it means that a given flat set of records maps to a well-formed tree of bookmarks. Well-formed means that each folder's list of children is present in the collection, each child is pointing to the right parent folder, and that each subtree terminates in a known base (e.g., the bookmarks toolbar). Each of those bases is conceptually the child of a fixed root, so we have a single tree.
When working incrementally — that is, fetching changes since we last synced — we have a supplemental definition of consistency: that applying the set of changed records to the previous well-formed tree yields another well-formed tree, with all items accounted for by addition, deletion, or change. For example, a changed record for a bookmark, changing it to point to a different parent, should be accompanied by two other changed records: the old and new parents, each with an updated list of children. Otherwise we don't know what order to put the bookmarks in. A bookmark shouldn't simply disappear from a folder's child list: it should either be replaced by a deletion record, or it should appear in a different folder. There are situations in which a bookmark can simply disappear — a wiped server, for example — but those are exceptions.
We have some other complications to avoid. The server can't return records in the ‘right’ order, whatever that is. The server can't validate contents: these records are individually encrypted. Neither (perhaps obviously) can we fetch subtrees: there's no such concept in the space of flat records.
The existing Firefox Sync clients are not good at detecting or maintaining bookmark consistency. They upload changes directly to the server in the order they come out of the database, so we typically don't preserve clusters of local changes. They download changes and apply them directly to the local bookmark database; they don't roll back on error, because they choose not to keep a transaction open for minutes. Because changes arrive one by one in arbitrary order, and are immediately applied to the database (which needs to back the UI, and thus needs to be consistent), record application involves a substantial amount of reparenting, rearranging, and even rooting new bookmarks in Unsorted Bookmarks until their parent folder shows up.
This is error prone and intrinsically flawed: a client seeing an inconsistent state will be forced to form its own novel consistent state (e.g., with orphan bookmarks in Unsorted Bookmarks), and that client will eventually leak that inconsistent state to other clients, resulting in bizarre reordering and duplication issues. Existing clients do not advance in lockstep between consistent states.
We want to avoid this problem.
The shape of a solution
The solution has a few necessary characteristics.
It has to involve batching: we can't directly apply records to the local store. Similarly, it has to be incremental: platform constraints mean we can't download 10,000 bookmarks in one go, but we also have to be able to wait a little while until a missing record shows up.
It has to involve some notion of consistency — not necessarily to always reject an inconsistent state, but to do so with judgment, carefully deciding when to instead make a change that restores consistency.
It will involve three-way merging and structural conflict resolution. Local and remote changes being encountered in a single sync are a fact of life, and we cannot apply Sync's traditional timestamp-based record-level conflict resolution to tree-structured data and expect to get anything but chaos.
(Imagine randomly picking resolutions on a per-file basis in your distributed version control system. It's extremely unlikely that the result will build.)
It has to have the ability to perform full GUID-based and content-based merges. Users are in the habit of following steps like "1. disconnect your account. 2. delete a bunch of folders. 3. import a backup file" that result in similar or identical trees with different GUIDs. We need to merge correctly in those cases.
Finally, it has to be pragmatic: it has to eventually reach consistency in the face of a broken server; it has to recover from server reassignments; it has to account for the behaviors and bugs of existing clients.