Skip to content

Instantly share code, notes, and snippets.

@nathansobo
Last active June 27, 2019 08:11
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save nathansobo/a15266a30ed433052a915605596c5ff4 to your computer and use it in GitHub Desktop.
Save nathansobo/a15266a30ed433052a915605596c5ff4 to your computer and use it in GitHub Desktop.
Intuition for `merge_op` in Raph Levein's GOTO implementation

When merging an operation O, there may be several operations scattered through the local history that are concurrent to O. We need to rearrange our history so that all of these concurrent operations are moved to the end of the history. Then we can transform O against each of these concurrent operations and append it to the end.

Say we're adding O a history with the following suffix. The operations surrounded by vertical bars are concurrent to O. Otherwise the operations causally precede O.

EO0 |EO1| EO2 EO3 |EO4| |EO5| EO6

We need to rearrange this history so it looks like this.

EO0 EO2' EO3' EO6' |EO1'| |EO4'| |EO5'|

Let C be the set of all operations concurrent with O. (EO1, EO4, and EO5 in the above example).

Let P be the set of all operations causally preceding (EO2, EO3, and EO6 in the above example).

The GOTO algorithm achieves the desired arrangement transposing all operations in set P leftward in the history until all members of set P land before members of set C.

Interestingly, we only actually need the transformed members of set C at the end of the history so we can transform O against them. We don't need the transformed members of set P. That leads us to the foundation of our intuition about Raph Levein's algorithm. All we really need to do is transform all the operations in set C to include the impact of operations in set P that follow them in the history.

So, in our example, in the rearranged history, |EO5| will need to be transformed to reflect the impact of EO6 being moved in front of it. |EO4| also needs to be transformed to reflect EO6, but it does not need to be transformed to reflect |EO5| because |EO5| still comes after it in the rearranged history. |EO1| will need to be transformed to reflect the impact of EO2, EO3 and EO6.

So, to repeat, all members of C (concurrent operations) will need to be transformed to include the effects all members of P (causally preceding operations) that follow them in the history.

Now we get into Raph Levein's algorithm.

Our goal is to build up a list of operations that we can use to transform O. The list we're building is the equivalent of |EO1'| |EO4'| |EO5'| in the above example.

We start at the end of the history and work backward. We only need to consider insertion operations, because thanks to the fact that we represent deletions with tombstones, deletions will have no impact on the coordinates of any other transformations.

As we move backward, we maintain two patches, S and T.

S includes the aggregate impact of all operations we have visited so far. T includes the aggregate impact of all operations we have visited so far that are concurrent to O.

Since we're visiting operations in the reverse order in which they were originally applied, each time we visit an operation we need to transform it against S so that we update it to reflect the impact of operations that followed it. For example, say there are two operations in the history, O1 followed by O2. O1 inserts a character at index 10, and then O2 inserts a character at index 5. O2's insertion at index 5 shifts the character inserted by O1 one column to the right, meaning that it actually now exists at index 11. When visiting these operations in reverse, we first visit O2 and record the insertion at index 5. Then we visit O1. Before proceeding, we perform a coordinate transform on O1's index using S, which shifts it from 10 to 11. This is exactly the effect that performing O2 after O1 had originally, but we're reversing the order. For each operation, we use S to transform it to reflect the impact of all operations that followed it in the original history.

If the operation we're visiting is concurrent to O, we have another step. We've just transformed its start index using S to reflect the impact all operations following it in the original history, but this isn't quite what we want. Instead, we only want to transform the visited operation to reflect the impact of following operations that causally precede O. Translating its position with S does too much by potentially including the impact of other concurrent operations. However, remember that we're also maintaining patch T, which includes the impact of all operations visited so far that are concurrent with O. To correct the problem, we translate the current operation's index into the input coordinate space of T, negating the impact of any concurrent operations that were present in S. Finally, we can unshift the transformed operation (with its translated index) onto the list of operations that we'll ultimately use to transform O. (Raph's algorithm actually pushes the operation to this list and then iterates it in reverse for efficiency).

So that's basically it. To summarize, as we walk the operations in reverse we build up two patches S and T. S represents every operation we've visited so far. T represents all the concurrent operations. We translate every operation's index against S to reflect its position in the rearranged history. For concurrent operations, we strip out the impact of other concurrent operations by translating the index in the opposite direction using T. In the end we will have transformed all concurrent operations against all causally preceding operations that follow them in the history, but done so in log-linear instead of quadratic time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment