@orionz
Last active July 31, 2022 21:07
Automerge Truncate Proposal

Automerge Truncate/Snapshot Proposal

It is often desirable to truncate a document's history, either to save space or to remove sensitive data from the edit history.  Currently the only way to do this is to create an entirely new document.  This prevents the document from being mergeable with its peers (who may not wish to remove the edit history) and also removes any record of which actors are responsible for the current state of the document.  An ideal mechanism would allow each peer to choose individually how and when to truncate, without requiring coordination with syncing peers.  Truncation has two downsides.  First, changes predating the moment of truncation cannot be merged.  Second, the hash chain history is lost, meaning the authenticity of changes can only be audited from the moment of truncation forward.  Truncation would also necessitate several new error states to correctly communicate to the end user what is going on.

API Changes

   doc.truncate();       // truncate the doc at the current heads
   doc.truncate(heads, "time to publish", new Date(0));
                         // alternatively, truncate the doc at some previous point in its history,
                         // with a message and a timestamp
   doc.truncate([]);     // this effectively zeroes out a document
   doc.getSnapshot();    // returns the heads the document was truncated at,
                         // the document roots, the vector clock at the time of truncation,
                         // and the time and message of truncation

New change type

This introduces a new primitive change with its own columnar format, the snapshot (an alternative name would be checkpoint).  The snapshot is identical to a document except that it has a chunk type of SNAPSHOT (3) and includes the root and head hashes at the point of snapshotting, the time of snapshotting, and the subset of the vector clock with surviving ops (actors who have had all of their ops truncated are dropped).  The heads need to be included because they cannot be computed from this limited dataset and are needed to compute the correct hashes going forward.
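
As a rough illustration of "the subset of the vector clock with surviving ops", the sketch below filters a clock down to the actors that still have at least one op after truncation and bundles it with the other snapshot metadata.  The op shape (`{ actor, seq }`), the function names `survivingClock` and `makeSnapshotHeader`, and the field names are all hypothetical; none of them are part of Automerge.

```javascript
// Hypothetical sketch: none of these names are part of Automerge's API.
const SNAPSHOT_CHUNK_TYPE = 3; // the chunk type proposed above

// Keep only the clock entries for actors that still have at least one
// surviving op after truncation; every other actor is dropped.
function survivingClock(clock, survivingOps) {
  const actors = new Set(survivingOps.map((op) => op.actor));
  const result = {};
  for (const [actor, seq] of Object.entries(clock)) {
    if (actors.has(actor)) result[actor] = seq;
  }
  return result;
}

// Assemble the metadata a snapshot carries in addition to the document ops.
function makeSnapshotHeader({ roots, heads, clock, survivingOps, time, message }) {
  return {
    chunkType: SNAPSHOT_CHUNK_TYPE,
    roots, // root hashes at the point of snapshotting
    heads, // head hashes, needed to compute later hashes
    time,
    message,
    clock: survivingClock(clock, survivingOps),
  };
}
```

Here an actor `b` whose ops were all truncated would simply not appear in the snapshot's clock, even though it appeared in the full clock.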

New behaviors

  doc.generateSyncMessage(peer)    // will now communicate roots to the peer
  
  doc.receiveSyncMessage(peer,msg) // will now error `Documents do not have a common history`
                                   // if the roots are not the same; if they do have a common
                                   // history but the sync message offers changes that
                                   // merge before truncation,
                                   // it will error `Cannot merge past snapshot boundary`
                                   
  doc.getChangesAdded(other);      // will now throw an error `Documents do not have a common history` if the roots
                                   // are not the same, or `Cannot merge past snapshot boundary`
                                   // if there is a concurrent snapshot.
                                   
  doc.loadIncremental(data);       // these functions are unchanged, as they have no way to know about roots
  doc.applyChanges(changes)        // By comparing the change sequence to the snapshot clock we can determine
                                   // if these changes should be processed, ignored, or rejected with an error
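
The proposal does not spell out the exact rules behind these checks, so the following is only one plausible reading.  It assumes roots are compared as sets of hashes, that a change whose sequence number is already covered by the snapshot clock can be ignored, and that a change is mergeable only if all of its dependencies are either snapshot heads or changes applied since.  The names `assertCommonHistory` and `classifyChange`, and the `{ actor, seq, deps }` change shape, are hypothetical.

```javascript
// Hypothetical sketch of the checks described above; not Automerge code.

// Throw if two documents do not share the same set of root hashes.
function assertCommonHistory(localRoots, remoteRoots) {
  const a = [...localRoots].sort();
  const b = [...remoteRoots].sort();
  const same = a.length === b.length && a.every((hash, i) => hash === b[i]);
  if (!same) throw new Error("Documents do not have a common history");
}

// Decide what to do with an incoming change relative to a snapshot.
// `knownHashes` are the hashes of changes applied after the snapshot.
function classifyChange(change, snapshot, knownHashes) {
  const covered = snapshot.clock[change.actor] ?? 0;
  if (change.seq <= covered) return "ignore"; // already folded into the snapshot

  const known = new Set([...snapshot.heads, ...knownHashes]);
  if (change.deps.every((dep) => known.has(dep))) return "process";

  // Some dependency predates the snapshot, so the change is concurrent with it.
  throw new Error("Cannot merge past snapshot boundary");
}
```

One limitation of this reading: because actors with no surviving ops are dropped from the snapshot clock, an old change from such an actor cannot be recognised as already covered by the clock alone and would be reported as crossing the boundary.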

Notes

When syncing documents (either locally or over the network) the snapshot is never shared with a peer unless they are starting from zero.  Peers needing data in the snapshot but not starting from zero have concurrent changes and will see an error.

With the current API it is impossible to insert after a deleted opid, so I believe we should make that part of the spec.  This allows us to keep the original elemids for list elements even though the elements they refer to are missing; list order is simply enforced by snapshot op order, as all ops come from the same change (the snapshot).

All increments to counters are thrown out and their values are merged into the referenced counter.
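
A minimal sketch of this folding step, assuming a simplified op representation in which a counter is created by a `set` op with `datatype: "counter"` and an increment names its counter through a `target` field.  These shapes and the name `foldCounters` are hypothetical and do not reflect Automerge's internal format.

```javascript
// Hypothetical op shapes; not Automerge's internal representation.
// Returns a new op list in which every increment has been merged into the
// value of the counter it references and then removed.
function foldCounters(ops) {
  const counters = new Map(); // counter op id -> copied counter op
  const out = [];
  for (const op of ops) {
    if (op.action === "inc") {
      const counter = counters.get(op.target);
      if (counter) counter.value += op.value;
      continue; // the increment itself is thrown out
    }
    const copy = { ...op }; // avoid mutating the caller's ops
    if (copy.action === "set" && copy.datatype === "counter") {
      counters.set(copy.id, copy);
    }
    out.push(copy);
  }
  return out;
}
```

For example, a counter created with value 10 followed by increments of 5 and -2 would survive as a single counter op with value 13.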

Methods that return changes (including sync messages) that cover the snapshot would include the snapshot as the first normal change:

  doc.getChanges([])   // returns [ snapshot, change, change, change, ... ]
  doc.save()           // returns one big snapshot including all changes that come after it

If a snapshot error is thrown, the developer can choose not to sync with that document, or to reset the local document and sync the whole history (and possibly truncate again after syncing).
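
A sketch of how calling code might make that choice, assuming the error is surfaced as a thrown `Error` whose message is the string proposed above.  The wrapper `receiveOrRecover` and its `onBoundary` callback are hypothetical; only `doc.receiveSyncMessage(peer, msg)` comes from this proposal.

```javascript
// Hypothetical wrapper; only receiveSyncMessage comes from the proposal.
const SNAPSHOT_ERROR = "Cannot merge past snapshot boundary";

// Try to apply a sync message. If the peer's changes cross the snapshot
// boundary, hand control to `onBoundary`, which can skip the peer or
// reset the local document and resync the whole history.
function receiveOrRecover(doc, peer, msg, onBoundary) {
  try {
    doc.receiveSyncMessage(peer, msg);
    return "synced";
  } catch (err) {
    if (err instanceof Error && err.message === SNAPSHOT_ERROR) {
      return onBoundary(doc, peer);
    }
    throw err; // unrelated errors are not swallowed
  }
}
```

Matching on the message string is fragile; a dedicated error class or error code would be a more robust design if the proposal were implemented.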

As long as you have no changes concurrent with the snapshot, syncing with a peer with the full history or with a truncated history should be identical.

Optional Changes

This might be a good time to create the restriction that all documents must have a single root.  I think adding this is a good idea, and it will change the implementation of snapshotting.

It should be possible to make a rebase tool down the road.  This would look at unmergeable changes and create new mergeable patches based on them.  Sets to missing objects are thrown out.  Inserts after truncated ops are replaced with inserts after the first visible op preceding them.  Increments to counters are unchanged.  Deletes of truncated items are dropped.
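
The four rebase rules above could be sketched roughly as follows.  This assumes the tool can see the pre-truncation element order (`elemOrder`), the set of elements still visible after truncation (`visibleElems`), and the set of objects that still exist (`liveObjects`).  The op shapes, the `"_head"` sentinel, and the function names are hypothetical.

```javascript
// Hypothetical sketch of the rebase rules; not part of Automerge.
const HEAD = "_head"; // assumed sentinel meaning "start of the list"

// Walk backwards from `elemId` to the nearest element that is still visible.
function firstVisibleBefore(elemId, elemOrder, visibleElems) {
  for (let i = elemOrder.indexOf(elemId); i >= 0; i--) {
    if (visibleElems.has(elemOrder[i])) return elemOrder[i];
  }
  return HEAD;
}

function rebaseOps(ops, { liveObjects, elemOrder, visibleElems }) {
  const out = [];
  for (const op of ops) {
    switch (op.action) {
      case "set": // sets to missing objects are thrown out
        if (liveObjects.has(op.obj)) out.push(op);
        break;
      case "insert": // re-anchor on the first visible preceding element
        out.push({ ...op, after: firstVisibleBefore(op.after, elemOrder, visibleElems) });
        break;
      case "del": // deletes of truncated items are dropped
        if (visibleElems.has(op.elem)) out.push(op);
        break;
      default: // increments (and anything else) are unchanged
        out.push(op);
    }
  }
  return out;
}
```

An insert whose anchor is still visible keeps that anchor, since the backwards walk starts at the anchor itself.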
