Skip to content

Instantly share code, notes, and snippets.

@Jwink3101
Created June 2, 2020 19:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Jwink3101/d0700cc15b64c481fdb7c5694d6009f7 to your computer and use it in GitHub Desktop.
Save Jwink3101/d0700cc15b64c481fdb7c5694d6009f7 to your computer and use it in GitHub Desktop.

Preface

This is a work in progress for a new syn algorithm. I haven't decided on a name yet either, hence THISCODE everywhere.

This is very rough

Bi-Direction Sync Algorithm

This discusses the sync algorithm and some design decisions. Note that all queries use DictTable, an in-memory, O(1) noSQL-like object store. Therefore they can be done efficiently.

Note that filename is the path relative to the root.

Summary:

The algorithm is summarized as follows:

File Listing

Use rclone to download the older list. This is based on config name so that multiple syncs can be set up. The file is saved in .THISCODE/{AB}-{name}_filelist file. It is a simple json file listing that has been UTF8 encoded and run through zlib.compress.

Use rclone to list files for both remotes for the current list. If config compare or renames{A/B} has hash attribute, need to also get file hashes. Alternatively, if reuse_-_hashes{A/B}, use size,mtime,filename to get the previous hash value. Then call rclone again with --files-from to get the hash of the remaining files.

Initial Comparison

Generate a list of all filenames in both current lists for A and B. For each file compare (filename,size,compare attribute) between A and B:

  • A == B:
    • Remove from both current and previous lists
    • Compare by filename and one of the following:
      • hash (must have a common hash)
      • mtime and size
      • size alone

Do all of the above first. Then determine new, modified, and deleted. In practice, new and modified are treated the same but it is nice to have them separate for move tracking

  • A is missing:
    • if B is not in the previous list, B is new
    • if B is in the previous list and UNMODIFIED, B has been deleted by A
    • if B is in the previous list and MODIFIED, B has been deleted by A but modified by B
  • B is missing:
    • See above
  • A != B: Resolve
    • Resolution is based on what you set. A or B are always those remotes. tag is renaming both. Older, newer, newer_tag are based on mtime. However, if compare is hash or size, older means smaller and newer means larger.

Move Tracking

Moves are only tracked if renames{A/B} is set. Options are {size,mtime,hash,inode} where the following are checked:

Attribute Actually compared
size size
mtime size,mtime
hash hash
inode size,mtime,inode

Note: to compute inodes, the remote must be a local remote and THISCODE will add them.

The move tracking algorithm only tracks if there have been no changes to the file (since unlike and rsync based tool, this one always does a full transfer).

For example, on side A, a rename is tracked if the following are all true:

  1. A file is marked as new on side A: new
  2. The renamesA attribute is matched to a prev file marked as delete on side A: old
    • This is why the old list is pruned of matches to make sure they do not hang around on this one
  3. A file named old on B marked to be deleted matches the --compare attribute

If all three conditions are met, the file new is no longer considered for transfer, and the file old on side B is marked to be moved.

Transfer & Update

This is all really an implementation detail but:

  • Files marked as deleted are either delete --no-backup or moved to .THISCODE/backups/{date}-{name}/
  • Files that are marked to be overwritten are also backed up to .THISCODE/backups/{date}-{name}/

Then the transfers happen

Relisting

If anything was changed or transfered to a side, a list is remade. The same hash process above is performed

  • Perform a current listing of each remote
  • Download the previous listing of each remote
  • Remove (from both old and current) any files that have the same filename and --compare attribute. Removing from the old list is
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment