This is a work in progress for a new sync algorithm. I haven't decided on a name yet either, hence `THISCODE` everywhere.
This is very rough.
This discusses the sync algorithm and some design decisions. Note that all queries use DictTable, an in-memory, NoSQL-like object store with O(1) lookups, so they can be done efficiently.
Note that `filename` is the path relative to the root.
The algorithm is summarized as follows:
Use rclone to download the older list. This is based on the config `name` so that multiple syncs can be set up. The list is saved in the `.THISCODE/{AB}-{name}_filelist` file. It is a simple JSON file listing that has been UTF-8 encoded and run through `zlib.compress`.
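
As a minimal sketch of the storage format described above (JSON, UTF-8 encoded, then `zlib.compress`), the round trip could look like the following. The function names and the fields in the sample entry are hypothetical, chosen only for illustration.

```python
import json
import zlib


def dump_filelist(files):
    """Serialize a list of file entries to compressed bytes."""
    text = json.dumps(files, ensure_ascii=False)
    return zlib.compress(text.encode("utf8"))


def load_filelist(blob):
    """Inverse of dump_filelist: decompress, decode, and parse."""
    return json.loads(zlib.decompress(blob).decode("utf8"))


# Hypothetical entry; the real listing may carry different fields.
files = [{"Path": "docs/readme.md", "Size": 120, "mtime": 1600000000.0}]
blob = dump_filelist(files)
restored = load_filelist(blob)
```

The resulting bytes are what would be uploaded to and downloaded from the `_filelist` file on each remote.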
Use rclone to list the files on both remotes for the current list. If config `compare` or `renames{A/B}` has the `hash` attribute, the file hashes are also needed. Alternatively, if `reuse_hashes{A/B}` is set, use `size,mtime,filename` to look up the previous hash value, then call rclone again with `--files-from` to get the hashes of the remaining files.
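
The hash reuse step can be sketched roughly as below. This assumes each entry is a plain dict; the `Path`, `Size`, and `Hashes` keys follow rclone's `lsjson` naming, while `mtime` as a float and the function name are assumptions made for this example.

```python
def reuse_hashes(current, previous):
    """Copy hashes from `previous` onto matching `current` entries.

    Entries match when (size, mtime, filename) are identical.
    Returns the paths that still need a hash from rclone.
    """
    prev_by_key = {
        (f["Size"], f["mtime"], f["Path"]): f["Hashes"]
        for f in previous
        if f.get("Hashes")
    }
    missing = []
    for f in current:
        hashes = prev_by_key.get((f["Size"], f["mtime"], f["Path"]))
        if hashes:
            f["Hashes"] = hashes  # unchanged file: reuse the old hash
        else:
            missing.append(f["Path"])  # new or modified: must be hashed
    return missing
```

The returned paths are the ones that would be written to a file and handed to rclone via `--files-from`.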
Generate a list of all filenames in both current lists for A and B. For each file, compare (`filename`, `size`, `compare` attribute) between A and B:
- A == B:
  - Remove from both current and previous lists
  - Compare by filename and one of the following:
    - hash (must have a common hash)
    - mtime and size
    - size alone
Do all of the above first. Then determine new, modified, and deleted. In practice, new and modified are treated the same, but it is nice to have them separate for move tracking.
- A is missing:
  - if B is not in the previous list, B is new
  - if B is in the previous list and UNMODIFIED, the file has been deleted on A
  - if B is in the previous list and MODIFIED, the file has been deleted on A but modified on B
- B is missing:
  - See above, with A and B swapped
- A != B: Resolve
  - Resolution is based on what you set. `A` or `B` always select that remote. `tag` renames both. Older, newer, and newer_tag are based on mtime. However, if compare is hash or size, older means smaller and newer means larger.
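
The comparison pass above can be sketched as follows. This is only an illustration under stated assumptions: plain dicts keyed by filename stand in for DictTable, the field names (`Size`, `mtime`, `Hashes`) and label strings are hypothetical, and conflict resolution itself is left out.

```python
def same(a, b, compare):
    """True if two entries for one filename match under `compare`."""
    if compare == "hash":
        common = set(a["Hashes"]) & set(b["Hashes"])
        if not common:
            raise ValueError("no common hash type")
        return all(a["Hashes"][h] == b["Hashes"][h] for h in common)
    if compare == "mtime":
        return a["Size"] == b["Size"] and a["mtime"] == b["mtime"]
    if compare == "size":
        return a["Size"] == b["Size"]
    raise ValueError(f"unknown compare attribute: {compare}")


def label_one_sided(entry, prev_entry, compare, present, missing):
    """Label a file that exists on `present` but not on `missing`."""
    if prev_entry is None:
        return f"new on {present}"
    if same(entry, prev_entry, compare):
        return f"deleted on {missing}"
    return f"deleted on {missing}, modified on {present}"


def classify(curA, curB, prevA, prevB, compare="mtime"):
    """Prune matching files in place, then label everything that is left."""
    # Pass 1: files that agree on both sides need no action.
    for name in sorted(set(curA) & set(curB)):
        if same(curA[name], curB[name], compare):
            for table in (curA, curB, prevA, prevB):
                table.pop(name, None)
    # Pass 2: label the remaining files.
    result = {}
    for name in sorted(set(curA) | set(curB)):
        if name in curA and name in curB:
            result[name] = "conflict"  # A != B: needs resolution
        elif name in curB:  # A is missing
            result[name] = label_one_sided(
                curB[name], prevB.get(name), compare, "B", "A"
            )
        else:  # B is missing
            result[name] = label_one_sided(
                curA[name], prevA.get(name), compare, "A", "B"
            )
    return result
```

Running the pruning pass to completion before labelling mirrors the "do all of the above first" rule, so that matched files never show up as candidates for new, deleted, or moved.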
Moves are only tracked if `renames{A/B}` is set. Options are `{size,mtime,hash,inode}`, where the following are checked:
Attribute | Actually compared |
---|---|
size | size |
mtime | size,mtime |
hash | hash |
inode | size,mtime,inode |
Note: to compute inodes, the remote must be a local remote and THISCODE will add them.
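
One way to express the table above in code is a lookup from the configured attribute to the fields that are actually compared. The field names and the `rename_key` helper are hypothetical; the hash case here assumes both entries come from the same remote and therefore carry the same hash types.

```python
# Attribute set in renames{A/B} -> entry fields actually compared.
RENAME_FIELDS = {
    "size": ("Size",),
    "mtime": ("Size", "mtime"),
    "hash": ("Hashes",),
    "inode": ("Size", "mtime", "inode"),
}


def rename_key(entry, attribute):
    """Build a hashable key from the fields compared for `attribute`."""
    values = []
    for field in RENAME_FIELDS[attribute]:
        value = entry[field]
        if isinstance(value, dict):
            # Dicts are not hashable; freeze them in a stable order.
            value = tuple(sorted(value.items()))
        values.append(value)
    return tuple(values)
```

Two entries are then rename candidates for each other when their keys are equal.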
The move tracking algorithm only tracks a move if there have been no changes to the file (since, unlike an rsync-based tool, this one always does a full transfer).
For example, on side A, a rename is tracked if the following are all true:
- A file is marked as new on side A: `new`
- The `renamesA` attribute is matched to a `prev` file marked as deleted on side A: `old`
  - This is why the old list is pruned of matches, to make sure they do not hang around on this one
- A file named `old` on B marked to be deleted matches the `--compare` attribute
If all three conditions are met, the file `new` is no longer considered for transfer, and the file `old` on side B is marked to be moved.
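
The three conditions can be sketched as below for side A. Several points are assumptions of this example rather than statements about the tool: the function and argument names, the choice to skip ambiguous matches (more than one deleted file with the same key), and the reading of the third condition as "the file `old` on B matches the previous A entry for `old`".

```python
def track_renames_on_A(newA, deletedA, deleteB, key, same):
    """Find files renamed on A so B can move them instead of re-transferring.

    newA:     {path: entry} for files marked new on A
    deletedA: {path: previous entry} for files marked deleted on A
    deleteB:  {path: entry} for files on B that are marked to be deleted
    key:      function(entry) -> hashable key for the renamesA attribute
    same:     function(a, b) -> bool for the --compare attribute
    Returns (moves, remaining_new); moves holds (old, new) pairs.
    """
    by_key = {}
    for old, entry in deletedA.items():
        by_key.setdefault(key(entry), []).append(old)

    moves = []
    remaining = dict(newA)
    for new, entry in newA.items():
        candidates = by_key.get(key(entry), [])
        if len(candidates) != 1:
            continue  # no match, or ambiguous: fall back to a transfer
        old = candidates[0]
        if old in deleteB and same(deletedA[old], deleteB[old]):
            moves.append((old, new))  # B moves old -> new
            del remaining[new]        # no longer considered for transfer
            by_key[key(entry)] = []   # each old file is used at most once
    return moves, remaining
```

Any file left in `remaining_new` is transferred as usual, and any `old` not consumed by a move stays on the delete list.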
This is all really an implementation detail but:
- Files marked as deleted are either deleted (`--no-backup`) or moved to `.THISCODE/backups/{date}-{name}/`
- Files that are marked to be overwritten are also backed up to `.THISCODE/backups/{date}-{name}/`
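
A rough sketch of that decision, assuming a hypothetical `plan_removal` helper that returns the action as a tuple. The backup directory is passed in already formatted, since the exact `{date}` format is not specified here.

```python
import posixpath


def plan_removal(path, backup_dir, backup=True):
    """Decide what happens to a file that is deleted or overwritten.

    path:       filename relative to the root
    backup_dir: e.g. ".THISCODE/backups/{date}-{name}", already formatted
    backup:     False corresponds to running with --no-backup
    """
    if not backup:
        return ("delete", path)
    # Keep the relative path under the backup directory.
    return ("move", path, posixpath.join(backup_dir, path))
```

Moving rather than copying keeps the backup a remote-side operation wherever the remote supports it.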
Then the transfers happen.
If anything was changed on or transferred to a side, the list for that side is remade. The same hash process as above is performed.
- Perform a current listing of each remote
- Download the previous listing of each remote
- Remove (from both old and current) any files that have the same `filename` and `--compare`
attribute. Removing from the old list is