Backup data tiering

goal and definition

  • transfer the backup to a relatively fast storage to reduce the backup window and limit the impact on the production VMs
  • then transfer (move or copy) this backup to a slower storage, possibly offsite (a sketch of such a chain follows this list)
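
To make the goal concrete, here is a minimal sketch of what a tiering chain could look like; the field names and remote ids are purely illustrative, not the actual XO configuration format:

```ts
// Hypothetical description of a tiering chain; names are illustrative, not the XO config format.
interface Tier {
  remoteId: string
  // how data reaches this tier from the previous one:
  // 'copy' keeps the data on the previous tier, 'move' evicts it once transferred
  transfer: 'copy' | 'move'
}

// A fast local storage absorbs the backup window, then data is moved to slower offsite storage.
const chain: Tier[] = [
  { remoteId: 'nfs-local', transfer: 'copy' },  // written directly by the backup job
  { remoteId: 's3-offsite', transfer: 'move' }, // filled asynchronously from nfs-local
]
```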

Competitors allow backing up to multiple local storages in parallel -> S3 -> Glacier.

synchronous copy

This is the current system: data is copied to all the remotes. It does not use unnecessary space and reads the data only once, but it only goes as fast as the slowest storage.
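
As an illustration, a minimal sketch of this synchronous fan-out, assuming hypothetical `readBlock`/`writeBlock` helpers (not the actual XO remote API): the loop reads each block once and cannot advance until every remote has written it, which is why the slowest storage sets the pace.

```ts
// Synchronous copy sketch: each block is read once and written to every remote in parallel.
type Remote = { id: string; writeBlock(index: number, data: Uint8Array): Promise<void> }

async function synchronousCopy(
  readBlock: (index: number) => Promise<Uint8Array | null>, // hypothetical source reader
  remotes: Remote[],
): Promise<void> {
  for (let i = 0; ; i++) {
    const block = await readBlock(i) // data is read only once
    if (block === null) break        // end of the disk
    // the next block is not read until *all* remotes have stored this one,
    // so throughput is bounded by the slowest remote
    await Promise.all(remotes.map(remote => remote.writeBlock(i, block)))
  }
}
```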

All the storages need the same amount of space.

Race conditions are already handled: if a VM is still being backed up when a new backup starts, the new one is skipped.
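
For reference, a sketch of that skip behaviour using a simple in-memory lock per VM; the actual XO mechanism may differ, and all names here are illustrative:

```ts
// Skip a backup run if the same VM is still being backed up (illustrative, in-memory version).
const runningBackups = new Set<string>()

async function backupVm(vmUuid: string, run: () => Promise<void>): Promise<void> {
  if (runningBackups.has(vmUuid)) {
    console.log(`backup of ${vmUuid} is still running, skipping this run`)
    return
  }
  runningBackups.add(vmUuid)
  try {
    await run()
  } finally {
    runningBackups.delete(vmUuid)
  }
}
```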

asynchronous copy

The transfer to the next tier should be done before the start of the next cleanVm job, which may modify the files.

asynchronous move

  • the transfer to the next tier should be done before the start of the next backup job
  • the capacity of the faster storage should be sized by taking into account the backup speed and the transfer speed to the next tier; it will be at most the size of a full backup job (see the sizing sketch after this list)
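
A back-of-the-envelope sizing sketch, under the assumption that the mover drains data to the next tier while the backup job is still writing; the numbers and the formula are illustrative, not taken from measurements:

```ts
// Peak fast-tier usage for one job of `jobSizeGiB`, written at `backupSpeed` (MiB/s)
// and drained to the next tier at `transferSpeed` (MiB/s).
function fastTierPeakUsageGiB(jobSizeGiB: number, backupSpeed: number, transferSpeed: number): number {
  if (transferSpeed >= backupSpeed) return 0 // the mover keeps up, only transient usage
  // data piles up at (backupSpeed - transferSpeed); the peak is reached when the job finishes
  return jobSizeGiB * (1 - transferSpeed / backupSpeed)
}

// Example: a 2 TiB full job written at 500 MiB/s and drained at 200 MiB/s
// peaks at 2048 * (1 - 200 / 500) ≈ 1229 GiB; the worst case remains the full job size.
console.log(fastTierPeakUsageGiB(2048, 500, 200))
```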

speed up

Using a block index on each remote removes the need to list/transfer all blocks, speeding up backup (transfer only new/modified blocks) and restore (look up each block on the fastest remote available).
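
A sketch of how such an index could be used, assuming each remote keeps a map from block hash to its location (all names are hypothetical):

```ts
// Per-remote block index: block hash -> path of the block on that remote.
interface RemoteIndex {
  remoteId: string
  speedRank: number            // lower = faster remote
  blocks: Map<string, string>
}

// Backup: only transfer the blocks the target remote does not already hold.
function blocksToTransfer(changedBlockHashes: string[], target: RemoteIndex): string[] {
  return changedBlockHashes.filter(hash => !target.blocks.has(hash))
}

// Restore: read each needed block from the fastest remote that has it.
function pickSource(hash: string, remotes: RemoteIndex[]): RemoteIndex | undefined {
  return remotes
    .filter(remote => remote.blocks.has(hash))
    .sort((a, b) => a.speedRank - b.speedRank)[0]
}
```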

V2

  • from remote to remote (a sketch of the mover loop follows this list)
  • detect new backups
  • write all the files (vhd, alias, block, bat, json, ...) to the replicated remote
  • cleanVm on the replicated remote (it must have a retention longer than or equal to the source remote's)
  • can execute only once per remote
  • take a lock per VM on the source (maybe the read lock is overkill, but we shouldn't merge into or delete this data) and on the target
  • encrypted content is decrypted and then re-encrypted if needed:
    • no end-to-end encryption when using the data mover
    • each remote can have a different encryption strategy and key
    • each remote can have its own immutability and dedup strategy
  • can be offloaded to a proxy
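
A high-level sketch of the mover loop under the points above; every function name here is a placeholder, not the actual XO code:

```ts
// V2 data mover sketch: detect new backups, lock the VM on both remotes, copy the files,
// then run cleanVm on the target (whose retention must be >= the source's).
interface Remote { id: string }

interface MoverApi {
  listNewBackups(src: Remote, tgt: Remote): Promise<{ vmUuid: string; backupId: string }[]>
  lockVm(remote: Remote, vmUuid: string): Promise<() => Promise<void>>        // returns an unlock function
  copyBackupFiles(src: Remote, tgt: Remote, backupId: string): Promise<void>  // vhd, alias, block, bat, json...
  cleanVm(remote: Remote, vmUuid: string): Promise<void>
}

async function dataMover(source: Remote, target: Remote, api: MoverApi): Promise<void> {
  for (const { vmUuid, backupId } of await api.listNewBackups(source, target)) {
    // lock the VM on both sides so nothing merges into or deletes the data being moved
    const unlockSource = await api.lockVm(source, vmUuid)
    const unlockTarget = await api.lockVm(target, vmUuid)
    try {
      // decryption / re-encryption per remote would happen inside copyBackupFiles
      await api.copyBackupFiles(source, target, backupId)
      await api.cleanVm(target, vmUuid)
    } finally {
      await unlockTarget()
      await unlockSource()
    }
  }
}
```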

future work

  • filter by backup job, VM name, VM tag, full VM/full disk/delta, ...
  • if changing the filter leads to some VMs no longer being replicated: the user HAS TO delete the replicated backups manually
  • chain backup jobs (data mover, healthcheck, remote-level cleanVm, ...)