Backup data tiering

goal and definition

  • transfer the backup to a relatively fast storage to reduce the backup window and limit the impact on the production VMs
  • then transfer (move or copy) this backup to a slower storage, possibly offsite (a sketch of such a chain follows this list)
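
To make the goal concrete, here is a minimal sketch of what a tiering chain could look like; the field names and remote ids are purely illustrative, not the actual XO configuration format:

```ts
// Hypothetical description of a tiering chain; names are illustrative, not the XO config format.
interface Tier {
  remoteId: string
  // how data reaches this tier from the previous one:
  // 'copy' keeps the data on the previous tier, 'move' evicts it once transferred
  transfer: 'copy' | 'move'
}

// A fast local storage absorbs the backup window, then data is moved to slower offsite storage.
const chain: Tier[] = [
  { remoteId: 'nfs-local', transfer: 'copy' },  // written directly by the backup job
  { remoteId: 's3-offsite', transfer: 'move' }, // filled asynchronously from nfs-local
]
```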

Competitors allow backing up to multiple local storages in parallel -> S3 -> Glacier.

synchronous copy

This is the current system: data is copied to all the remotes. It does not use unnecessary space and reads the data only once, but it only goes as fast as the slowest storage.
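
As an illustration, a minimal sketch of this synchronous fan-out, assuming hypothetical `readBlock`/`writeBlock` helpers (not the actual XO remote API): the loop reads each block once and cannot advance until every remote has written it, which is why the slowest storage sets the pace.

```ts
// Synchronous copy sketch: each block is read once and written to every remote in parallel.
type Remote = { id: string; writeBlock(index: number, data: Uint8Array): Promise<void> }

async function synchronousCopy(
  readBlock: (index: number) => Promise<Uint8Array | null>, // hypothetical source reader
  remotes: Remote[],
): Promise<void> {
  for (let i = 0; ; i++) {
    const block = await readBlock(i) // data is read only once
    if (block === null) break        // end of the disk
    // the next block is not read until *all* remotes have stored this one,
    // so throughput is bounded by the slowest remote
    await Promise.all(remotes.map(remote => remote.writeBlock(i, block)))
  }
}
```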

All the storages need the same amount of space.

Race conditions are already handled: if a VM is still being backed up when a new backup starts, the new one is skipped.
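
For reference, a sketch of that skip behaviour using a simple in-memory lock per VM; the actual XO mechanism may differ, and all names here are illustrative:

```ts
// Skip a backup run if the same VM is still being backed up (illustrative, in-memory version).
const runningBackups = new Set<string>()

async function backupVm(vmUuid: string, run: () => Promise<void>): Promise<void> {
  if (runningBackups.has(vmUuid)) {
    console.log(`backup of ${vmUuid} is still running, skipping this run`)
    return
  }
  runningBackups.add(vmUuid)
  try {
    await run()
  } finally {
    runningBackups.delete(vmUuid)
  }
}
```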

asynchronous copy

The transfer to the next tier should be done before the start of the next cleanVm job, which may modify the files.

asynchronous move

  • the transfer to the next tier should be done before the start of the next backup job
  • the capacity of the faster storage should be sized by taking into account the backup speed and the transfer speed to the next tier; it will be at most the size of a full backup job (see the sizing sketch after this list)
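
A back-of-the-envelope sizing sketch, under the assumption that the mover drains data to the next tier while the backup job is still writing; the numbers and the formula are illustrative, not taken from measurements:

```ts
// Peak fast-tier usage for one job of `jobSizeGiB`, written at `backupSpeed` (MiB/s)
// and drained to the next tier at `transferSpeed` (MiB/s).
function fastTierPeakUsageGiB(jobSizeGiB: number, backupSpeed: number, transferSpeed: number): number {
  if (transferSpeed >= backupSpeed) return 0 // the mover keeps up, only transient usage
  // data piles up at (backupSpeed - transferSpeed); the peak is reached when the job finishes
  return jobSizeGiB * (1 - transferSpeed / backupSpeed)
}

// Example: a 2 TiB full job written at 500 MiB/s and drained at 200 MiB/s
// peaks at 2048 * (1 - 200 / 500) ≈ 1229 GiB; the worst case remains the full job size.
console.log(fastTierPeakUsageGiB(2048, 500, 200))
```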

speed up

Using a block index on each remote removes the need to list/transfer all blocks, speeding up backup (transfer only new/modified blocks) and restore (look up each block on the fastest remote available).
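
A sketch of how such an index could be used, assuming each remote keeps a map from block hash to its location (all names are hypothetical):

```ts
// Per-remote block index: block hash -> path of the block on that remote.
interface RemoteIndex {
  remoteId: string
  speedRank: number            // lower = faster remote
  blocks: Map<string, string>
}

// Backup: only transfer the blocks the target remote does not already hold.
function blocksToTransfer(changedBlockHashes: string[], target: RemoteIndex): string[] {
  return changedBlockHashes.filter(hash => !target.blocks.has(hash))
}

// Restore: read each needed block from the fastest remote that has it.
function pickSource(hash: string, remotes: RemoteIndex[]): RemoteIndex | undefined {
  return remotes
    .filter(remote => remote.blocks.has(hash))
    .sort((a, b) => a.speedRank - b.speedRank)[0]
}
```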

V2

  • from remote to remote (a sketch of the mover loop follows this list)
  • detect new backups
  • write all the files (vhd, alias, block, bat, json, ...) to the replicated remote
  • cleanVm on the replicated remote (it must have a retention longer than or equal to the source remote's)
  • can execute only once per remote
  • take a lock per VM on the source (maybe the read lock is overkill, but we shouldn't merge into or delete this data) and on the target
  • encrypted content is decrypted and then re-encrypted if needed:
    • no end-to-end encryption when using the data mover
    • each remote can have a different encryption strategy and key
    • each remote can have its own immutability and dedup strategy
  • can be offloaded to a proxy
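
A high-level sketch of the mover loop under the points above; every function name here is a placeholder, not the actual XO code:

```ts
// V2 data mover sketch: detect new backups, lock the VM on both remotes, copy the files,
// then run cleanVm on the target (whose retention must be >= the source's).
interface Remote { id: string }

interface MoverApi {
  listNewBackups(src: Remote, tgt: Remote): Promise<{ vmUuid: string; backupId: string }[]>
  lockVm(remote: Remote, vmUuid: string): Promise<() => Promise<void>>        // returns an unlock function
  copyBackupFiles(src: Remote, tgt: Remote, backupId: string): Promise<void>  // vhd, alias, block, bat, json...
  cleanVm(remote: Remote, vmUuid: string): Promise<void>
}

async function dataMover(source: Remote, target: Remote, api: MoverApi): Promise<void> {
  for (const { vmUuid, backupId } of await api.listNewBackups(source, target)) {
    // lock the VM on both sides so nothing merges into or deletes the data being moved
    const unlockSource = await api.lockVm(source, vmUuid)
    const unlockTarget = await api.lockVm(target, vmUuid)
    try {
      // decryption / re-encryption per remote would happen inside copyBackupFiles
      await api.copyBackupFiles(source, target, backupId)
      await api.cleanVm(target, vmUuid)
    } finally {
      await unlockTarget()
      await unlockSource()
    }
  }
}
```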

future work

  • filter by backup job, VM name, VM tag, full VM/full disk/delta, ...
  • if changing the filter leads to some VMs no longer being replicated: the user HAS TO delete the replicated backups manually
  • chain backup jobs (data mover, healthcheck, remote-level cleanVm, ...)