@asmartin
Created February 21, 2016 19:03
DataGuard Backup
===DataGuard Backup===
==== The Problem ====
Most filesystems do not perform data checksumming to ensure that data on disk has not been corrupted (e.g. by bad memory, gamma rays, etc.). Using ECC RAM helps to reduce the number of errors, but the threat of silent corruption still exists [1]. ZFS is a better filesystem in that it performs internal data checksumming and handles disk-based corruption, but it does not do well if memory is the source of corruption [2]. Therefore, to ensure data integrity, a secondary, external checksum must be performed.
It is best to use DataGuard Backup with separate filesystems, at least one of which is ZFS. It also helps if one filesystem is "at rest" (unmounted) except when performing backups, so that it is not susceptible to bit-flipping in memory except during the backup.
Note that ensuring this level of data integrity requires a speed compromise, because of all the checksumming. If you want fast backups, use `rsync`. DataGuard Backup sacrifices speed in order to ensure data integrity.
==== Usage ====
```
dgbackup -s /path/to/src -d /path/to/dest
```
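As a rough illustration of how the two options in the usage line might be parsed, here is a minimal Python sketch. Only `-s` and `-d` come from the usage line; the function name `parse_args` and the long option names `--src` and `--dest` are hypothetical, and no implementation language has been chosen for the tool yet.

```python
import argparse


def parse_args(argv=None):
    """Parse the -s/-d options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="dgbackup")
    # The long names are assumptions for illustration; the usage line
    # only shows the short forms -s and -d.
    parser.add_argument("-s", "--src", required=True, help="source directory")
    parser.add_argument("-d", "--dest", required=True, help="destination directory")
    return parser.parse_args(argv)
```

Calling `parse_args(["-s", "/path/to/src", "-d", "/path/to/dest"])` returns a namespace with `src` and `dest` attributes; both options are required in this sketch.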
==== Algorithm ====
DataGuard Backup uses the following algorithm:
- for each file present on the source or the destination:
  - if it exists on both sides and mtime + size match:
    - checksum src
    - checksum dest
    - if the checksums differ, add the file to the problem queue
  - else if it does not exist on dest, or it exists but mtime or size don't match: sync from src to dest
  - else if it does not exist on src: delete from dest
- if the problem queue is empty, exit with zero (success)
- else print out the problem queue and exit with non-zero status
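The steps above could be sketched in Python roughly as follows. This is one possible reading, not a reference implementation: the choice of SHA-256, the walk over the union of source and destination paths, the whole-second mtime comparison, and all function names (`checksum`, `list_files`, `same_mtime_and_size`, `backup`) are assumptions made for illustration.

```python
import hashlib
import os
import shutil


def checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()  # hash choice is an assumption
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def same_mtime_and_size(a, b):
    """Compare size and whole-second mtime of two files."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def backup(src, dest):
    """Run one pass; return the problem queue (empty means success)."""
    problems = []
    src_files, dest_files = list_files(src), list_files(dest)
    for rel in sorted(src_files | dest_files):
        s, d = os.path.join(src, rel), os.path.join(dest, rel)
        if rel not in src_files:
            os.remove(d)  # no longer on src: delete from dest
        elif rel in dest_files and same_mtime_and_size(s, d):
            if checksum(s) != checksum(d):
                problems.append(rel)  # metadata matches but contents differ
        else:
            # Missing on dest, or mtime/size differ: sync from src.
            os.makedirs(os.path.dirname(d), exist_ok=True)
            shutil.copy2(s, d)  # copy2 also copies mtime, so the next run matches
    return problems
```

A caller would print the returned list and exit non-zero when it is not empty, for example `sys.exit(1 if problems else 0)`. Files in the problem queue are left untouched on both sides, since a checksum mismatch alone does not say which copy is the corrupted one.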
==== References ====
- 1 - https://www.cs.virginia.edu/~gurumurthi/papers/asplos15.pdf
- 2 - http://research.cs.wisc.edu/adsl/Publications/zfs-corruption-fast10.pdf
@arkag

arkag commented Feb 22, 2016

So you want help with this? What language were you thinking?
