@c5e3
Last active July 10, 2022 20:19
#!/bin/bash
# restore with the following command:
# gunzip --stdout <filename>.img.gz | dd bs=10M of=/dev/sdX
# pv(1) is optional and provides a progress bar;
# just remove it from the pipe if you don't want it
DATE=$(date +"%Y-%m-%d_%H-%M")
GREEN='\033[1;32m'
RED='\033[1;31m'
NC='\033[0m'

if [[ "$1" == "-h" || "$1" == "--help" || "$1" == "" ]]; then
    echo -e "simple dd backup tool by c5e3"
    echo -e "must be run as root!"
    echo -e "usage:"
    echo -e "\tdd_bak.sh <sdX> <destination folder>"
    exit 1
else
    if [[ $EUID -ne 0 ]]; then
        echo "try again as root" 1>&2
        exit 1
    fi
    echo -ne "make backup from ${GREEN}/dev/$1${NC} to ${RED}$2${HOSTNAME}_$1_$DATE.img.gz${NC}? [y/n] "
    read confirm
    if [[ "$confirm" == "y" ]]; then
        # output filename must match the one shown in the prompt above
        dd bs=10M if="/dev/$1" | pv | gzip > "$2${HOSTNAME}_$1_$DATE.img.gz"
    else
        echo -e "${RED}no backup created!${NC}"
        exit 1
    fi
fi
@ruario

ruario commented Jan 24, 2017

Assuming this is for critical data, I see what I consider to be a significant flaw right off the bat. You should not use gzip compression around the disk image for important backups. Compressing images that are unimportant (or that you have multiple copies of) is not a problem, but doing so with a critical backup is a bad idea.

If a gzipped archive has a single bit corrupted near its start, the image stored within is effectively lost. This is because common compression algorithms depend on coherency over long sections of a file to achieve their results. If the file cannot be decompressed, none of the image's contents can be extracted. Read this to see exactly what is involved in attempting to fix a corrupted gzip file.
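To see how fragile this is, you can simulate it yourself (the filename and byte offset here are purely illustrative; if the chosen byte already happens to be 0x00, pick another offset or value):

# make a compressed copy of a test image, keeping the original (-k)
gzip -k test.img
# overwrite a single byte near the start of the compressed stream
printf '\x00' | dd of=test.img.gz bs=1 seek=200 count=1 conv=notrunc
# the integrity check now fails and everything after the damage is lost
gzip -t test.img.gz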

One way to ease this problem is redundancy. You could, for example, use the par2cmdline utils (an implementation of the PAR2 parity file format) to make parity files to save alongside your compressed backup image. par2 files will allow you to fix a certain percentage of data corruption perfectly (you can define the level of redundancy). However, keep in mind that if you have a slightly higher percentage of corruption, your par2 files will be of no use at all. Consider factors such as the reliability of the media you are saving to when deciding on an appropriate level of redundancy.
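A minimal sketch with par2cmdline (the 10% redundancy level and the backup filename are just examples):

# create parity files with ~10% redundancy next to the compressed image
par2 create -r10 backup.img.gz.par2 backup.img.gz
# later, check the image against the parity data
par2 verify backup.img.gz.par2
# and repair it, provided the corruption is within the redundancy level
par2 repair backup.img.gz.par2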

If you must have compression for critical backups, I would suggest taking an entirely different approach and compressing files individually within the backup image, because this means that only some of the files will be ruined should you have any corruption. There are *nix archive formats that can do this, including afio, xar or dar. You could still combine this with par2 files should you wish to decrease the risk.
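For example, with afio each file is compressed individually inside the archive, so a bad spot only ruins the file it lands in (the paths and options below are assumptions, not a tested recipe):

# archive a directory tree, compressing each file separately
find /mnt/data -print | afio -o -Z /mnt/backup/data.afio
# extract later, decompressing per file
afio -i -Z /mnt/backup/data.afio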

Regarding compression choices, I'd also pick either bzip2 or lzip because they have decent recovery tools available that may allow you to correct some level of corruption. lziprecover appears to be slightly more advanced than bzip2recover; additionally, lzip uses LZMA-based compression (like XZ), so it should result in the smallest files. However, only afio would allow you to select this compression format, so you might be limited to bzip2 if afio is not suitable for your needs. Consider a parallel bzip2 compressor like lbzip2 to get the best performance, and you might also wish to lower the compression level to get closer to the speed of gzip.
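If you do go with bzip2, the compression step could look something like this (device name, block size and compression level are placeholders; lbzip2 is a parallel, drop-in bzip2 replacement):

# compress the image with a parallel bzip2, at a lower level for speed
dd bs=10M if=/dev/sdX | pv | lbzip2 -6 > backup.img.bz2
# if the archive is later damaged, salvage whatever blocks are still intact
bzip2recover backup.img.bz2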

In summary, if you back up critical data, avoid compression and use parity files where possible (or some other form of redundancy, e.g. multiple backups), but if you must compress to save space, consider an archive tool with internal compression rather than an externally compressed disk image or tar archive.

@ruario

ruario commented Jan 24, 2017

Better yet would be to forget dd altogether and use another tool like rsync to copy to somewhere secure and reliable, like a backup cloud service or another disk or partition with built-in fault-tolerance capabilities (e.g. a file system like ZFS, Btrfs, HAMMER, etc.). Any reason you need the disk image rather than a straight copy?
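A rough sketch of that approach (source, destination and excludes are assumptions to adapt):

# file-level copy preserving permissions, ACLs, xattrs and hard links
rsync -aAXH --delete \
    --exclude=/dev --exclude=/proc --exclude=/sys --exclude=/tmp --exclude=/run \
    / /mnt/backup/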

Also, if you do need a container format for some reason (e.g. you are saving this to a disk with a file system that does not retain *nix file permissions), you should consider using an archive format with built-in archive/backup features (dar offers a bunch of interesting backup features). dd is a useful utility, but it is the wrong option for backup IMHO. It is a very low-level, simple tool and not a proper, comprehensive backup program.
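A dar sketch along those lines (the archive basenames, source path and compression setting are assumptions):

# full backup of /home, compressing each file individually with bzip2
dar -c /mnt/backup/home_full -R /home -zbzip2:6
# later, a differential backup taking the full one as reference
dar -c /mnt/backup/home_diff -R /home -zbzip2:6 -A /mnt/backup/home_full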

@ruario

ruario commented Jan 24, 2017

Also consider that with dd, since you are working at such a low level, you need to copy each disk individually. A higher-level backup utility or copy would work with mount points seamlessly.

@c5e3

c5e3 commented Jan 24, 2017

wow, thanks for that very detailed explanation! the reason for using dd is, as you already guessed, a dual-boot system with windows as a secondary OS. i used acronis for a long time, but last year i had trouble restoring my ext4 partitions, aside from it being proprietary anyway. since dd images are a common solution for sd card images on single board computers, i tried it this way. if i had a dual-boot system with two hard drives, i would back up each of them in its optimal way, but since i'm using a laptop, this is not possible.
however, i think i will go for clonezilla instead of trying to reinvent the wheel.

@ibukanov

@ruario One can use gzip --rsyncable, which not only makes the compressed image rsync-friendly but also limits the damage to about 32K.
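Applied to the script above, that would be something like this (assuming your gzip build has the option):

# --rsyncable restarts the compressed stream periodically,
# so a corrupted spot only affects the block it falls in
dd bs=10M if=/dev/sdX | pv | gzip --rsyncable > backup.img.gz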

@ruario

ruario commented Jan 25, 2017

@c5e3 Thanks for taking the feedback the right way. I did worry later that my wall of text was perhaps too much. It really depends on the importance of your data, your perception of the risk and your usage (e.g. you do need a container format because you store the result on a Windows partition). With a combination of @ibukanov's suggested switch (assuming it is available to you) and parity files, the solution might be good enough for your own use case. With any backup method, the key is to regularly test that recovery does indeed work.

@ibukanov nice tip. I was not aware of this switch and in fact it is not mentioned in the man pages on Slackware or OpenSUSE, nor is it present on macOS. I also see some reports of users on Gentoo and Arch not having it. It seems that the presence of the --rsyncable option depends on whether the distribution has applied the patch that adds the option, since it is not part of upstream.

I would like to know why it hasn't been adopted by upstream (does it cause problems?). Also, I would personally be a little reluctant to use or promote it in a public script when it might not be readily available, but for personal usage, assuming it is available on your distro, it could be another option.

EDIT: The --rsyncable option was upstreamed with GNU gzip 1.7. However, with many distros still shipping older versions and the patching of this feature in older versions being inconsistent, you still cannot reliably assume --rsyncable will be available to everyone who might try this script.
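One way to cope with that in a script is to probe for the option and fall back to plain gzip (a rough sketch, not tested on every distro):

# use --rsyncable only if this gzip build understands it
if echo | gzip --rsyncable > /dev/null 2>&1; then
    GZIP_OPTS="--rsyncable"
else
    GZIP_OPTS=""
fi
dd bs=10M if=/dev/sdX | pv | gzip $GZIP_OPTS > backup.img.gz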

@ruario

ruario commented Jan 25, 2017

I have just discovered that pigz has --rsyncable built in (from upstream, without a patch), plus other similar switches such as --independent. In addition, since pigz compresses using threads to make use of multiple processors and cores, you will most likely get far better performance. Finally, pigz was written by Mark Adler (a co-author of the zlib compression library and gzip), so I would be fairly confident in its reliability.

So if you are planning to continue with a gzipped image or archive, pigz with the appropriate switch(es) might be a better option.
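For example (pigz uses all available cores by default; combining the two switches here is an assumption to adjust for your needs):

# parallel, gzip-compatible compression with rsync-friendly, independently
# compressed blocks, so damage stays local
dd bs=10M if=/dev/sdX | pv | pigz --rsyncable --independent > backup.img.gz
# restoring still works with plain gunzip (or pigz -d)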
