Archive files or directories in a way that works pretty well across Mac, Linux and Windows
#!/bin/bash
# bail out on any failure, we do not want a bad archive
# pipefail makes a failure on the left side of a pipeline (tar, gpg) count too
set -e
set -o pipefail
die() {
    set +x
    echo >&2 "$@"
    usage
    exit 1
}
function usage {
    set +x
    echo "Usage: archivedirectory [-d dest_dir] [-e] [-n] [-p prefix_name] dir_or_file_to_archive"
    exit 1
}
function man {
    set +x
    cat << MANUAL_PAGE
NAME
archivedirectory - archive the specified directory or file
SYNOPSIS
archivedirectory [-d destination_dir] [-e] [-n] [-p prefix_name] dir_or_file_to_archive
-d directory -- specify the destination dir to place the archive
-e -- encrypt the archive, otherwise just tar, compress and add forward error correction
-n -- nice the processes, some processes are intensive
-p prefix_name -- add a prefix name to the archive
DESCRIPTION
This script archives a directory or file. Goals for this tool:
* Create archives that can be extracted across Windows, Mac and Linux
* Encrypt the data at rest
* Provide some level of corruption resistance
* Use widely supported, cross platform tools
** Create archives that can be extracted across Windows, Mac and Linux **
I am mostly a Mac and Linux user, but at work we develop under Windows and
Linux, so I want an archiving strategy that will work on all three platforms.
** Encrypt the data at rest **
Everybody is getting breached. All data should be encrypted at rest, giving it a
better chance of not being exposed.
** Provide some level of corruption resistance **
Bitrot and copy errors are real problems, so we need a way to guard and test
against corruption. More commonly, I experience a never-ending battle with tools
and utilities that want to convert line endings from Windows to Linux format, or
case-insensitive file systems that cause problems with case-sensitive file names.
Another one is symbolic links. Windows has three different mechanisms for
symbolic links: dot lnk files, junctions and NTFS symbolic links. Cygwin, which
runs on Windows, uses none of these, but chooses to invent its own. Subversion
on Linux will correctly save and extract symbolic links, but on Windows invents
its own format, which is none of the three formats that Windows supports, nor
the Cygwin on Windows format. When I move archives from Windows to Linux or vice
versa, I do not want files renamed, converted, etc.
** Tool choices **
The Zip file format is largely portable across Windows, Mac and Linux, but I
have had problems extracting complicated zip archives on Linux, finding that I
could not extract some directories or files. Zip's ability to correctly
restore file attributes like ownership and mode is also less well supported.
WinZip, on the other hand, does understand tar files, and tar is native to Mac
and Linux systems. Tar is the chosen common format.
With all the malware, compromises, etc., I think all data needs to be encrypted
at rest. You cannot know if a computer or network has been compromised. The
F-35, OPM, and Anthem data breaches are all good examples for me personally.
You have to assume that all networks are compromised. Also, people lose thumb
drives and computers all the time. GPG is widely available, known for good
encryption, and available across Windows, Mac, and Linux.
For corruption resistance, I have a large number of computers that I deal with
and two that are actively used, one Windows 7 and one Linux, are corrupting
sectors at a very low rate, low enough on the Windows 7 box that the
sysadmins do not care. Yet blocks in the middle of large text log files are
being replaced with all 0xFFs. Based on experience, it is prudent to assume
any file system could be corrupting files. To address this, forward error
correction is done after archiving, compressing and encrypting. For forward
error correction, I used the par2 command line tool. It is well
understood and works across Windows, Mac and Linux.
The main tools that this script uses are "tar", "xz" (selecting maximum
compression), "gpg" and "par2".
This shell script bails out if any command returns a failure, using the bash
"-e" option. If I am creating an encrypted archive, I want to know all steps
were successful.
EXAMPLES
$ archivedirectory -d /mybackups -e ./directory_or_file_to_archive
This will create a time and date stamped directory in /mybackups that includes
the original name. In that directory will be the gpg encrypted compressed tar
file and the par2 recovery files. par2 redundancy is set at 100 percent. This
script will append a line with the name of the encrypted archive and the
password to a file called "archive_list" in the chosen backup directory,
/mybackups in this example. The user needs to manually back up these passwords
in a safe place, such as a password vault.
Most often, I just archive with encryption, and then remove the original
directory:
$ archivedirectory -e some_directory_to_archive
$ rm -rf some_directory_to_archive
I can periodically check the backup directory via cron with
$ find . -type d -exec bash -c "cd \\"{}\\" && pwd && sh -c
\\"par2 verify -qq *.gpg\\"" \;
MANUAL_PAGE
    exit 1
}
unset DISPLAY
export ENCRYPT_ARCHIVE=0
export ARCHIVE_TO="$HOME/files/backups"
export NICE=""
export ARCHIVE_PREFIX=""
echo
# -h or --help given alone would otherwise be mistaken for the thing to archive
case "${1:-}" in
    -h|--help)
        man
        ;;
esac
# get the arguments
# the last argument is the directory or file to archive, so stop when one is left
while [[ $# -gt 1 ]]
do
    key="$1"
    case $key in
        -d|--directory)
            ARCHIVE_TO="$2"
            echo "Resulting archive will be stored in the directory $2"
            echo
            shift # past argument
            ;;
        -e|--encrypt)
            ENCRYPT_ARCHIVE=1
            echo "Archive will be encrypted"
            ;;
        -n|--nice)
            NICE=nice
            echo "Running nice on processes"
            ;;
        -p|--prefix)
            # add an underscore to the prefix for clarity
            ARCHIVE_PREFIX="$2""_"
            echo "Adding prefix $ARCHIVE_PREFIX to the archive name"
            echo
            shift # past argument
            ;;
        -h|--help)
            man
            ;;
        *)
            die "Unknown option $key"
            ;;
    esac
    shift;
done
[ $# -ge 1 ] || die "Need to specify a directory or file to archive"
[ -e "$1" ] || die "Need to specify a directory or file to archive, $1 does not exist"
export HOSTNAME
if [ x"$HOSTNAME" == x ]
then
    export HOSTNAME=`hostname -s 2> /dev/null`
    if [ x"$HOSTNAME" == x ]
    then
        export HOSTNAME=`hostname 2> /dev/null`
        if [ x"$HOSTNAME" == x ]
        then
            export HOSTNAME=`uname -n 2> /dev/null`
            if [ x"$HOSTNAME" == x ]
            then
                HOSTNAME=unknown
            fi
        fi
    fi
fi
export HOSTNAME
# remove any trailing slash
export TMP_FROM="${1%/}"
# remove ./ prefix if it exists
export ARCHIVE_FROM="${TMP_FROM#./}"
export ARCHIVE_BASE="$ARCHIVE_PREFIX""$ARCHIVE_FROM"'_'"$HOSTNAME"'_'`date +'%y%m%d_%H.%M.%S'`
export ARCHIVE_NAME="$ARCHIVE_BASE"'.tar.xz'
# The full path to the directory where the archive will be built
export ARCHIVE_TO_DIR="$ARCHIVE_TO/$ARCHIVE_BASE"
# The full path to the GPG archive if encrypted or just the tarball if not encrypted
export ARCHIVE_FULL_PATH="$ARCHIVE_TO_DIR/$ARCHIVE_NAME"
echo "Creating the archive from : $ARCHIVE_FROM"
echo "Archive directory will be : $ARCHIVE_TO_DIR"
if [ $ENCRYPT_ARCHIVE -eq 1 ]
then
    export ARCHIVE_LIST="$ARCHIVE_TO/archive_list"
    export ARCHIVE_GPG="$ARCHIVE_FULL_PATH"'.gpg'
    export ARCHIVE_PAR2="$ARCHIVE_GPG"'.par2'
    export ARCHIVE_PASSWD=`perl -e'my @set = ('"'"'0'"'"' .. '"'"'9'"'"', '"'"'A'"'"' .. '"'"'Z'"'"', '"'"'a'"'"' .. '"'"'z'"'"'); my \$passwd = join '"'"''"'"' => map \$set[rand @set], 1 .. 63; print \$passwd;'`
    echo "Full path to encrypted archive : $ARCHIVE_GPG"
    echo "Password will be appended to : $ARCHIVE_LIST"
    echo "Archive password will be : $ARCHIVE_PASSWD"
else
    export ARCHIVE_PAR2="$ARCHIVE_FULL_PATH"'.par2'
fi
echo
echo
sleep 5
set -x
mkdir -p "$ARCHIVE_TO"
[ -d "$ARCHIVE_TO" ] || die "Directory to archive to does not exist"
mkdir -p "$ARCHIVE_TO_DIR"
[ -d "$ARCHIVE_TO_DIR" ] || die "Can not create $ARCHIVE_TO_DIR"
[ ! -f "$ARCHIVE_FULL_PATH" ] || die "Archive $ARCHIVE_FULL_PATH already exists"
[ ! -f "$ARCHIVE_PAR2" ] || die "Archive PAR2 $ARCHIVE_PAR2 already exists"
$NICE tar cf - "$ARCHIVE_FROM" | $NICE xz -9 -c - > "$ARCHIVE_FULL_PATH"
set +x
echo
echo "Tar archive was successfully created, creating the README"
echo
set -x
# save the location to the directory or file that we are archiving
pushd "$ARCHIVE_TO_DIR"
if [ $ENCRYPT_ARCHIVE -eq 1 ]
then
    [ ! -f "$ARCHIVE_GPG" ] || die "Archive GPG $ARCHIVE_GPG already exists"
    cat > "$ARCHIVE_TO_DIR"/README.txt << README_ENCRYPT
This directory contains files and/or directories that have been archived with
tar, with gpg encryption and par2 forward error correction applied. PAR2 is
the old Usenet parity 2 format, which provides protection against bit rot. This
seems to be a way to portably archive data across Windows, MacOS and Linux,
even older versions of Linux. To restore the files, the command line will be:
gpg --decrypt "$ARCHIVE_GPG" | tar xfJ -
You can check the archive's integrity with:
par2 verify "$ARCHIVE_PAR2"
Original directory archived from : $ARCHIVE_FROM
Original archive was created in : $ARCHIVE_TO_DIR
README_ENCRYPT
    set +x
    echo
    /bin/ls -lhd "$ARCHIVE_FULL_PATH"
    echo
    echo "README was successfully created, creating the GPG file from the tar archive."
    echo
    set -x
    # We do not want to store the password until it is likely
    # that we have created a good GPG archive.
    echo "$ARCHIVE_PASSWD" | $NICE gpg --batch --no-tty --yes --passphrase-fd 0 --cipher-algo AES256 --symmetric "$ARCHIVE_NAME"
    set +x
    echo
    echo "GPG file was successfully created, adding the password to the archive list"
    echo
    set -x
    echo "$ARCHIVE_GPG $ARCHIVE_PASSWD" >> "$ARCHIVE_LIST"
    echo
    /bin/ls -lh "$ARCHIVE_TO_DIR"
    echo
    $NICE par2 create -r100 "$ARCHIVE_GPG"
    echo
    /bin/ls -lh "$ARCHIVE_TO_DIR"
    echo
    $NICE par2 verify "$ARCHIVE_PAR2"
    echo
    # test that the encrypted archive decrypts and lists before removing the plain tarball
    echo "$ARCHIVE_PASSWD" | gpg --batch --no-tty --yes --passphrase-fd 0 --decrypt "$ARCHIVE_GPG" | tar tfJ -
    /bin/rm -f "$ARCHIVE_FULL_PATH"
else
    cat > "$ARCHIVE_TO_DIR"/README.txt << README_NONENCRYPT
This directory contains files and/or directories that have been archived with
tar and forward error correction. Forward error correction is applied using
PAR2. PAR2 is the old Usenet parity 2 format, which provides protection against
bit rot. This seems to be a way to portably archive data across Windows, MacOS
and Linux, even older versions of Linux. To restore the files, the command line
will be:
tar xfJ "$ARCHIVE_FULL_PATH"
You can check the archive's integrity with:
par2 verify "$ARCHIVE_PAR2"
Original directory archived from : $ARCHIVE_FROM
Original archive was created in : $ARCHIVE_TO_DIR
README_NONENCRYPT
    set +x
    echo
    echo "README was successfully created, creating the PAR2 files"
    echo
    set -x
    $NICE par2 create -r100 "$ARCHIVE_FULL_PATH"
    $NICE par2 verify "$ARCHIVE_PAR2"
    echo
fi
set +x
echo
/bin/ls -lh "$ARCHIVE_TO_DIR"
echo
echo "Switching back to starting directory"
echo
popd
echo
echo Original directory size
echo
# du is in different places across Linux distributions, rely on the path to
# find it.
du -sh "$ARCHIVE_FROM"
echo
echo "Switching to archive directory"
echo
pushd "$ARCHIVE_TO_DIR"
echo
echo Archive directory size
echo
du -sh "$ARCHIVE_TO_DIR"
echo
echo "Archive --$ARCHIVE_FROM-- was built successfully, we are DONE."
echo
exit 0
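
The manual page in the script says it relies on the bash "-e" option to bail out when any command fails. One detail worth illustrating is how that interacts with pipelines such as tar piped into xz: by default bash reports only the exit status of the last command in a pipeline, and the "pipefail" option changes that. The sketch below is not part of the original script. It uses "false" as a stand-in for a failing tar and "cat" as a stand-in for xz, and the function name pipeline_status is made up for illustration.

```shell
#!/bin/bash
# Sketch: compare the exit status of a pipeline with and without pipefail.
# "false" stands in for a failing tar, "cat" stands in for xz.

# Hypothetical helper: run the stand-in pipeline and print its exit status.
# The "|| status=$?" form keeps this safe to call even under "set -e".
pipeline_status() {
    local status=0
    false | cat || status=$?
    echo "$status"
}

# Default behaviour: only the last command (cat) decides the status.
set +o pipefail
status_default=$(pipeline_status)

# With pipefail: the failing left-hand command makes the pipeline fail.
set -o pipefail
status_pipefail=$(pipeline_status)
set +o pipefail

echo "without pipefail: $status_default"
echo "with pipefail:    $status_pipefail"
```

Under "set -e" alone, a tar failure on the left side of a pipe would not stop a script, because the pipeline's status would still come from the right-hand command. Combining "set -e" with "set -o pipefail" is one way to make such a failure abort the run.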