@tf198
Created January 24, 2013 09:49
Archiver for directories full of years of accumulated files that you want to clean up. I had a directory with 15 years of projects, many with lots of revisions and helpfully named 'important.old' folders, totaling about 20GB, and I wanted to start backing it up offsite. This script pulls out directories, tars and compresses them in a sensible way to a …
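The "sensible way" of naming archives can be pulled out into a standalone function. The sketch below mirrors the naming logic of the script that follows; `archive_name` and its `now` parameter are hypothetical, added only so the timestamp can be pinned for illustration. Note that the timestamp includes hours and minutes (`%Y%m%d%H%M`), not just the date.

```python
import datetime
import re


def archive_name(archive_dir, target, now=None):
    """Hypothetical helper: build <archive_dir>/<safe_name>.<YYYYmmddHHMM>.tar.bz
    using the same rules as the archive script."""
    if now is None:
        now = datetime.datetime.now()
    target = target.rstrip('/')
    # Flatten the path into one filename: lowercase, '/' -> '_', ' ' -> '-',
    # then drop any character outside [a-z0-9._-].
    safe = re.sub(r'[^a-z0-9._-]', '',
                  target.strip('/').lower().replace('/', '_').replace(' ', '-'))
    return "%s/%s.%s.tar.bz" % (archive_dir.rstrip('/'), safe,
                                now.strftime('%Y%m%d%H%M'))


print(archive_name('data/archive', 'data/projects/Old Project/',
                   datetime.datetime(2013, 1, 24, 9, 49)))
# → data/archive/data_projects_old-project.201301240949.tar.bz
```

Flattening the whole path into the filename keeps archives of same-named directories in different parents from colliding, and the timestamp lets the same directory be archived more than once.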
#!/usr/bin/env python
'''
This script archives a directory (or file) and leaves a text file in its
place listing the files that were removed. This lets you be fairly ruthless
in your cleanup, safe in the knowledge that you can get everything back
later if required. For example:

    python archive.py data/projects/old_project -a data/archive

deletes the old_project directory and leaves two files:

    data/archive/data_projects_old_project.<YYYYmmddHHMM>.tar.bz
        bzip2-compressed tarball of the directory

    data/projects/old_project.archive
        Archived to data/archive/data_projects_old_project.<YYYYmmddHHMM>.tar.bz

        old_project/
        old_project/file1
        ...
'''
import sys, re, subprocess, datetime, os, argparse
# use argparse 'cause it makes it so easy
parser = argparse.ArgumentParser(description='Archive directories')
parser.add_argument('target', help='Directory or file to archive')
parser.add_argument('-a', '--archive', default='~/archive', help='Directory to store archives in')
parser.add_argument('-n', action='store_true', dest='noact', help='Just output path information')
parser.add_argument('-k', action='store_true', dest='keep', help='Keep target after archiving')
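options = parser.parse_args()
# argparse does not expand '~' (and the default is '~/archive'), so do it here
options.archive = os.path.expanduser(options.archive)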
# check our paths
if not os.path.isdir(options.archive) or not os.access(options.archive, os.W_OK | os.X_OK):
    print("Unable to write to %s" % options.archive)
    sys.exit(1)
target = options.target.rstrip('/')
if not os.access(target, os.R_OK):
    print("Unable to read from %s" % target)
    sys.exit(1)
# tar flags: create (-c), list each file (-v), preserve permissions (-p), bzip2 (-j)
opts = '-cvpj'
# construct an archive filename <archive>/<safe_target_name>.<YYYYmmddHHMM>.tar.bz
safe = re.sub(r'[^a-z0-9\._-]', '', target.strip('/').lower().replace('/', '_').replace(' ', '-'))
archive = "%s/%s.%s.tar.bz" % (
    options.archive.rstrip('/'),
    safe,
    datetime.datetime.now().strftime('%Y%m%d%H%M')
)
# split the target so the archive stores paths relative to its parent directory
(working, d) = os.path.split(target)
working = working or '.'
print("\nTarget: %s" % target)
print("Archive: %s\n" % archive)
cmd = ["tar", opts, '-f', archive, '-C', working, d]
if options.noact:
    print("Command: %s" % ' '.join(cmd))
    print("Taking no action")
    sys.exit()
# run the archive
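# GNU tar -v lists each archived path on stdout; capture it as text.
# check_output raises CalledProcessError if tar fails, so nothing is deleted.
files = subprocess.check_output(cmd, universal_newlines=True)
print("Writing info to '%s.archive'" % target)
with open("%s.archive" % target, 'w') as f:
    f.write("Archived to %s\n\n" % archive)
    f.write(files)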
# delete the target if required
if not options.keep:
    print("Deleting target")
    subprocess.call(["rm", "-rf", target])
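Getting the files back is the other half of the deal. A minimal restore sketch, assuming the `.archive` info file still starts with the `Archived to <path>` line the script writes and that the path in it is still valid (a relative path resolves from the current directory). It uses Python's standard `tarfile` module instead of the `tar` command; `restore` is a hypothetical helper, not part of the original script.

```python
import os
import tarfile


def restore(info_file, dest=None):
    """Hypothetical helper: re-extract the archive named in a
    '<target>.archive' info file and return the archive path."""
    with open(info_file) as f:
        first = f.readline().strip()
    prefix = 'Archived to '
    if not first.startswith(prefix):
        raise ValueError("%s: not an archive info file" % info_file)
    archive = first[len(prefix):]
    # The script runs tar with -C <parent>, so member paths are relative to
    # the target's parent directory, which is where the info file lives.
    if dest is None:
        dest = os.path.dirname(os.path.abspath(info_file))
    with tarfile.open(archive, 'r:bz2') as tar:
        tar.extractall(dest)
    return archive
```

For example, `restore('data/projects/old_project.archive')` should recreate `data/projects/old_project`; the info file itself is left in place.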