
@Zirak
Last active December 12, 2015 01:28
XTRACT ZE FILES!

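xzf, XTRACT ZE FILES!, is a file extraction utility, as you may have guessed. It reads the first few bytes of a file to detect gzip, bzip2 or xz compression, then runs tar with the matching flag. Inspired by xkcd and general boredom. Works on both Python 2.7 and 3.3.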

Usage:

$ python xzf.py file [extra_tar_flags]

Or, programmatically:

from xzf import xtract_ze_files
xtract_ze_files(file, extra_tar_flags)

The extra flags are passed directly to tar, so you can do something like:

$ python xzf.py some_file.tar.gz -v

and tar's output will be verbose. The command executed in this case is:

tar --gzip -v -xf some_file.tar.gz
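As a sketch of how that argument list can be assembled before it is handed to `subprocess.call`: `build_tar_command` is a hypothetical helper name, and accepting the extra flags either as a list or as one string (split with `shlex.split`), plus leaving out an empty compression flag for unrecognised files, are assumptions of this sketch.

```python
import shlex


def build_tar_command(tar_flag, extra_flags, filename):
    """Hypothetical helper: tar <compression flag> <extra flags> -xf <file>."""
    if isinstance(extra_flags, str):
        # Assumption: a single string such as "-v --totals" is split shell-style
        extra_flags = shlex.split(extra_flags)
    command = ['tar']
    # Unknown formats map to '', and tar should not receive an empty argument
    if tar_flag:
        command.append(tar_flag)
    command.extend(extra_flags)
    command.extend(['-xf', filename])
    return command


if __name__ == '__main__':
    # Mirrors the README example: tar --gzip -v -xf some_file.tar.gz
    print(build_tar_command('--gzip', '-v', 'some_file.tar.gz'))
```

Passing a list rather than a single command string keeps the shell out of the picture, so each flag reaches tar as its own argument.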
#!/usr/bin/env python
#XTRACT ZE FILES!!!
import shlex
import subprocess

#header => tar flag map
#you can find these with something along the lines of:
# hexdump -n 16 -C file
#(execute in a console, of course)
headers = {
    #nothing: a prefix of every file, so unknown formats fall back to plain tar
    b'' : '',

    #gzip
    # http://www.onicos.com/staff/iz/formats/gzip.html
    b'\x1f\x8b' : '--gzip',

    #bzip2
    # http://bzip.org/1.0.5/bzip2-manual-1.0.5.html
    # (look for BZ_DATA_ERROR_MAGIC)
    b'BZ' : '--bzip2',
    #we assume bzip2, even though bzip1 files begin with BZ as well (the
    # distinction is in the 3rd byte: 'h' in bzip2, '0' in bzip1)

    #TODO: check the following
    #xz
    # http://tukaani.org/xz/xz-file-format.txt
    # (look for Header Magic Bytes)
    b'\xfd7zXZ\x00' : '--xz',
}

def xtract_ze_files (fn, extra_flags):
    tar_flag = get_tar_flag(fn)
    #extra_flags may be a list of flags or a single string such as '-v'
    if isinstance(extra_flags, str):
        extra_flags = shlex.split(extra_flags)
    command = ['tar']
    #an unrecognised format maps to '', which must not reach tar as an
    # empty argument
    if tar_flag:
        command.append(tar_flag)
    command.extend(extra_flags)
    command.extend(['-xf', fn])
    return subprocess.call(command)

#get_tar_flag grabs the header, i.e. the leading bytes of the file (as many
# as the longest header we have in stock).
#it then goes over our fine list of known headers and collects every match.
# I say matches in plural because formats may begin with the same bytes and
# diverge later on.
#the best match is the longest matching header: the one where the most
# header bytes agree with the beginning of the file.
def get_tar_flag (fn):
    magic = get_header(fn)
    #python3's filter returns an iterator instead of a list, so a list
    # comprehension is used to behave the same on both versions
    possible_heads = [k for k in headers.keys() if magic.startswith(k)]
    #b'' is a prefix of everything, so possible_heads is never empty and
    # unknown files fall back to the '' (no compression) entry
    return headers[max(possible_heads, key=len)]

def get_header (fn):
    longest = max(len(h) for h in headers)
    with open(fn, 'rb') as file:
        return file.read(longest)

if __name__ == '__main__':
    import os, sys
    if len(sys.argv) < 2:
        sys.exit('usage: python xzf.py file [extra_tar_flags]')
    #each extra argument is passed to tar as a separate flag
    status = xtract_ze_files(
        os.path.abspath(sys.argv[1]),
        sys.argv[2:])
    sys.exit(status)
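The longest-prefix lookup described in the comments above `get_tar_flag` can be tried without touching the filesystem. This sketch reuses the script's header table but works on in-memory bytes; `tar_flag_for` is a hypothetical name, and the sample inputs come from the standard `gzip`, `bz2` and `lzma` modules (Python 3.3+), whose default output begins with the magic bytes listed in the table. The hex print stands in for the `hexdump -n 16 -C file` check.

```python
import binascii
import bz2
import gzip
import lzma

# Same header => tar flag table as xzf.py
HEADERS = {
    b'': '',
    b'\x1f\x8b': '--gzip',
    b'BZ': '--bzip2',
    b'\xfd7zXZ\x00': '--xz',
}


def tar_flag_for(data):
    """Hypothetical helper: flag of the longest header that prefixes data."""
    # b'' always matches, so the list is never empty
    matches = [magic for magic in HEADERS if data.startswith(magic)]
    return HEADERS[max(matches, key=len)]


if __name__ == '__main__':
    samples = [
        ('gzip', gzip.compress(b'hello')),
        ('bzip2', bz2.compress(b'hello')),
        ('xz', lzma.compress(b'hello')),
        ('plain', b'hello'),
    ]
    for name, data in samples:
        # Show the leading bytes, roughly what hexdump would reveal
        leading = binascii.hexlify(data[:6]).decode('ascii')
        print(name, leading, repr(tar_flag_for(data)))
```

Because the empty header matches every input, a file with no recognised magic simply falls through to the `''` entry instead of raising an error.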