@vsoch
Created December 12, 2018 23:52
A quick example of reading gzipped tar archives (.tar.gz) into memory from inside a tar archive (mind blown!)
import os
import sys
import tarfile

input_tar = sys.argv[1]

# If input tar is not found, do not proceed
if not os.path.exists(input_tar):
    print('Cannot find %s!' % input_tar)
    sys.exit(1)

with tarfile.open(input_tar, 'r') as tar:
    for member in tar:
        # Are we dealing with a file?
        if member.isfile():
            # Is it a gzipped tar archive?
            if member.name.endswith('.tar.gz'):
                print("Wouhou! Extracting %s" % member.name)
                # Open the nested archive as a stream, without writing it to disk
                subtar = tarfile.open(mode='r|gz', fileobj=tar.extractfile(member))
                # Now we can find papers (.tex LaTeX files) inside
                for submember in subtar:
                    # extractfile() returns None for non-files, so check first
                    if submember.isfile() and submember.name.endswith('.tex'):
                        print("Found LaTeX file %s!" % submember.name)
                        # We extract the submember from its parent subtar
                        with subtar.extractfile(submember) as m:
                            tex = m.read()
                        # Do something with your tex!
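To try the same idea without a real archive on disk, here is a minimal sketch that builds a tiny nested archive in memory and reads it back. The helper names `build_nested_tar` and `read_tex_from_nested_tar`, the inner archive name `papers.tar.gz`, and the sample file names are all made up for illustration; only the `tarfile` and `io` calls come from the standard library.

```python
import io
import tarfile


def build_nested_tar(files):
    """Return the bytes of a tar holding one gzipped tar (hypothetical 'papers.tar.gz')."""
    inner_buf = io.BytesIO()
    with tarfile.open(fileobj=inner_buf, mode='w:gz') as inner:
        for name, text in files.items():
            data = text.encode('utf-8')
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            inner.addfile(info, io.BytesIO(data))
    inner_bytes = inner_buf.getvalue()

    outer_buf = io.BytesIO()
    with tarfile.open(fileobj=outer_buf, mode='w') as outer:
        info = tarfile.TarInfo(name='papers.tar.gz')
        info.size = len(inner_bytes)
        outer.addfile(info, io.BytesIO(inner_bytes))
    return outer_buf.getvalue()


def read_tex_from_nested_tar(fileobj):
    """Collect {name: bytes} for every .tex file found in nested .tar.gz members."""
    found = {}
    with tarfile.open(fileobj=fileobj, mode='r') as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith('.tar.gz')):
                continue
            # Stream mode ('r|gz') reads the nested archive sequentially in memory
            with tarfile.open(fileobj=tar.extractfile(member), mode='r|gz') as subtar:
                for submember in subtar:
                    if submember.isfile() and submember.name.endswith('.tex'):
                        found[submember.name] = subtar.extractfile(submember).read()
    return found


# Sample content is invented purely for the demo
archive = build_nested_tar({
    'paper/main.tex': '\\documentclass{article}',
    'paper/notes.txt': 'skip me',
})
texs = read_tex_from_nested_tar(io.BytesIO(archive))
print(sorted(texs))
```

Because the inner archive is opened in stream mode, each submember has to be read while the loop is positioned on it; the stream cannot seek backwards to an earlier member.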