@danharvey
Created November 14, 2013 17:17
Given a file listing the byte offsets of gzip headers, this splits a gzip file at those offsets and writes each piece as its own .gz part.
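import sys

input_file = sys.argv[1]
offset_file = sys.argv[2]
output_file = input_file.replace('.gz', '')

# One byte offset per line; each marks the start of a gzip member.
with open(offset_file) as f:
    offsets = [int(line) for line in f.read().split()]

with open(input_file, "rb") as f:
    part = 1
    previous = 0
    for offset in offsets:
        print("Index: " + str(offset))
        if offset == 0:
            # Nothing precedes offset 0, so there is no chunk to write yet.
            continue
        # Read the bytes between the previous header and this one.
        file_size = offset - previous
        byte_data = f.read(file_size)
        print("Offset: " + str(f.tell()))
        # Gzip magic bytes (1F 8B) followed by the deflate method byte (08).
        if byte_data[0:3] == b'\x1F\x8B\x08':
            print("valid")
        with open(output_file + '-part-' + str(part) + '-of-' + str(len(offsets)) + '.gz', 'wb') as o:
            o.write(byte_data)
        previous = offset
        part += 1
        print("")
    # Assumption: the offsets file may not list the end of the file, so
    # anything after the last offset is written out as the final part.
    tail = f.read()
    if tail:
        with open(output_file + '-part-' + str(part) + '-of-' + str(len(offsets)) + '.gz', 'wb') as o:
            o.write(tail)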
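
The script above takes the offsets file as given and does not say how it was produced. One possible way to generate it, sketched here as an assumption rather than the author's method, is to decompress the input member by member with `zlib` and record where each member starts. The function name `gzip_member_offsets` is hypothetical, and the sketch holds the whole file in memory, so it only suits inputs that fit in RAM.

```python
import zlib


def gzip_member_offsets(data):
    """Return the start offset of every gzip member in `data` (bytes).

    Hypothetical helper: walks the input one member at a time instead of
    searching for the magic bytes, which can also occur inside compressed data.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            decomp.decompress(data[pos:])
        except zlib.error:
            break  # no gzip member starts here (e.g. trailing padding)
        if not decomp.eof:
            break  # the member is truncated
        offsets.append(pos)
        # unused_data holds whatever followed the end of this member.
        pos = len(data) - len(decomp.unused_data)
    return offsets
```

Writing the result one offset per line, for example `"\n".join(str(o) for o in offsets)`, gives the format the splitting script reads. Decoding each member is slower than scanning for `1F 8B 08`, but it avoids false matches inside the compressed stream.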