@utahta
Created December 19, 2010 16:57
huge data book chapter 6. vb encode.
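#!/usr/bin/env python
# Encode a "tag<TAB>id,id,..." text file as VB-coded gap lists (Python 3 port).
import sys
from struct import pack

# Local module, not shown here; assumed to return bytes for a single integer.
from vb import vb_encode

if len(sys.argv) < 2:
    print("usage: %s in.txt > out.txt" % sys.argv[0])
    sys.exit(1)

# Read bytes so the tag can be written back out unchanged.
with open(sys.argv[1], 'rb') as fp:
    for line in fp:
        (tag, id_list) = line.rstrip().split(b'\t')
        encoded = []
        pre = 0
        for doc_id in id_list.split(b','):
            doc_id = int(doc_id)
            # Store the gap from the previous id rather than the id itself.
            encoded.append(vb_encode(doc_id - pre))
            pre = doc_id
        # Header: two native-endian ints, the tag length and the number of
        # encoded ids (note: the id count, not the payload size in bytes).
        header = pack('2i', len(tag), len(encoded))
        sys.stdout.buffer.write(header + tag + b''.join(encoded))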
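
The `vb` module imported by the script is not included in this gist, so the following is only a sketch of what `vb_encode` might look like. It assumes the textbook variable-byte scheme (7 payload bits per byte, with the high bit set on the final byte of each number) and Python 3. `vb_decode`, `encode_ids`, and `decode_ids` are hypothetical helpers added here for illustration. Ids are assumed to be sorted in ascending order so that every gap is non-negative.

```python
def vb_encode(n):
    """Variable-byte encode one non-negative integer (stand-in for vb.vb_encode)."""
    if n < 0:
        raise ValueError("n must be non-negative; are the ids sorted?")
    out = []
    while True:
        out.insert(0, n % 128)  # peel off the low 7 bits
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # set the high bit on the final byte as a terminator
    return bytes(out)


def vb_decode(data):
    """Decode a byte string of concatenated VB codes into a list of integers."""
    numbers = []
    n = 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte            # continuation byte
        else:
            n = 128 * n + (byte - 128)    # final byte of this number
            numbers.append(n)
            n = 0
    return numbers


def encode_ids(ids):
    """Gap-encode a sorted id list, mirroring the inner loop of the script."""
    pre = 0
    chunks = []
    for doc_id in ids:
        chunks.append(vb_encode(doc_id - pre))
        pre = doc_id
    return b"".join(chunks)


def decode_ids(data):
    """Inverse of encode_ids: running sum over the decoded gaps."""
    ids = []
    pre = 0
    for gap in vb_decode(data):
        pre += gap
        ids.append(pre)
    return ids


if __name__ == "__main__":
    payload = encode_ids([824, 829, 215406])
    print(list(payload))        # [6, 184, 133, 13, 12, 177]
    print(decode_ids(payload))  # [824, 829, 215406]
```

Because gaps between neighbouring ids are usually much smaller than the ids themselves, most gaps fit in one or two bytes, which is where the space saving comes from.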