@dpwrussell
Created October 7, 2013 08:17
Fast lookup index which writes lines to a file if they have a matching id in the index. For Tara.
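# Build a set of ids to keep, then copy matching lines from input to output.
keep_ids = set()
keep_path = 'keep.txt'
input_path = 'input.txt'
output_path = 'output.txt'

# Each line of the keep file is tab-separated; index "col0-col2" and col1.
with open(keep_path) as keep_file:
    for line in keep_file:
        s = line.rstrip('\n').split('\t', 3)
        keep_ids.add(s[0] + '-' + s[2])
        keep_ids.add(s[1])

print('index complete, now reading in input and writing matches to output')

# The id of an input line is everything before its first comma.
with open(input_path) as input_file, open(output_path, 'w') as output_file:
    for i, line in enumerate(input_file):
        line_id = line.split(',', 1)[0]
        if line_id in keep_ids:
            output_file.write(line)
        # Progress report, once per 100000 lines read.
        if i % 100000 == 1:
            print(i)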
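
The same set-based lookup can be tried without touching any files. The sketch below is only an illustration: the function names `build_index` and `filter_lines` and the sample rows are hypothetical, and it assumes the formats the script implies (a tab-separated keep file with at least three columns, and a comma-separated input whose first field is the id).

```python
import io


def build_index(keep_lines):
    """Collect lookup keys from tab-separated lines: 'col0-col2' and col1."""
    index = set()
    for line in keep_lines:
        fields = line.rstrip("\n").split("\t", 3)
        index.add(fields[0] + "-" + fields[2])
        index.add(fields[1])
    return index


def filter_lines(input_lines, index):
    """Yield input lines whose first comma-separated field is in the index."""
    for line in input_lines:
        if line.split(",", 1)[0] in index:
            yield line


# Hypothetical sample data standing in for keep.txt and input.txt.
keep = io.StringIO("chr1\tgeneA\t100\textra\nchr2\tgeneB\t200\textra\n")
data = io.StringIO("chr1-100,first\ngeneB,second\nchr3-300,third\n")

index = build_index(keep)
matches = list(filter_lines(data, index))
print(matches)
```

Membership tests on a `set` take roughly constant time on average, so the filter pass stays linear in the size of the input file regardless of how many ids the keep file holds.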