@smalyshev
Created February 2, 2014 05:19
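import sys

# Dataset file: the text on the first line, then one pattern per line.
with open(sys.argv[1], 'r') as f:
    txt = f.readline().strip()
    # Skip blank lines: an empty pattern would match at every position.
    patterns = [p.strip() for p in f if p.strip()]

# Answers file: a single line of whitespace-separated 0-based positions.
with open(sys.argv[2], 'r') as f:
    answers = [int(i) for i in f.readline().split()]

anset = set()
for ans in answers:
    # A reported position is good if some pattern occurs exactly there.
    good = False
    for p in patterns:
        if txt[ans:ans+len(p)] == p:
            good = True
    if not good:
        print("=== BAD: ", ans)
    if ans in anset:
        print("=== DUPLICATE: ", ans)
    anset.add(ans)
    # print(ans)

# Find every occurrence of every pattern (overlaps included) and
# report the ones that the answers do not list.
for p in patterns:
    # print(p)
    idx = 0
    while True:
        idx = txt.find(p, idx)
        if idx == -1:
            break
        if idx not in anset:
            print("=== MISSING: ", p, idx)
        idx += 1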
@smalyshev

Usage: python check.py dataset.txt answers.txt
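
The input layout is implied by the script: the first line of dataset.txt is the text, each following line is a pattern, and answers.txt is one line of space-separated 0-based positions. Below is a minimal sketch of the same three checks applied to in-memory values, which may be handy for trying the logic without creating files. The function name `check`, the tuple-based return value, and the sample text and patterns are all made up for illustration; they are not part of the script.

```python
def check(txt, patterns, answers):
    """Return a list of (kind, detail) problems found in `answers`.

    Hypothetical helper: it mirrors the script's BAD / DUPLICATE / MISSING
    checks but collects the results instead of printing them.
    """
    problems = []
    seen = set()
    for ans in answers:
        # BAD: no pattern occurs in the text at the reported position.
        if not any(txt[ans:ans + len(p)] == p for p in patterns):
            problems.append(("BAD", ans))
        # DUPLICATE: the same position was reported more than once.
        if ans in seen:
            problems.append(("DUPLICATE", ans))
        seen.add(ans)
    for p in patterns:
        # MISSING: an occurrence (overlaps included) that was never reported.
        idx = txt.find(p)
        while idx != -1:
            if idx not in seen:
                problems.append(("MISSING", (p, idx)))
            idx = txt.find(p, idx + 1)
    return problems


# Invented sample data: "ATCG" occurs at 1 and 11, "GGGT" at 4 and 15.
sample_txt = "AATCGGGTTCAATCGGGGT"
sample_patterns = ["ATCG", "GGGT"]

print(check(sample_txt, sample_patterns, [1, 4, 11, 15]))  # prints []
print(check(sample_txt, sample_patterns, [1, 4, 4, 11, 0]))
```

An empty list means the answers are complete and correct; in the second call the repeated 4, the wrong 0, and the omitted 15 are each reported.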
