@mmas
Created July 5, 2017 00:02
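#!/usr/bin/env python
# Word-count mapper: reads text from stdin and prints one "word<TAB>1"
# record per word found between the START and END marker lines.
import re
import sys

started = False  # becomes True once the START marker line has been seen
for line in sys.stdin:
    if started:
        # Stop at the END marker; everything after it is ignored.
        if line.startswith('*** END OF THIS PROJECT'):
            break
        # Filter out some punctuation marks and set to lowercase.
        line = re.sub(r'["?!.,;:()-]', '', line).strip().lower()
        for word in line.split():
            # Parenthesized single argument prints the same on Python 2 and 3.
            print('%s\t1' % word)
    elif line.startswith('*** START OF THIS PROJECT'):
        started = True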
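
The script above only emits `word<TAB>1` records; it does not add them up. As a rough sketch of how those records could be totalled afterwards, the snippet below sums the counts per word. The function name `reduce_counts` and the sample records are hypothetical and not part of the original gist; it is one possible companion step, not the author's own.

```python
from collections import Counter


def reduce_counts(lines):
    """Sum tab-separated "word<TAB>count" records (hypothetical helper)."""
    totals = Counter()
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        # Split on the first tab: left side is the word, right side the count.
        word, _, count = line.partition('\t')
        totals[word] += int(count)
    return totals


# Made-up sample records in the same format the mapper prints.
sample = ['the\t1\n', 'whale\t1\n', 'the\t1\n']
totals = reduce_counts(sample)
for word in sorted(totals):
    print('%s\t%d' % (word, totals[word]))
```

To total real mapper output, the same function could be given `sys.stdin` instead of the sample list.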