@yaronv
Last active September 5, 2018 06:32
Stream documents one by one from disk
class MyCorpus(object):
    def __iter__(self):
        # `dictionary` is assumed to be an existing gensim Dictionary defined elsewhere
        with open('mycorpus.txt') as f:
            for line in f:
                # assume there's one document per line, tokens separated by whitespace
                yield dictionary.doc2bow(line.lower().split())
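
The snippet above depends on a `dictionary` object and a `mycorpus.txt` file that the gist does not define. The following is a minimal, self-contained sketch of the same streaming pattern that runs without gensim. `TinyDictionary`, `StreamedCorpus`, and `tokenized_lines` are hypothetical names introduced only for illustration. `TinyDictionary.doc2bow` is a simplified stand-in that imitates the sorted `(token_id, count)` pairs gensim's `doc2bow` returns; it is not gensim's implementation.

```python
import os
import tempfile
from collections import Counter


class TinyDictionary:
    """Hypothetical stand-in for a gensim Dictionary: maps tokens to integer ids."""

    def __init__(self, documents):
        self.token2id = {}
        for tokens in documents:
            for token in tokens:
                # Assign the next free id the first time a token is seen.
                self.token2id.setdefault(token, len(self.token2id))

    def doc2bow(self, tokens):
        # Count known tokens only and return sorted (token_id, count) pairs.
        counts = Counter(self.token2id[t] for t in tokens if t in self.token2id)
        return sorted(counts.items())


class StreamedCorpus:
    """Yields one bag-of-words vector per line without loading the whole file."""

    def __init__(self, path, dictionary):
        self.path = path
        self.dictionary = dictionary

    def __iter__(self):
        # The file is reopened on every iteration, so the corpus can be
        # traversed more than once (unlike a plain generator).
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Assume one document per line, tokens separated by whitespace.
                yield self.dictionary.doc2bow(line.lower().split())


def tokenized_lines(path):
    """Stream tokenized documents from disk, one line at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.lower().split()


# Write a small two-document corpus to a temporary file for the demo.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("human computer interface\ncomputer system computer\n")
    corpus_path = tmp.name

dictionary = TinyDictionary(tokenized_lines(corpus_path))
corpus = StreamedCorpus(corpus_path, dictionary)

first_pass = list(corpus)
second_pass = list(corpus)  # works because __iter__ reopens the file
os.remove(corpus_path)

print(first_pass)
```

Defining `__iter__` on a class, rather than writing a single generator function, is what lets the corpus be iterated repeatedly; a training routine that makes several passes over the data needs that property. With a real gensim dictionary, the class in the gist would be used the same way: `for vector in MyCorpus(): ...`.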