"""Extract several BOW models from a corpus of text files.
The models are stored in Matrix Market format which can be read
by gensim. The texts are read from .txt files in the directory
specified as TOPDIR. The output is written to the current directory."""
# NB: All strings are utf8 (not unicode).
import os
import glob
import nltk
import gensim