Peter Kaloroumakis netfl0

netfl0 / bow.py
Created May 10, 2017 20:21 — forked from andreasvc/bow.py
Extract Bag-of-Words (BOW) models from a corpus of text files.
"""Extract several BOW models from a corpus of text files.

The models are stored in Matrix Market format, which can be read
by gensim. The texts are read from .txt files in the directory
specified as TOPDIR. The output is written to the current directory.
"""
# NB: All strings are UTF-8 encoded byte strings (not unicode objects).
import os
import glob
import nltk
import gensim