@vasinkd
Created March 13, 2019 10:10
Updateable CountVectorizer
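from sklearn.feature_extraction.text import CountVectorizer


class UpdateableCountVectorizer(CountVectorizer):
    """CountVectorizer whose fitted vocabulary can be extended after fit()."""

    def update(self, text, stop_words=()):
        # Add whitespace-separated tokens that are not yet in the vocabulary.
        require_sort = False
        for word in text.split():
            if word not in self.vocabulary_ and word not in stop_words:
                # Placeholder index; the real indices are assigned below.
                self.vocabulary_[word] = -1
                require_sort = True
        if require_sort:
            # Reassign column indices following the sorted order of the terms.
            # Note: this can shift the indices of terms that already existed.
            for new_val, term in enumerate(sorted(self.vocabulary_)):
                self.vocabulary_[term] = new_val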
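
The re-indexing step can be tried without scikit-learn. Below is a minimal sketch on a plain dict, assuming the same whitespace tokenization as update(); the helper name update_vocabulary is hypothetical and not part of the gist or of scikit-learn. It shows that adding a term may move the column index of an existing term, so matrices produced before the update will not line up with the new vocabulary.

```python
def update_vocabulary(vocabulary, text, stop_words=()):
    """Return a new term -> index mapping extended with unseen tokens from text.

    Hypothetical helper for illustration; mirrors the re-indexing idea only.
    """
    terms = set(vocabulary)
    # Whitespace tokenization, skipping stop words (an assumption of this sketch).
    terms.update(w for w in text.split() if w not in stop_words)
    # Indices follow the sorted order of all terms, old and new.
    return {term: idx for idx, term in enumerate(sorted(terms))}


vocab = {"apple": 0, "cherry": 1}
updated = update_vocabulary(vocab, "banana cherry the", stop_words={"the"})
print(updated)  # {'apple': 0, 'banana': 1, 'cherry': 2}
```

Here "cherry" moves from column 1 to column 2, so any data transformed with the old vocabulary would need to be transformed again.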