@andreaschandra
Created May 14, 2018 07:26
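from time import time

from sklearn.feature_extraction.text import TfidfVectorizer

# Assumption: `dataset` is loaded beforehand with the raw documents in `dataset.data`
# (e.g. the Bunch returned by sklearn.datasets.fetch_20newsgroups).
print("Extracting features from the training dataset using a sparse vectorizer")
t0 = time()
# Drop terms found in more than 50% of documents or in fewer than 2 documents,
# keep at most the 10,000 most frequent remaining terms, and remove English stop words.
vectorizer = TfidfVectorizer(max_df=0.5, max_features=10000,
                             min_df=2, stop_words='english',
                             use_idf=True)
X = vectorizer.fit_transform(dataset.data)  # sparse matrix, shape (n_samples, n_features)
print("done in %fs" % (time() - t0))
print("n_samples: %d, n_features: %d" % X.shape)
print()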
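As a rough illustration of what the vectorizer computes, here is a minimal pure-Python sketch. It assumes scikit-learn's `TfidfVectorizer` defaults (lowercasing, the default `token_pattern`, raw term counts, `smooth_idf=True`, `norm='l2'`). The `tfidf` function, the toy `docs` corpus and the tiny `STOP_WORDS` set are hypothetical stand-ins, not scikit-learn's API or its full English stop-word list. The sketch returns dense lists rather than a sparse matrix, and its tie-breaking for `max_features` may differ from the library's.

```python
import math
import re
from collections import Counter

# Hypothetical tiny stand-in for scikit-learn's built-in English stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "is"}
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern


def tokenize(doc):
    """Lowercase, extract tokens of 2+ word characters, drop stop words."""
    return [t for t in TOKEN_RE.findall(doc.lower()) if t not in STOP_WORDS]


def tfidf(docs, max_df=0.5, min_df=2, max_features=10000):
    """Dense TF-IDF sketch: document-frequency filtering, smoothed idf, L2-normalized rows."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    # A float max_df is a fraction of documents; an int min_df is an absolute count.
    max_doc_count = max_df * n
    kept = [t for t in df if min_df <= df[t] <= max_doc_count]
    # max_features keeps the surviving terms with the highest total count in the corpus.
    total = Counter()
    for c in counts:
        total.update(c)
    kept = sorted(kept, key=lambda t: -total[t])[:max_features]
    vocab = sorted(kept)  # columns in alphabetical order
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for c in counts:
        row = [c[t] * idf[t] for t in vocab]  # raw term count times idf
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm else row)  # all-zero rows stay zero
    return vocab, rows


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "a dog chased a cat",
]
vocab, rows = tfidf(docs)
print(vocab)    # ['cat', 'dog', 'sat']
print(rows[0])  # cat and sat share equal weight 1/sqrt(2); dog is 0
```

With four documents, `max_df=0.5` allows at most two documents per term and `min_df=2` requires at least two, so only `cat`, `dog` and `sat` survive. The third document keeps none of them and ends up as an all-zero row.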