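from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: twenty_train is not defined in this snippet; load the training split here.
twenty_train = fetch_20newsgroups(subset='train', shuffle=True, random_state=42)

print(len(twenty_train.data))  # number of training documents
print("\n".join(twenty_train.data[0].split("\n")[:3]))  # first three lines of the first document
print(twenty_train.target_names[twenty_train.target[0]])  # category name of the first document
print(twenty_train.target[:10])  # integer labels of the first ten documents
for t in twenty_train.target[:10]:
    print(twenty_train.target_names[t])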
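# Preprocessing
# Tokenizing text: build a vocabulary and count token occurrences per document
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)
print(X_train_counts.shape)  # (n_documents, n_vocabulary_terms)
print(count_vect.vocabulary_.get('algorithm'))  # column index of 'algorithm', or None if absent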