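# Use CountVectorizer to find the most frequent trigrams
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

co = CountVectorizer(ngram_range=(3, 3))  # count trigrams only
counts = co.fit_transform(comments)  # sparse document-term matrix
# Sum each trigram's count over all comments and keep the 50 most frequent.
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it.
important_trigrams = pd.DataFrame(counts.sum(axis=0), columns=co.get_feature_names_out()).T.sort_values(0, ascending=False).head(50)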
# Next, reset the index, rename the columns, and use the translate module to get English translations.
# `translator` is assumed to be defined earlier (an object with a .translate(text) method).
important_trigrams = important_trigrams.reset_index()
important_trigrams = important_trigrams.rename(columns={'index': 'trigrams', 0: 'frequency'})
important_trigrams['english_translation'] = important_trigrams['trigrams'].apply(translator.translate)
important_trigrams