@Johnne32
Created November 30, 2020 07:34
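import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Use CountVectorizer to find the 50 most frequent trigrams in `comments` (an iterable of strings, assumed defined earlier)
co = CountVectorizer(ngram_range=(3, 3))
counts = co.fit_transform(comments)
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
important_trigrams = pd.DataFrame(counts.sum(axis=0), columns=co.get_feature_names_out()).T.sort_values(0, ascending=False).head(50)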
# Next, reset the index, rename the columns, and translate each trigram into English.
# `translator` is assumed to be created earlier, with a translate() method that returns the translated text.
important_trigrams = important_trigrams.reset_index()
important_trigrams.rename(columns={'index': 'trigrams', 0: 'frequency'}, inplace=True)
important_trigrams['english_translation'] = important_trigrams['trigrams'].apply(translator.translate)
important_trigrams
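
To make the counting step concrete, here is a minimal standard-library sketch of what summing trigram counts over all comments amounts to. It is not the gist's method: it swaps CountVectorizer for `collections.Counter`, mimics only the default lowercasing and two-or-more-character word tokenization, and uses a hypothetical helper `top_trigrams` and invented sample comments.

```python
import re
from collections import Counter

# Mimics CountVectorizer's default tokenization: lowercase text,
# then keep word tokens of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def top_trigrams(documents, n=50):
    """Return the n most frequent word trigrams as (trigram, frequency) pairs."""
    totals = Counter()
    for doc in documents:
        tokens = TOKEN_PATTERN.findall(doc.lower())
        # Slide a window of three tokens; trigrams never span two documents.
        totals.update(" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2))
    return totals.most_common(n)


# Hypothetical toy comments, for illustration only.
sample_comments = [
    "the food was very good",
    "the food was cold",
    "service was very good",
]
top = top_trigrams(sample_comments, n=3)
```

Each pair in `top` corresponds to one row of `important_trigrams` before the translation column is added.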