@prateekjoshi565
Last active April 21, 2019 12:15
genre_words_visual
import matplotlib.pyplot as plt
import nltk
import pandas as pd
import seaborn as sns

def freq_words(x, terms=30):
    # join all documents into one string and split it on whitespace
    all_words = ' '.join(x).split()
    fdist = nltk.FreqDist(all_words)
    words_df = pd.DataFrame({'word': list(fdist.keys()), 'count': list(fdist.values())})

    # select the `terms` most frequent words
    d = words_df.nlargest(columns="count", n=terms)

    # visualize words and frequencies
    plt.figure(figsize=(12, 15))
    ax = sns.barplot(data=d, x="count", y="word")
    ax.set(ylabel='Word')
    plt.show()

# plot the 100 most frequent words
# (movies_new is not defined in this snippet; it is assumed to be an existing DataFrame)
freq_words(movies_new['clean_plot'], 100)