@MarynaLongnickel
Created June 16, 2018 21:13
from collections import Counter
from itertools import chain

# Flatten the per-document token lists into one list of words.
# filtered_tokens is expected to be a list of token lists defined earlier.
flat_list = list(chain.from_iterable(filtered_tokens))
# Count how many times each word appears
count = Counter(flat_list)
# Sort (word, count) pairs from most to least frequent
sorted_count = count.most_common()
# Select the 5000 most frequent words
top5000 = [word for word, _ in sorted_count[:5000]]
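
The snippet above depends on a `filtered_tokens` variable that is not defined in this gist. As a minimal, self-contained sketch of the same idea, the example below wraps the counting step in a helper and runs it on a tiny made-up input. The function name `top_n_words` and the sample data are assumptions for illustration only, not part of the original code.

```python
from collections import Counter
from itertools import chain


def top_n_words(token_lists, n):
    """Return the n most frequent words across all token lists."""
    # Flatten the nested lists and count every word in one pass.
    counts = Counter(chain.from_iterable(token_lists))
    # most_common(n) yields (word, count) pairs, most frequent first.
    return [word for word, _ in counts.most_common(n)]


# Hypothetical sample input: one token list per document.
sample_tokens = [
    ["good", "movie", "good", "plot"],
    ["bad", "movie", "good"],
    ["good", "movie"],
]

top2 = top_n_words(sample_tokens, 2)
print(top2)
```

With real data, passing `n=5000` would give a vocabulary of the same size as `top5000`. If fewer than `n` distinct words exist, `most_common` simply returns all of them.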