NLP: Count frequent words in a file
# Author: Sobhan Hota
# Finds the most frequent 500 words in a given file
from string import punctuation
from operator import itemgetter

N = 500
words = {}

# Strip surrounding punctuation and lowercase every whitespace-separated token
words_gen = (word.strip(punctuation).lower()
             for line in open(r"C:\Python27\Corpus.txt")
             for word in line.split())
for word in words_gen:
    words[word] = words.get(word, 0) + 1

# Sort by count (descending) and keep the top N entries
top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]
for word, frequency in top_words:
    print("%s %d" % (word, frequency))