@gupul2k
Created October 16, 2012 20:43
NLP: Count frequent words in a file
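# Author: Sobhan Hota
# Finds the 500 most frequent words in a given file
from string import punctuation
from operator import itemgetter

N = 500  # number of top words to report
words = {}  # maps each word to its occurrence count

# Raw string so the backslashes in the Windows path are taken literally
with open(r"C:\Python27\Corpus.txt") as corpus:
    # Strip surrounding punctuation and lowercase each whitespace-separated token
    words_gen = (word.strip(punctuation).lower()
                 for line in corpus
                 for word in line.split())
    for word in words_gen:
        if word:  # skip tokens that consisted only of punctuation
            words[word] = words.get(word, 0) + 1

# Sort by count, highest first, and keep the top N entries
top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]
for word, frequency in top_words:
    print("%s %d" % (word, frequency))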
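
The same counting can also be sketched with `collections.Counter` from the standard library, which replaces the manual dictionary update and the sort with `most_common`. This is only an illustrative variant, not the author's script: the function name `top_words`, its `n` parameter, and the `sample` lines are hypothetical and stand in for the corpus file so the snippet runs without any file on disk.

```python
from collections import Counter
from string import punctuation


def top_words(lines, n=500):
    """Return the n most common (word, count) pairs from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            # Same normalization as the script: trim punctuation, lowercase
            word = token.strip(punctuation).lower()
            if word:  # ignore tokens that were only punctuation
                counts[word] += 1
    return counts.most_common(n)


# Hypothetical sample input standing in for the corpus file
sample = ["The cat sat.", "the dog sat, the end -- really"]
for word, frequency in top_words(sample, n=2):
    print("%s %d" % (word, frequency))
```

To run it over a real file, an open file object can be passed as `lines`, since iterating over a file yields its lines one at a time.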