@russelnickson
Created January 16, 2010 20:30
Most used words in a file.
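import string
import sys

# Usage: python script.py <text_file> <noise_words_file>
with open(sys.argv[1]) as f1:
    content = f1.read().lower()
with open(sys.argv[2]) as f2:
    # Build a set of whole noise words (assumed whitespace-separated) so the
    # check below matches whole words, not substrings of the raw file text.
    noisewords = set(w.strip(string.punctuation) for w in f2.read().lower().split())

# Strip leading and trailing punctuation from every whitespace-separated token.
cleanlist = []
for item in content.split():
    temp = item.strip(string.punctuation)
    if temp:  # skip tokens that consisted only of punctuation
        cleanlist.append(temp)

# Count every word that is not a noise word.
freq = {}
for item in cleanlist:
    if item in noisewords:
        continue
    freq[item] = freq.get(item, 0) + 1


def most_common(h):
    """Return (count, word) pairs sorted from most to least frequent."""
    t = []
    for key, value in h.items():
        t.append((value, key))
    t.sort(reverse=True)
    return t


print('\nWORD' + ' \t ' + 'FREQUENCY')
t = most_common(freq)
for freque, word in t[0:10]:
    print(word, ' \t', freque)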
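
The same counting can be expressed more compactly with `collections.Counter` from the standard library. The following is only a sketch of that alternative, not the gist author's method: the function name `top_words` and its parameters `noise_words` and `n` are hypothetical names chosen for illustration, and it takes the text as a string instead of reading files from `sys.argv`.

```python
from collections import Counter
import string


def top_words(text, noise_words=(), n=10):
    """Return the n most frequent words in text as (word, count) pairs.

    Hypothetical helper: words are lowercased, stripped of leading and
    trailing punctuation, and dropped if they appear in noise_words.
    """
    noise = {w.lower() for w in noise_words}
    words = (w.strip(string.punctuation) for w in text.lower().split())
    counts = Counter(w for w in words if w and w not in noise)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran! A dog sat."
    print("WORD", "\t", "FREQUENCY")
    for word, count in top_words(sample, noise_words=["the", "a"], n=3):
        print(word, "\t", count)
```

Note that `Counter.most_common` orders words with equal counts by first appearance, whereas sorting `(count, word)` tuples in reverse orders ties in reverse alphabetical order, so the two approaches can list tied words differently.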