@russelnickson
Created January 14, 2010 21:57
Count Word Frequency2 (Eliminate Noise Words)
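import string
import sys

# argv[1] is the text to analyze, argv[2] is the file of noise words.
with open(sys.argv[1]) as f1:
    content = f1.read().lower()
with open(sys.argv[2]) as f2:
    noisecontent = f2.read().lower()

# Strip surrounding punctuation from every whitespace-separated token.
cleanlist = []
for item in content.split():
    cleanlist.append(item.strip(string.punctuation))

# Compare whole words: testing against the raw string would also match
# substrings (for example "he" inside "the").
noisewords = set(word.strip(string.punctuation) for word in noisecontent.split())

freq = {}
for item in cleanlist:
    # Skip noise words and tokens that consisted only of punctuation.
    if not item or item in noisewords:
        continue
    freq[item] = freq.get(item, 0) + 1

print('\nWORD' + ' \t ' + 'FREQUENCY')
for item in freq:
    print(item + ' \t' + str(freq[item]))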
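
The same counting can also be packaged as a reusable function. The following is a minimal sketch, not the gist author's code: the name `word_frequencies`, its parameters, and the sample sentence are hypothetical, and it swaps the hand-rolled dictionary for `collections.Counter`. It assumes noise words should match whole tokens only.

```python
import string
from collections import Counter


def word_frequencies(text, noise_text):
    """Count words in text, skipping any whole word listed in noise_text.

    Hypothetical helper for illustration; not part of the original gist.
    """
    def tokens(raw):
        # Lowercase, split on whitespace, strip surrounding punctuation,
        # and drop tokens that were nothing but punctuation.
        stripped = (tok.strip(string.punctuation) for tok in raw.lower().split())
        return [tok for tok in stripped if tok]

    noise = set(tokens(noise_text))
    return Counter(tok for tok in tokens(text) if tok not in noise)


if __name__ == "__main__":
    # Made-up sample input, used only to exercise the function.
    sample = "The cat sat on the mat. The mat was flat!"
    for word, count in word_frequencies(sample, "the on was a").most_common():
        print(word + " \t" + str(count))
```

Building a set of noise words means the noise word "a" does not suppress "cat" or "mat", which a substring test against the raw noise text would do.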