@kgadek
Created March 21, 2013 13:42
Python programme to find the 20 most common words in a text
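#!/usr/bin/env python3.2
import re
from operator import itemgetter

# Punctuation characters that are replaced by spaces before splitting into words.
PUNCTUATION = re.compile(r"""[\],.\-?:;"()/'=$\[`!]""")

word2num = {}  # maps each word to its number of occurrences
with open('potop.txt', mode='r', encoding='utf-8') as potop:
    for line in potop:
        line = PUNCTUATION.sub(' ', line)
        for word in line.split():
            word2num[word] = 1 + word2num.get(word, 0)

# Sort by count, highest first, and print the top 20 as "count --> word".
s = sorted(word2num.items(), key=itemgetter(1), reverse=True)
for w, c in s[:20]:
    print("{0} --> {1}".format(c, w))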
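
For comparison, the same count can be sketched with `collections.Counter` from the standard library, which keeps the tally and returns the top entries in one call. This is an alternative formulation, not the gist author's code: the function name `most_common_words` and its `n` parameter are made up for illustration, and the punctuation set is copied from the script above.

```python
import re
from collections import Counter

# Same punctuation set as the script above; each match is replaced by a space.
PUNCTUATION = re.compile(r"""[\],.\-?:;"()/'=$\[`!]""")


def most_common_words(lines, n=20):
    """Return the n most frequent words in an iterable of text lines.

    Hypothetical helper for illustration; the result is a list of
    (word, count) pairs ordered from most to least frequent.
    """
    counts = Counter()
    for line in lines:
        # Strip punctuation, split on whitespace, and add the words to the tally.
        counts.update(PUNCTUATION.sub(' ', line).split())
    return counts.most_common(n)


# Possible usage with the same input file as the script:
#
#     with open('potop.txt', mode='r', encoding='utf-8') as potop:
#         for word, count in most_common_words(potop):
#             print("{0} --> {1}".format(count, word))
```

Like the script, this treats words as case-sensitive, so `Pan` and `pan` are counted separately; lowercasing each line first would merge them, if that is the behaviour wanted.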