@lgmkr
Last active November 3, 2016 19:52
How to analyse text and count word frequencies in Python
import re

frequency = {}

# Read the whole file and lowercase it so counting is case-insensitive.
with open('test.txt', 'r') as document_text:
    text_string = document_text.read().lower()

# Keep only runs of 3 to 15 ASCII letters; shorter and longer words are skipped.
match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)

for word in match_pattern:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

# Print (word, count) pairs sorted by count, least frequent first.
print(sorted(frequency.items(), key=lambda x: x[1]))
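The same counting step can also be sketched with `collections.Counter` from the standard library, which replaces the manual `get`/increment loop. This is only an illustrative variant, not part of the original gist: the function name `word_frequencies`, its `min_len`/`max_len` parameters, and the sample sentence are assumptions made for the example. It keeps the gist's rule of matching runs of 3 to 15 lowercase ASCII letters.

```python
import re
from collections import Counter


def word_frequencies(text, min_len=3, max_len=15):
    """Count lowercase ASCII words whose length is within [min_len, max_len].

    Hypothetical helper: the name and parameters are illustrative only.
    """
    pattern = r'\b[a-z]{%d,%d}\b' % (min_len, max_len)
    return Counter(re.findall(pattern, text.lower()))


if __name__ == '__main__':
    # Made-up sample text; replace it with the contents of a real file.
    sample = "The cat sat on the mat. The cat ran."
    counts = word_frequencies(sample)
    # most_common() yields (word, count) pairs, most frequent first.
    for word, count in counts.most_common():
        print(word, count)
```

`Counter.most_common()` orders the pairs from most to least frequent, the reverse of the ascending sort in the script above, so pick whichever order suits the report being produced.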