@mhermans
Created April 6, 2012 12:00
Python snippet for generating word frequencies for Wordle (using NLTK)
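#!/usr/bin/env python3
# wordcount.py: parse a text file and print its word frequencies
# usage: python3 wordcount.py <textfile>
import sys

import nltk

# read the whole input file (path given as the first argument)
with open(sys.argv[1]) as f:
    txt = f.read()

tokens = nltk.word_tokenize(txt)  # tokenize text (needs NLTK's Punkt tokenizer data)

clean_tokens = []
for word in tokens:
    word = word.lower()
    if word.isalpha():  # drop all non-words
        clean_tokens.append(word)

# make frequency distribution of words
fd = nltk.FreqDist(clean_tokens)
# print "word : count" pairs, most frequent first
for token, count in fd.most_common():
    print(token, ':', count)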
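
For machines without NLTK or its tokenizer data, the same count can be approximated with the standard library alone. This is only a rough sketch: the function name `word_frequencies` is hypothetical, and splitting on whitespace then stripping surrounding punctuation is an assumed stand-in for `nltk.word_tokenize`, so contractions such as "don't" are dropped rather than split.

```python
# Stdlib-only sketch of the word count above (no NLTK required).
# Assumption: whitespace splitting plus punctuation stripping replaces
# nltk.word_tokenize, so results can differ on contractions and hyphens.
import string
from collections import Counter


def word_frequencies(text):
    """Return a Counter of lowercase, purely alphabetic words in text."""
    words = []
    for raw in text.split():
        # strip leading/trailing punctuation, then normalize case
        word = raw.strip(string.punctuation).lower()
        if word.isalpha():  # drop numbers and tokens with inner punctuation
            words.append(word)
    return Counter(words)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat!"
    # print "word : count" pairs, most frequent first
    for word, count in word_frequencies(sample).most_common():
        print(word, ":", count)
```

`Counter.most_common()` returns the pairs sorted by decreasing count, matching the output order of the NLTK version.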