@charleyramm
Created December 21, 2015 06:12
```python
#!/usr/bin/env python3
import re, sys, collections

# Open the stop-word file and split it on commas. strip() drops a trailing
# newline so the last stop word still matches; a set makes lookups fast.
stops = set(open('../stop_words.txt').read().strip().split(','))
# Read the file named by the first command-line argument and lower-case it.
# re.findall() returns every non-overlapping match as a list of strings.
# [a-z]{2,} means "two or more lowercase letters", so one-letter words are skipped.
words = re.findall('[a-z]{2,}', open(sys.argv[1]).read().lower())
# collections.Counter is a dict subclass that maps each item to its count:
# the same as incrementing counts[w] for every w in words not in the stop words.
counts = collections.Counter(w for w in words if w not in stops)
# Print the word and its count for the 25 most common words.
for (w, c) in counts.most_common(25):
    print(w, '-', c)
```
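To answer the note about `re.findall()` and `{2,}`, here is a minimal sketch on a made-up sentence (the sample text is hypothetical). It compares the script's `{2,}` quantifier with `+` ("one or more"):

```python
import re

# Hypothetical sample sentence, lower-cased as the script does.
text = "A cat and a dog ate 2 treats, didn't they?".lower()

# [a-z]{2,}: each match is a run of two or more lowercase letters.
two_plus = re.findall('[a-z]{2,}', text)
# [a-z]+: each match is a run of one or more lowercase letters, for comparison.
one_plus = re.findall('[a-z]+', text)

print(two_plus)  # ['cat', 'and', 'dog', 'ate', 'treats', 'didn', 'they']
print(one_plus)  # ['a', 'cat', 'and', 'a', 'dog', 'ate', 'treats', 'didn', 't', 'they']
```

Digits and punctuation never match, so the apostrophe splits "didn't" in two; `{2,}` keeps "didn" but drops the stray "t" along with "a".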
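For the note on `collections.Counter()`, a small sketch, assuming a hypothetical word list and stop-word set in place of the script's `words` and `stops`. It checks that the one-line `Counter` call matches the "increment counts[w]" loop spelled out by hand:

```python
import collections

# Hypothetical stand-ins for the script's `words` and `stops`.
words = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'cat']
stops = {'the', 'on'}

# Counter consumes the generator and tallies every word that passes the filter.
counts = collections.Counter(w for w in words if w not in stops)

# The same tally written as an explicit loop over a plain dict.
manual = {}
for w in words:
    if w not in stops:
        manual[w] = manual.get(w, 0) + 1

print(counts)         # Counter({'cat': 2, 'sat': 1, 'mat': 1})
print(counts['dog'])  # 0: a missing key counts as zero instead of raising KeyError
```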
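The final loop relies on `most_common(n)`, which returns the `n` (word, count) pairs with the highest counts, highest first. A sketch with hypothetical counts, next to a roughly equivalent `sorted()` call:

```python
import collections

# Hypothetical counts standing in for the script's `counts`.
counts = collections.Counter({'cat': 2, 'sat': 1, 'mat': 3})

# The two (word, count) pairs with the highest counts, highest first.
top = counts.most_common(2)

# Roughly equivalent: sort all pairs by count, descending, then slice.
by_hand = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]

for (w, c) in top:
    print(w, '-', c)  # prints "mat - 3", then "cat - 2"
```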