Created
January 26, 2013 05:33
#!/usr/bin/env python
# Python 2 script: print each bigram of a tweet with its estimated search result count.
import json
import urllib

def estimated_count_for(search_term):
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % urllib.urlencode({'q': search_term})
    results = json.loads(urllib.urlopen(url).read())
    try:
        return results['responseData']['cursor']['estimatedResultCount']
    except (KeyError, TypeError):
        # No estimate in the response; TypeError covers a null responseData.
        return 0

tweet = "Is it just me or has the Haskell just recently figured out that statistical conjugacy can be interpreted algebraically"
tokens = tweet.split(' ')
for i in range(len(tokens) - 1):
    bigram = " ".join(tokens[i:i+2])
    print "%s\t%s" % (bigram, estimated_count_for(bigram))
$ bigrams.py | sort -t$'\t' -k2 -n | head -n5
Haskell just	1070000
statistical conjugacy	1420000
interpreted algebraically	1430000
the Haskell	1820000
conjugacy can	1970000
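
The ranking step done by `sort | head` above can also be sketched in Python, which makes it possible to try the idea without network access. This is a minimal sketch, not the original author's code: `bigrams`, `rarest_bigrams`, and the `count_for` parameter are names introduced here for illustration, and the counts in `toy_counts` are made up. It splits on any whitespace, whereas the script above uses `split(' ')`.

```python
def bigrams(text):
    """Return adjacent word pairs from text, split on whitespace."""
    tokens = text.split()
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def rarest_bigrams(text, count_for, n=5):
    """Return the n bigrams with the lowest counts as (bigram, count) pairs.

    count_for is any callable mapping a bigram to a count (int or numeric
    string), for example a function that queries a search API.
    """
    scored = [(bigram, int(count_for(bigram))) for bigram in bigrams(text)]
    return sorted(scored, key=lambda pair: pair[1])[:n]


# Toy counts for illustration only; these are NOT real search results.
toy_counts = {"a b": 30, "b c": 10, "c d": 20}

if __name__ == "__main__":
    for bigram, count in rarest_bigrams("a b c d", toy_counts.get, n=2):
        print("%s\t%s" % (bigram, count))
```

Passing the lookup in as `count_for` keeps the ranking logic separate from whichever search backend supplies the counts.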