@tos-kamiya
Created August 6, 2013 08:57
study of mrjob
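import re
from mrjob.job import MRJob

# A "word" is a run of word characters or apostrophes (so "don't" stays whole).
WORD_RE = re.compile(r"[\w']+")


class WordCollocationCount(MRJob):
    """Count how often each pair of adjacent words (a bigram) occurs."""

    def mapper(self, _, line):
        # Emit ((w1, w2), 1) for each adjacent pair of lowercased words.
        t = [word.lower() for word in WORD_RE.findall(line)]
        for w1, w2 in zip(t, t[1:]):
            yield (w1, w2), 1

    def combiner(self, word_pair, counts):
        # Pre-aggregate counts on the mapper side to cut shuffle traffic.
        yield word_pair, sum(counts)

    def reducer(self, word_pair, counts):
        # Sum the partial counts for each word pair across all mappers.
        yield word_pair, sum(counts)


if __name__ == '__main__':
    WordCollocationCount.run()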
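To see what the job computes without installing mrjob or starting a cluster, here is a minimal plain-Python sketch that emulates the map, combine, shuffle and reduce phases locally. The helper names `map_line`, `combine` and `count_collocations` are hypothetical and not part of mrjob; each inner list of lines stands in for the input split that one mapper task would receive. Because mrjob calls `mapper` once per input line, word pairs never span a line break, and the sketch keeps that behaviour.

```python
import re
from collections import Counter, defaultdict
from itertools import chain

WORD_RE = re.compile(r"[\w']+")


def map_line(line):
    """Emit ((w1, w2), 1) for each adjacent word pair, like the mapper."""
    words = [word.lower() for word in WORD_RE.findall(line)]
    for w1, w2 in zip(words, words[1:]):
        yield (w1, w2), 1


def combine(pairs):
    """Pre-sum counts within one chunk of mapper output, like the combiner."""
    partial = Counter()
    for key, count in pairs:
        partial[key] += count
    return partial.items()


def count_collocations(chunks):
    """Run map -> combine -> shuffle -> reduce over chunks of lines.

    Each chunk (a list of lines) stands in for one mapper task's input.
    """
    combined = [combine(chain.from_iterable(map_line(line) for line in chunk))
                for chunk in chunks]
    # Shuffle: group every partial count by its word-pair key.
    grouped = defaultdict(list)
    for key, count in chain.from_iterable(combined):
        grouped[key].append(count)
    # Reduce: sum the partial counts for each word pair.
    return {key: sum(counts) for key, counts in grouped.items()}


if __name__ == "__main__":
    chunks = [["The cat sat.", "the cat ran"], ["The cat"]]
    print(count_collocations(chunks))
    # prints {('the', 'cat'): 3, ('cat', 'sat'): 1, ('cat', 'ran'): 1}
```

The combiner only changes how much data crosses the shuffle, not the final totals: here the first chunk sends `('the', 'cat'): 2` instead of two separate 1s. In the real mrjob job, the default JSON output protocol should write each tuple key as a JSON list, for example `["the", "cat"]`.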