@masayang
Created March 9, 2013 07:28
Writing and running MapReduce jobs with mrjob
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from mrjob.job import MRJob


class MRWordCounter(MRJob):
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()
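
To see what the mapper and reducer above compute without installing mrjob or Hadoop, the following is a minimal plain-Python sketch of the same map, shuffle, and reduce steps. The function names (`mapper`, `reducer`, `run_local`) are hypothetical and only mirror the job's logic; this is not how mrjob itself is implemented.

```python
from collections import defaultdict


def mapper(line):
    # Same logic as MRWordCounter.mapper: emit (word, 1) for every token.
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    # Same logic as MRWordCounter.reducer: total the 1s emitted for a word.
    yield word, sum(occurrences)


def run_local(lines):
    # Shuffle step: group mapper values by key.
    # In a real run, mrjob/Hadoop performs this grouping between map and reduce.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)

    # Reduce step: call the reducer once per distinct key.
    results = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            results[key] = total
    return results


if __name__ == '__main__':
    print(run_local(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

The grouping dictionary stands in for the shuffle phase, which is the only part of the pipeline that the job class does not spell out.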
# Run locally
python wc.py < creativecommons.txt
# Run on EMR using a file on S3
export AWS_ACCESS_KEY_ID=<your aws access key>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
python wc.py -r emr s3://masayang-bootcamp/bootcamp4/EMRconsole/creativecommons.txt -o s3://masayang-bootcamp/bootcamp4/EMRconsole/<your account>
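
Once a run finishes, the word counts can be post-processed with a few lines of Python. The sketch below assumes the job uses mrjob's default output format (a JSON-encoded key, a tab, then a JSON-encoded value on each line) and that the output has been saved to a local file; the file name `counts.txt` and the helpers `parse_output_line` and `top_words` are hypothetical.

```python
import json


def parse_output_line(line):
    # Each output line is assumed to look like: "word"<TAB>3
    key_json, value_json = line.rstrip("\n").split("\t", 1)
    return json.loads(key_json), json.loads(value_json)


def top_words(lines, n=3):
    # Build a word -> count mapping, skipping blank lines.
    counts = dict(parse_output_line(l) for l in lines if l.strip())
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == '__main__':
    # Assumes a prior local run such as:
    #   python wc.py < creativecommons.txt > counts.txt
    with open("counts.txt") as f:
        for word, count in top_words(f, n=10):
            print(word, count)
```

For an EMR run, the same parsing would apply to the part files written under the `-o` output directory after they are downloaded from S3.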