Created August 26, 2012 06:41
Data-Intensive Text Processing with MapReduce #3 Figure 3.1 Mapper
package info.moaikids.mapred.map;

import info.moaikids.chunker.Chunker;
import info.moaikids.chunker.KuromojiChunker;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Word-count mapper: emits (chunk, 1) for every non-blank chunk of an input line. */
public class Figure31Mapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {
    // Constant count of 1 emitted with every chunk.
    static final IntWritable ONE = new IntWritable(1);
    // Tokenizer that splits the input text into chunks.
    Chunker chunker = new KuromojiChunker();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the input offset and is not used; value is one line of text.
        for (String chunk : chunker.chunking(value.toString())) {
            // Skip chunks that are empty or whitespace only.
            if (chunk.trim().isEmpty()) {
                continue;
            }
            context.write(new Text(chunk), ONE);
        }
    }
}
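
To try the mapper's logic without a Hadoop cluster, a minimal plain-Java sketch follows. Everything in it is an assumption made for illustration: `WordCountSketch`, its `chunking` method (a naive split on single spaces standing in for `KuromojiChunker`), and the in-memory `count` step that sums the emitted ones as a word-count reducer would. It is not part of the original gist.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Hypothetical stand-in for Chunker.chunking(): splits on single spaces,
    // so empty chunks can appear and must be filtered by the map step.
    public static List<String> chunking(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Map step: emit one (chunk, 1) pair per non-blank chunk,
    // mirroring the loop in Figure31Mapper.map().
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String chunk : chunking(line)) {
            if (chunk.trim().isEmpty()) {
                continue;
            }
            out.add(new AbstractMap.SimpleEntry<>(chunk, 1));
        }
        return out;
    }

    // Reduce step (assumed): sum the emitted values per key.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(count(Arrays.asList("a b a", " b  c ")));
    }
}
```

The second input line contains leading and doubled spaces, so the stand-in tokenizer produces empty chunks; the `isEmpty()` check drops them, just as in the mapper.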