@milindjagre
Created April 11, 2016 10:49
Mapper class used when reading a Microsoft Word document file with the MapReduce API.
package com.milind.mr.worddoc;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Mapper that emits the text read from a Microsoft Word document.
 *
 * @author milind
 */
public class WordMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final Logger LOG = LoggerFactory.getLogger(WordMapper.class);

    /**
     * The Word document's contents are supplied to the mapper in string form.
     * We simply emit them so they can be viewed on HDFS.
     */
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws InterruptedException, IOException {
        String line = value.toString();
        // Emit the text as the output key; the output value is left null.
        context.write(new Text(line), null);
        LOG.info("Map processing finished");
    }
}