
@cotdp cotdp/gist:3062901

Created Jul 6, 2012
Mapper for processing ZipFile entries
// Imports required by the enclosing source file:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * This Mapper class checks that the filename ends with the .txt extension, cleans
 * the text and then applies the simple WordCount algorithm.
 */
public static class MyMapper
    extends Mapper<Text, BytesWritable, Text, IntWritable>
{
    private final static IntWritable one = new IntWritable( 1 );
    private Text word = new Text();

    @Override
    public void map( Text key, BytesWritable value, Context context )
        throws IOException, InterruptedException
    {
        // NOTE: the filename is the *full* path within the ZIP file
        // e.g. "subdir1/subsubdir2/Ulysses-18.txt"
        String filename = key.toString();
        LOG.info( "map: " + filename );  // assumes a LOG logger field in the enclosing class
        // We only want to process .txt files
        if ( !filename.endsWith( ".txt" ) )
            return;
        // Prepare the content: only the first getLength() bytes of getBytes() are valid
        String content = new String( value.getBytes(), 0, value.getLength(), "UTF-8" );
        content = content.replaceAll( "[^A-Za-z \n]", "" ).toLowerCase();
        // Tokenize the content and emit (word, 1) for every token
        StringTokenizer tokenizer = new StringTokenizer( content );
        while ( tokenizer.hasMoreTokens() )
        {
            word.set( tokenizer.nextToken() );
            context.write( word, one );
        }
    }
}
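The NOTE in the mapper says the key is the full path of the entry within the ZIP file. Assuming the input format (not shown in this gist) passes each `ZipEntry` name through unchanged, this plain `java.util.zip` sketch shows what such keys look like and which of them the `.txt` filter keeps. The class and method names (`ZipEntryNames`, `buildZip`, `txtEntries`) are hypothetical. Note that `endsWith(".txt")` is case-sensitive, so an entry named `README.TXT` is skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration with plain java.util.zip (no Hadoop): ZIP entry names are full
// '/'-separated paths inside the archive, which is what MyMapper sees as its key.
class ZipEntryNames {

    // Builds an in-memory ZIP with one small text entry per given name.
    static byte[] buildZip(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("some text\n".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    // Lists the entry names that pass the same test as MyMapper's .txt filter.
    static List<String> txtEntries(byte[] zipBytes) {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String filename = entry.getName();  // e.g. "subdir1/subsubdir2/Ulysses-18.txt"
                if (filename.endsWith(".txt")) {    // case-sensitive, like the mapper
                    result.add(filename);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] zip = buildZip("subdir1/subsubdir2/Ulysses-18.txt",
                              "subdir1/README.TXT",
                              "images/cover.png");
        System.out.println(txtEntries(zip));  // prints [subdir1/subsubdir2/Ulysses-18.txt]
    }
}
```

If upper-case extensions should also be processed, comparing `filename.toLowerCase(Locale.ROOT)` against `".txt"` is one option.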
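The "prepare" and "tokenize" steps can be checked without a cluster. This Hadoop-free sketch (class and method names are hypothetical) copies the mapper's regex and `StringTokenizer` loop; the `bytes`/`length` pair stands in for `BytesWritable.getBytes()`/`getLength()`, whose backing array may be longer than the record itself. It also makes one side effect of the regex visible: tabs and `\r` are deleted rather than turned into spaces, so `Hello,\tWorld` becomes the single token `helloworld`, and `It's` becomes `its`. Unlike the mapper, it lowercases with `Locale.ROOT` so the result does not depend on the JVM's default locale.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hadoop-free copy of MyMapper's "prepare" and "tokenize" steps.
class WordCleaner {

    // bytes/length play the role of BytesWritable.getBytes()/getLength():
    // only the first `length` bytes of the array belong to the record.
    static List<String> tokens(byte[] bytes, int length) {
        String content = new String(bytes, 0, length, StandardCharsets.UTF_8);
        // Same regex as the mapper: keep ASCII letters, spaces and newlines only.
        // Locale.ROOT (unlike the mapper's plain toLowerCase()) avoids
        // locale-specific rules such as the Turkish dotless i.
        content = content.replaceAll("[^A-Za-z \n]", "").toLowerCase(Locale.ROOT);
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(content);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] text = "Hello,\tWorld!\r\nIt's".getBytes(StandardCharsets.UTF_8);
        // Simulate a backing array that is longer than the record, with stale bytes.
        byte[] buffer = Arrays.copyOf(text, text.length + 5);
        Arrays.fill(buffer, text.length, buffer.length, (byte) 'x');
        System.out.println(tokens(buffer, text.length));   // [helloworld, its]
        System.out.println(tokens(buffer, buffer.length)); // [helloworld, itsxxxxx]
    }
}
```

The second call shows what happens when the whole backing array is decoded: the stale bytes leak into the last token.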
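The gist shows only the map side. In the standard WordCount pattern, the reducer receives every `one` emitted for a given word and sums them; Hadoop ships `org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer` for exactly that. The following is a Hadoop-free sketch of that assumed reduce step, with a `TreeMap` standing in for the sort-and-group of the shuffle; the class name `WordCountReduceSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of the (assumed) WordCount reduce step: group the
// mapper's (word, 1) pairs by word and add up the ones.
class WordCountReduceSketch {

    static Map<String, Integer> countWords(List<String> emittedWords) {
        // TreeMap plays the role of the shuffle: keys are grouped and sorted.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : emittedWords) {
            counts.merge(word, 1, Integer::sum);  // each emitted pair carries a 1
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("the", "cat", "the", "hat");
        System.out.println(countWords(emitted));  // {cat=1, hat=1, the=2}
    }
}
```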