Create a gist now

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Example ZipFile Job
// Standard stuff
Job job = new Job(conf);
job.setJobName(this.getClass().getSimpleName());
job.setJarByClass(this.getClass());
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
// Hello there ZipFileInputFormat!
job.setInputFormatClass(ZipFileInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
// The output files will contain "Word [TAB] Count"
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// We want to be fault-tolerant
ZipFileInputFormat.setLenient( true );
ZipFileInputFormat.setInputPaths(job, new Path("/data/archives/*.zip"));
TextOutputFormat.setOutputPath(job, new Path("/tmp/zip_wordcount"));
//
job.waitForCompletion(true);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment