@cotdp
Created July 6, 2012 21:38
Example ZipFile Job
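import com.cotdp.hadoop.ZipFileInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// run() of a driver class that extends Configured and implements Tool
public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // Standard stuff (Job.getInstance replaces the deprecated new Job(conf))
    Job job = Job.getInstance(conf);
    job.setJobName(this.getClass().getSimpleName());
    job.setJarByClass(this.getClass());
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    // Hello there ZipFileInputFormat!
    job.setInputFormatClass(ZipFileInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // The output files will contain "Word [TAB] Count"
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Be fault-tolerant: tolerate corrupt ZIP files rather than failing the job
    ZipFileInputFormat.setLenient(true);
    ZipFileInputFormat.setInputPaths(job, new Path("/data/archives/*.zip"));
    TextOutputFormat.setOutputPath(job, new Path("/tmp/zip_wordcount"));
    // Submit the job, print progress, and return 0 on success or 1 on failure
    return job.waitForCompletion(true) ? 0 : 1;
}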
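The job above runs a word count over the files packed inside each ZIP archive. To check the expected "Word [TAB] Count" results locally without a cluster, here is a minimal single-JVM sketch of the same computation using only java.util.zip. It is not Hadoop code and not part of ZipFileInputFormat: the class ZipWordCount and its methods countWords and zipOf are hypothetical names, words are assumed to be whitespace-separated UTF-8 text, and the lenient flag only loosely imitates setLenient(true) by keeping partial counts when an archive turns out to be corrupt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical local sketch: not Hadoop code and not part of ZipFileInputFormat.
public class ZipWordCount {

    // Counts whitespace-separated words across every file entry in one ZIP archive.
    public static Map<String, Integer> countWords(byte[] zipBytes, boolean lenient) {
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // "Map" side: one entry's bytes, assumed to be UTF-8 text
                String text = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                for (String word : text.split("\\s+")) {
                    if (!word.isEmpty()) {
                        // "Reduce" side folded in: add one per occurrence
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            // Assumed lenient behaviour: keep the partial counts instead of failing
            if (!lenient) {
                throw new UncheckedIOException(e);
            }
        }
        return counts;
    }

    // Builds an in-memory ZIP (file name -> UTF-8 text) for trying countWords out.
    public static byte[] zipOf(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The archive is complete only after the ZipOutputStream has been closed
        return bytes.toByteArray();
    }
}
```

For example, countWords(zipOf(Map.of("a.txt", "hello world", "b.txt", "hello zip")), false) returns {hello=2, world=1, zip=1}. On the cluster the same work is split across MyMapper (one ZIP entry per record) and MyReducer (summing the counts per word), which the snippet references but does not show.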