@soren
Created November 22, 2013 07:39
A Perl word-count mapper script for use with the Hadoop Streaming interface. Tested with Java 1.6 and Hadoop 1.0.4.
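#!/usr/bin/env perl
# Word-count mapper for Hadoop Streaming: emits "<word>\t1" for each word read.
use warnings;
use strict;
while (<>) {
    chomp;
    # Split on whitespace and punctuation; grep drops the empty leading field
    # that split returns when a line starts with a separator.
    print lc($_), "\t1\n" foreach grep { length } split /[\s.,:;!?]+/;
}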
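To see what the mapper emits for a given line without running Hadoop, the tokenizing step can be pulled into a small function and called directly. This is only a sketch: `map_line` is a hypothetical helper name, not part of the original script, and the sample sentence is made up for illustration. The `grep { length }` guard here drops any empty field that `split` produces when a line begins with a separator.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper (not in the original script): returns the
# "word<TAB>1" records that would be emitted for one input line.
sub map_line {
    my ($line) = @_;
    chomp $line;
    # Same separator class as the mapper script. grep drops the empty
    # field split yields when the line starts with a separator.
    return map { lc($_) . "\t1" }
           grep { length }
           split /[\s.,:;!?]+/, $line;
}

# Print one record per line, the way Hadoop Streaming expects them on STDOUT.
print "$_\n" for map_line("Hello, world! Hello Hadoop.\n");
```

For the sample line this prints four records, with the keys hello, world, hello and hadoop, each followed by a tab and the count 1. Duplicate keys are not merged at this stage; summing them is left to the reducer.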