@soren
Created November 22, 2013 07:42
A Perl word count reducer script. It can be used as the reducer in a Hadoop Streaming job and expects tab-separated "word, count" lines sorted by word. Tested with Java 1.6 and Hadoop 1.0.4.
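#!/usr/bin/env perl
# Word count reducer for Hadoop Streaming.
# Reads "word<TAB>count" lines from STDIN (or from files named on the
# command line). The input must already be sorted by word.
use warnings;
use strict;

my $current_word  = "";
my $current_count = 0;

while (<>) {
    chomp;
    my ($word, $count) = split /\t/;
    if ($word eq $current_word) {
        # Same word as the previous line: keep accumulating.
        $current_count += $count;
    } else {
        # New word: emit the finished total, then start a new tally.
        print "$current_word\t$current_count\n" if length($current_word);
        $current_word  = $word;
        $current_count = $count;
    }
}

# Emit the last word, unless the input was empty.
print "$current_word\t$current_count\n" if length($current_word);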
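The reducer only produces correct totals when identical words arrive on adjacent lines, which Hadoop Streaming provides by sorting mapper output by key before the reduce phase. The following is a minimal sketch of that map, sort, reduce flow on in-memory data. The names `map_line` and `reduce_sorted` and the sample input are assumptions made for illustration and are not part of the original gist; `reduce_sorted` mirrors the reducer's loop as a function so it can be exercised without a cluster.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical mapper step (not in the original gist):
# emit one "word\t1" record per whitespace-separated token.
sub map_line {
    my ($line) = @_;
    return map { "$_\t1" } split ' ', $line;
}

# The reducer's logic as a function over a list of records.
# Like the script, it assumes records for the same word are adjacent.
sub reduce_sorted {
    my @records = @_;
    my @out;
    my ($current_word, $current_count) = ("", 0);
    for my $record (@records) {
        my ($word, $count) = split /\t/, $record;
        if ($word eq $current_word) {
            $current_count += $count;
        } else {
            push @out, "$current_word\t$current_count" if length $current_word;
            ($current_word, $current_count) = ($word, $count);
        }
    }
    push @out, "$current_word\t$current_count" if length $current_word;
    return @out;
}

# Sample input (an assumption, for illustration only).
my @input  = ("the quick fox", "the lazy dog", "the end");
my @mapped = map { map_line($_) } @input;

# sort stands in for the shuffle/sort phase between map and reduce.
my @result = reduce_sorted(sort @mapped);
print "$_\n" for @result;
```

Without the `sort`, the three occurrences of "the" would not be adjacent and would be emitted as separate lines with a count of 1 each. Outside of Hadoop, the same check can be done on the command line by piping a mapper's output through `sort` into the reducer script.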