@rjurney
Created February 2, 2013 07:29
How to do Top N in Hadoop
/* Data Fu */
REGISTER /me/Software/datafu/dist/datafu-0.0.9-SNAPSHOT.jar
REGISTER /me/Software/datafu/lib/*.jar /* */
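/* '0.9' and '0.95' request the 90th and 95th percentiles. Quantile expects a sorted input bag. */
DEFINE Quantile datafu.pig.stats.Quantile('0.9', '0.95');

/* thing_counts is assumed to be defined earlier, with a numeric field named total. */
quantiles = foreach (group thing_counts all) {
    sorted = order thing_counts by total;
    generate FLATTEN(Quantile(sorted.total)) as (nine, nine_five);
};
dump quantiles;
/* Output: (1241.0,2121515.0) */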
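
/* A hedged sketch of two ways to get the "top" rows from here. It assumes
   thing_counts has a numeric field named total, as above. The aliases
   top_five_percent, top_n, by_total and first_n, and the N of 10, are
   hypothetical names and values chosen for illustration. */

/* Option 1: keep rows at or above the 95th percentile computed above.
   quantiles holds a single row, so it can be referenced as a scalar. */
top_five_percent = filter thing_counts by (double)total >= quantiles.nine_five;

/* Option 2: a fixed Top N, using a nested order and limit. */
top_n = foreach (group thing_counts all) {
    by_total = order thing_counts by total desc;
    first_n = limit by_total 10;
    generate FLATTEN(first_n);
};

/* Option 1 returns a share of the data that grows with the input, while
   Option 2 always returns at most 10 rows. */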