
How to do Top N in Hadoop

top_n.pig
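/* DataFu (Pig UDF library that provides datafu.pig.stats.Quantile) */
REGISTER /me/Software/datafu/dist/datafu-0.0.9-SNAPSHOT.jar
REGISTER /me/Software/datafu/lib/*.jar /* supporting jars from DataFu's lib directory */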
 
/* Quantile returns the 0.9 and 0.95 quantiles (90th and 95th percentiles) of a sorted bag */
DEFINE Quantile datafu.pig.stats.Quantile('0.9', '0.95');
 
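/* Assumes thing_counts is a relation with a numeric field `total` (defined earlier, not shown).
   Quantile expects its input bag sorted in ascending order, hence the nested ORDER. */
quantiles = foreach (group thing_counts all) {
    sorted = order thing_counts by total;
    generate FLATTEN(Quantile(sorted.total)) as (nine, nine_five);
};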
dump quantiles;
/* Output, as (90th percentile, 95th percentile): */
/* (1241.0,2121515.0) */
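The script above only produces cut-off values, not the top rows themselves. A minimal sketch of the next step, assuming `thing_counts` and `quantiles` are defined as in top_n.pig and using hypothetical relation names (`top_ten_pct`, `sorted_desc`, `top_10`): keep everything at or above the 90th-percentile threshold, or, for a fixed N, sort descending and truncate with `LIMIT`.

```pig
/* Sketch only: assumes thing_counts (numeric field total) and quantiles
   are defined as in top_n.pig. Relation names below are hypothetical. */

/* Option 1: rows at or above the 90th percentile (roughly the top 10%).
   quantiles holds exactly one row, so Pig lets it be projected as a scalar. */
top_ten_pct = FILTER thing_counts BY total >= quantiles.nine;

/* Option 2: exact top N (here N = 10): sort descending, then truncate. */
sorted_desc = ORDER thing_counts BY total DESC;
top_10 = LIMIT sorted_desc 10;

DUMP top_10;
```

The quantile filter scales with the data and needs no fixed N, while ORDER plus LIMIT returns exactly N rows (ties at the cut-off are broken arbitrarily). For a top N within each group, Pig's built-in `TOP` function is another option.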
