@rjurney /top_n.pig
How to do Top N in Hadoop
/* Data Fu */
REGISTER /me/Software/datafu/dist/datafu-0.0.9-SNAPSHOT.jar
REGISTER /me/Software/datafu/lib/*.jar /* */
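/* Compute the 90th and 95th percentiles of total.
   thing_counts is assumed to have been loaded earlier. */
DEFINE Quantile datafu.pig.stats.Quantile('0.9', '0.95');
quantiles = foreach (group thing_counts all) {
  /* Quantile expects its input bag to be sorted */
  sorted = order thing_counts by total;
  generate FLATTEN(Quantile(sorted.total)) as (nine, nine_five);
};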
dump quantiles;
/* Output: (1241.0,2121515.0) */
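/* A minimal sketch, not part of the original script: one way to use the
   90th-percentile value as a cutoff and keep roughly the top 10% of rows.
   Assumes thing_counts has a numeric `total` field, as above. The aliases
   top_things and top_sorted are hypothetical names for illustration.
   quantiles holds a single tuple, so quantiles.nine can be read as a scalar;
   the explicit cast is a precaution in case the field is untyped. */
top_things = filter thing_counts by total >= (double)quantiles.nine;
top_sorted = order top_things by total desc;
dump top_sorted;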