How to do Top N in Hadoop
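/* DataFu */
REGISTER /me/Software/datafu/dist/datafu-0.0.9-SNAPSHOT.jar;
REGISTER /me/Software/datafu/lib/*.jar;
-- Quantile emits the 90th and 95th percentiles of its input bag.
DEFINE Quantile datafu.pig.stats.Quantile('0.9', '0.95');
-- thing_counts is an existing relation with a numeric field named total.
quantiles = foreach (group thing_counts all) {
    -- Quantile expects its input bag to be sorted.
    sorted = order thing_counts by total;
    generate FLATTEN(Quantile(sorted.total)) as (nine, nine_five);
}
dump quantiles;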
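As a rough illustration of what the nested foreach computes, the plain-Python sketch below takes an ascending list of totals and picks quantile thresholds with the nearest-rank rule. It then keeps the values at or above the 90th percentile as the "top" set. The function name `nearest_rank_quantiles` and the sample totals are hypothetical. Nearest-rank is an assumed estimator; DataFu's Quantile UDF may interpolate differently, so results can differ, especially on small inputs.

```python
import math
from typing import List, Sequence


def nearest_rank_quantiles(sorted_values: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Return the nearest-rank quantile of an ascending sequence for each p in probs.

    Nearest-rank: the value at 1-based rank ceil(p * n).
    Assumption: this is an illustrative estimator, not necessarily DataFu's.
    """
    n = len(sorted_values)
    if n == 0:
        raise ValueError("need at least one value")
    result = []
    for p in probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        rank = math.ceil(p * n)  # 1-based rank into the sorted data
        result.append(sorted_values[rank - 1])
    return result


# Hypothetical stand-in for the `total` field of thing_counts; sorted like the Pig `order`.
totals = sorted([5, 1, 9, 3, 7, 2, 8, 6, 4, 10])
nine, nine_five = nearest_rank_quantiles(totals, [0.9, 0.95])

# "Top N" by threshold: keep everything at or above the 90th percentile.
top = [t for t in totals if t >= nine]
print(nine, nine_five, top)
```

Thresholding on a percentile keeps the top slice without knowing N in advance; if a fixed N is wanted instead, sorting descending and taking the first N is the simpler route.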