How to do Top N in Hadoop

top_n.pig
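/* DataFu */
REGISTER /me/Software/datafu/dist/datafu-0.0.9-SNAPSHOT.jar
REGISTER /me/Software/datafu/lib/*.jar /* */

/* Quantile UDF configured to return the 90th and 95th percentiles */
DEFINE Quantile datafu.pig.stats.Quantile('0.9', '0.95');

/* thing_counts is assumed to be defined earlier, with a numeric field named total */
quantiles = foreach (group thing_counts all) {
    /* Quantile expects its input bag to be sorted */
    sorted = order thing_counts by total;
    generate FLATTEN(Quantile(sorted.total)) as (nine, nine_five);
};
dump quantiles;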
(1241.0,2121515.0)
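
The quantiles above give cutoff values rather than a ranked list. For an exact top-N list, a common Pig idiom is to sort in descending order and apply LIMIT. The following is a minimal sketch, assuming the same thing_counts relation with a numeric total field; the cutoff of 10 and the alias names sorted_desc and top_n are illustrative and not part of the original script.

```pig
/* Assumption: thing_counts is already defined and has a numeric field named total */

/* Sort so that the largest totals come first */
sorted_desc = order thing_counts by total desc;

/* Keep the first 10 rows of the sorted relation (10 is an arbitrary example value) */
top_n = limit sorted_desc 10;

dump top_n;
```

Because LIMIT is applied directly to an ordered relation, the rows kept are the first ones in that order, so the result is the 10 largest totals. The quantile approach is a better fit when the goal is "the top 5 percent" rather than a fixed count.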