@criccomini
Created September 23, 2012 22:58
Hadoop, Pig, and SQL (GROUP BY)
SELECT COUNT(*) FROM mytable;
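-- Collect every row into a single bag, then count it.
-- COUNT_STAR also counts rows whose first field is null, matching SQL's COUNT(*).
mytable = GROUP mytable ALL;
mytable = FOREACH mytable GENERATE COUNT_STAR(mytable);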
DUMP mytable;
SELECT COUNT(DISTINCT col1) FROM mytable;
mytable = FOREACH mytable GENERATE col1;
mytable = DISTINCT mytable;
-- Collect the distinct values into a single bag and count them.
mytable = GROUP mytable ALL;
mytable = FOREACH mytable GENERATE COUNT(mytable) AS cnt;
DUMP mytable;