Skip to content

Instantly share code, notes, and snippets.

@jedisct1
Created April 5, 2011 12:25
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jedisct1/903501 to your computer and use it in GitHub Desktop.
Save jedisct1/903501 to your computer and use it in GitHub Desktop.
Just a personal cheatsheet on pig+json
git clone https://github.com/kevinweil/elephant-bird.git
ant nonothing
cp lib/google-collect*jar lib/json-simple*jar /tmp/x
cd build/classes
jar -cf /tmp/x/elephant-bird.jar com
REGISTER google-collect-1.0.jar;
REGISTER json-simple-1.1.jar;
REGISTER elephant-bird.jar;
json = LOAD '/tmp/a' USING com.twitter.elephantbird.pig.load.JsonLoader();
user_name = FOREACH json GENERATE (CHARARRAY) $0#'user_name' AS user_name;
count = FOREACH (GROUP user_name BY $0) GENERATE $0 AS user_name, COUNT($1) as cnt;
sorted = LIMIT(ORDER count BY cnt DESC) 50;
STORE sorted into '/tmp/spot.out';
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment