@elubow
Created February 21, 2011 02:20
-- Log line
-- {"exchange_id":"4cc877b81badf422af000010","exchange_user_id":"MTY4Mjk2NTk2eDAuODA2IDEyOTc4MDI5NTh4MTI2NDc5NjY2MA","bid_id":"00cc4341-facb-4ec1-a403-d5309472d70e","bid_amount":"2.05","win_amount":1.369999968133322,"ad_ids":"4d237a731badf45c8200011a,4d237ac81badf45c85000006,4d4c64c0e32b132113000013,4d23807a1badf45c85000299","wv":"2","logged_at":"2011-02-15T23:36:31.386Z"}
REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
RAW_LOGS = LOAD 'file:/home/hadoop/logs/adserver.log' USING TextLoader AS (line:chararray);
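-- Parse each JSON log line into typed fields. The braces are written as [{] and [}] because a bare '{' is a
-- repetition metacharacter in Java regular expressions; each quoted value is captured with [^"]* so it cannot run past its closing quote.
LOGS_BASE = FOREACH RAW_LOGS GENERATE
    FLATTEN(EXTRACT(line, '[{]"exchange_id":"([^"]*)","exchange_user_id":"([^"]*)","bid_id":"([^"]*)","bid_amount":"([^"]*)","win_amount":([^,]*),"ad_ids":"([^"]*)","wv":"([^"]*)","logged_at":"([^"]*)"[}]'))
    AS (exchange_id:chararray, exchange_user_id:chararray, bid_id:chararray, bid_amount:float, win_amount:float, ad_ids:chararray, wv:int, logged_at:chararray);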
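-- Count log lines per widget version (wv) and keep the five most frequent versions.
WIDGET_VERSION_ONLY = FOREACH LOGS_BASE GENERATE wv;
WIDGET_VERSION_COUNT = FOREACH (GROUP WIDGET_VERSION_ONLY BY wv) GENERATE group AS wv, COUNT(WIDGET_VERSION_ONLY) AS num;
WIDGET_VERSION_SORTED_COUNT = LIMIT (ORDER WIDGET_VERSION_COUNT BY num DESC) 5;
-- Print the result; nothing is executed until an output statement such as DUMP or STORE is reached.
DUMP WIDGET_VERSION_SORTED_COUNT;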