Sureshs-MacBook-Pro:hadoop sureshsaggar$ pig -x local
grunt> REGISTER /Users/sureshsaggar/Documents/GITHubProjects/hadoop/SSPigLoader.jar;
grunt> REGISTER /Users/sureshsaggar/Documents/GITHubProjects/hadoop/jar/json-simple-1.1.jar;
grunt> data = LOAD '/Users/sureshsaggar/Documents/GITHubProjects/hadoop/logs/analytics.log' USING com.ss.analytics.pig.SSPigLoader('date type to') AS (date: chararray, type :chararray, to: chararray);
grunt> dump data
....
....
....
2012-08-13 21:35:32,203 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
****file:/tmp/temp-265145508/tmp1111831199
@sureshsaggar
sureshsaggar / analytics.log
Created August 13, 2012 15:44
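An example input log file for Pig queries. Each line has three pipe-delimited fields: a timestamp, a one-letter event type, and a JSON payload.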
2012-08-08 07:50UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412211,"ih":"1","fr":"+919912345678","ml":6}
2012-08-08 07:50UTC|i|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412213}
2012-08-08 07:50UTC|b|{"to":"+919912345678","ts":1344412225,"ih":"1","fr":"+919912345678","tags":"PIG, Hadoop"}
2012-08-08 07:50UTC|b|{"to":"+919912345678","ts":1344412225,"ih":"1","fr":"+919912345678","tags":"Apache, Custom Loader"}
2012-08-08 07:51UTC|i|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412271}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412272,"ih":"1","fr":"+919912345678","userstate":"1","ml":2}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412283,"ih":"1","fr":"+919912345678", "ml":3}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412285,"ih":"1","fr":"+919912345678","userstate":"1","ml":2}
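The line layout above (timestamp, event type, JSON payload, separated by `|`) can be illustrated with a small standalone sketch. This is not the loader's actual code: `LogLineSketch` and its methods are hypothetical names, and it assumes that in a key spec such as `'date type to'` the names `date` and `type` refer to the first two pipe-delimited fields while any other name is looked up in the JSON payload. To stay dependency-free it uses a naive regular expression in place of json-simple, which only copes with flat objects whose values are quoted strings or integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SSPigLoader: shows how one log line breaks down.
final class LogLineSketch {

    // Split "date|type|json" into exactly three parts. The limit of 3 keeps
    // any '|' characters inside the JSON payload intact.
    static String[] splitLine(String line) {
        String[] parts = line.split("\\|", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected date|type|json, got: " + line);
        }
        return parts;
    }

    // Naive lookup of a key in a flat JSON object (assumption: values are
    // quoted strings without escapes, or integers). Returns null if absent.
    static String jsonValue(String json, String key) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+))");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    // Build one row for a space-separated key spec such as "date type to".
    static List<String> project(String line, String keySpec) {
        String[] parts = splitLine(line);
        List<String> row = new ArrayList<>();
        for (String key : keySpec.trim().split("\\s+")) {
            if (key.equals("date")) {
                row.add(parts[0]);
            } else if (key.equals("type")) {
                row.add(parts[1]);
            } else {
                row.add(jsonValue(parts[2], key));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String line = "2012-08-08 07:50UTC|m|{\"to\":\"+919912345678\","
                + "\"distinct_id\":\"UA--z9PpUBzCAAAA\",\"ts\":1344412211,"
                + "\"ih\":\"1\",\"fr\":\"+919912345678\",\"ml\":6}";
        // prints [2012-08-08 07:50UTC, m, +919912345678]
        System.out.println(project(line, "date type to"));
    }
}
```

A real loader would hand the third field to a proper JSON parser (the session above registers json-simple for that purpose) rather than rely on a regular expression.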
sureshsaggar / SSPigLoader.java
Created August 13, 2012 15:43
Custom loader UDF that extends Pig's LoadFunc.
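package com.ss.analytics.pig;

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.pig.LoadFunc;
import org.apache.pig.data.TupleFactory;

public class SSPigLoader extends LoadFunc {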
private byte fieldDel = '|';
protected RecordReader in = null;
private ArrayList<Object> tokensArrayList = null;
private TupleFactory tupleFactory = TupleFactory.getInstance();
private List<String> reqKeyNames;
@Override