@sureshsaggar
sureshsaggar / SSPigLoader.java
Created August 13, 2012 15:43
Custom loader UDF extending Apache Pig's LoadFunc.
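package com.ss.analytics.pig;

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.pig.LoadFunc;
import org.apache.pig.data.TupleFactory;

public class SSPigLoader extends LoadFunc {
    private byte fieldDel = '|'; // delimiter between the fields of a log line
    protected RecordReader in = null;
    private ArrayList<Object> tokensArrayList = null;
    private TupleFactory tupleFactory = TupleFactory.getInstance();
    private List<String> reqKeyNames;
    @Override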
@sureshsaggar
sureshsaggar / analytics.log
Created August 13, 2012 15:44
An example input log file for PIG queries.
2012-08-08 07:50UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412211,"ih":"1","fr":"+919912345678","ml":6}
2012-08-08 07:50UTC|i|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412213}
2012-08-08 07:50UTC|b|{"to":"+919912345678","ts":1344412225,"ih":"1","fr":"+919912345678","tags":"PIG, Hadoop"}
2012-08-08 07:50UTC|b|{"to":"+919912345678","ts":1344412225,"ih":"1","fr":"+919912345678","tags":"Apache, Custom Loader"}
2012-08-08 07:51UTC|i|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412271}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412272,"ih":"1","fr":"+919912345678","userstate":"1","ml":2}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412283,"ih":"1","fr":"+919912345678", "ml":3}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412285,"ih":"1","fr":"+919912345678","userstate":"1","ml":2}
Sureshs-MacBook-Pro:hadoop sureshsaggar$ pig -x local
grunt> REGISTER /Users/sureshsaggar/Documents/GITHubProjects/hadoop/SSPigLoader.jar;
grunt> REGISTER /Users/sureshsaggar/Documents/GITHubProjects/hadoop/jar/json-simple-1.1.jar;
grunt> data = LOAD '/Users/sureshsaggar/Documents/GITHubProjects/hadoop/logs/analytics.log' USING com.ss.analytics.pig.SSPigLoader('date type to') AS (date: chararray, type :chararray, to: chararray);
grunt> dump data
....
....
....
2012-08-13 21:35:32,203 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
****file:/tmp/temp-265145508/tmp1111831199
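
The session above loads each `date|type|{json}` line and keeps only the keys named in the loader argument ('date type to'). Because the loader's body is not shown in the preview, the following is only a minimal sketch of that mapping and not the author's implementation. The class `LogLineSketch` and its `parse` method are hypothetical names, and a small regular expression stands in for the json-simple library so that the example runs with the JDK alone. It assumes a flat JSON object with string or integer values and no escaped quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SSPigLoader: shows how one log line
// could be reduced to the values of the requested keys.
final class LogLineSketch {
    // Matches "key":"text" or "key":123 in a flat JSON object.
    // Assumption: no nesting and no escaped quotes inside values.
    private static final Pattern FIELD =
        Pattern.compile("\"([^\"]+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+))");

    /** Returns one value per requested key (null if absent), or null for a malformed line. */
    public static List<String> parse(String line, List<String> keys) {
        // Limit of 3 keeps any '|' inside the JSON payload intact.
        String[] parts = line.split("\\|", 3);
        if (parts.length < 3) {
            return null;
        }
        Map<String, String> values = new HashMap<>();
        values.put("date", parts[0]);
        values.put("type", parts[1]);
        Matcher m = FIELD.matcher(parts[2]);
        while (m.find()) {
            // group(2) holds a quoted string value, group(3) an integer value.
            values.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
        }
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            out.add(values.get(key));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "2012-08-08 07:50UTC|i|"
            + "{\"to\":\"+919912345678\",\"distinct_id\":\"UA--z9PpUBzCAAAA\",\"ts\":1344412213}";
        // prints [2012-08-08 07:50UTC, i, +919912345678]
        System.out.println(parse(line, Arrays.asList("date", "type", "to")));
    }
}
```

In a real LoadFunc the returned values would be wrapped in a Pig tuple, and a proper JSON parser would replace the regular expression to handle nested or escaped content.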
@sureshsaggar
sureshsaggar / Calabash-Android-Device
Created October 21, 2012 07:27
Calabash-Android
Step 01: Find list of devices attached using adb:
/Users/sureshsaggar/android-sdks/platform-tools/adb devices
List of devices attached
222222222cb65fdf device
Step 02: Set ANDROID_HOME
export ANDROID_HOME=/Users/sureshsaggar/Downloads/android-sdk-macosx
echo $ANDROID_HOME
/Users/sureshsaggar/Downloads/android-sdk-macosx
Step 03: Verify that a suitable AVD exists:
Sureshs-MacBook-Pro:android-sdk-macosx sureshsaggar$ tools/android list avd
Available Android Virtual Devices:
Name: AndroidApp
Path: /Users/sureshsaggar/.android/avd/AndroidApp.avd
Target: Google APIs (Google Inc.)
Based on Android 2.3.3 (API level 10)
ABI: armeabi
Skin: HVGA
@sureshsaggar
sureshsaggar / Apache FlumeNG and HDFS Sink
Created October 31, 2012 16:16
Apache FlumeNG and HDFS Sink
# Directory:
# root@localhost:~/flume/apache-flume-1.4.0-SNAPSHOT
# Usage:
# bin/flume-ng agent --conf ./conf/ -f conf/flume-agents-tests.conf -Dflume.root.logger=DEBUG,console -n agent_test_exec_TO_hdfs
# ------------------------------------------------------------------------------------
# This workflow applies to a web server that runs a Flume agent and writes its
# data to HDFS. Here /tmp/ping.txt stands in for any log file.
# ------------------------------------------------------------------------------------
@sureshsaggar
sureshsaggar / Output from HDFS - Flume NG
Created October 31, 2012 16:27
Output from HDFS - Flume NG
root@hadoop2-prod:~# /usr/local/hadoop/bin/hadoop fs -cat /tmp/agent-webserver/FlumeData.1351688879208
64 bytes from www.google.com (72.15.18.10): icmp_req=275 ttl=48 time=111 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=276 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=277 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=278 ttl=48 time=112 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=279 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=280 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=281 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=282 ttl=48 time=111 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=283 ttl=48 time=114 ms
@sureshsaggar
sureshsaggar / Agent 01 - Using Exec Source & AVRO Sink
Created November 1, 2012 13:32
Query: How can data be saved in _raw_ format in HDFS using FlumeNG?
# Usage:
# bin/flume-ng agent --conf ./conf/ -f conf/flume-webserver-agents.conf -Dflume.root.logger=DEBUG,console -n agent_hikemon
# ------------------------------------------------------------------------------------
# Agent 01 - Using Exec Source & AVRO Sink
# ------------------------------------------------------------------------------------
agent_hikemon.sinks = avroSink-consolidator
agent_hikemon.sources = tailSource
agent_hikemon.channels = MemoryChannel-WebServer
agent_hikemon.channels.MemoryChannel-WebServer.type = memory
@sureshsaggar
sureshsaggar / gist:3995476
Created November 1, 2012 18:12
HDFS Output with Noise/ Formatting Issue
$ hadoop fs -cat /tmp/agent-mqtt/FlumeData.1351679289119
?64 bytes from www.google.com (114.121.18.9): icmp_req=4468 ttl=48 time=118 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4469 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4470 ttl=48 time=118 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4471 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4472 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4473 ttl=48 time=117 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4474 ttl=48 time=118 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4475 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4476 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4477 ttl=48 time=118 ms?G?\?C?a?????Ӷ
@sureshsaggar
sureshsaggar / gist:4129687
Created November 22, 2012 06:33
Apache PIG - java.lang.UnsupportedOperationException: getJobTrackerAddrs is not supported
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias XYZ_active_users
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1552)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.PigServer.registerScript(PigServer.java:614)
at org.apache.pig.PigServer.registerScript(PigServer.java:716)
at org.apache.pig.PigServer.registerScript(PigServer.java:689)
at com.bsb.hike.analytics.pig.BaseHikePigTask.run(BaseHikePigTask.java:148)