@sureshsaggar
sureshsaggar / delete_s3_bucket.py
Created Jun 3, 2014
Deleting an Amazon S3 bucket using Boto
import boto
from boto.s3.connection import OrdinaryCallingFormat

(aws_access_key_id, aws_secret_access_key) = ('<aws_access_key_id>', '<aws_secret_access_key>')

def deleteBucket(aws_access_key_id, aws_secret_access_key, bname):
    s3 = boto.connect_s3(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, calling_format=OrdinaryCallingFormat())
    print('# Permanently deleting bucket[%s]...' % bname)
    bucket = s3.get_bucket(bname)
    # S3 only deletes empty buckets, so remove every key first.
    bucketListResultSet = bucket.list()
    result = bucket.delete_keys([key.name for key in bucketListResultSet])
    # Remove the now-empty bucket itself.
    bucket.delete()
    return result
sureshsaggar / SSPigLoader.java
Created Aug 13, 2012
Custom loader UDF extending Pig's LoadFunc.
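package com.ss.analytics.pig;

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.pig.LoadFunc;
import org.apache.pig.data.TupleFactory;

public class SSPigLoader extends LoadFunc {
    private byte fieldDel = '|';
    protected RecordReader in = null;
    private ArrayList<Object> tokensArrayList = null;
    private TupleFactory tupleFactory = TupleFactory.getInstance();
    private List<String> reqKeyNames;
    @Override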
sureshsaggar / analytics.log
Created Aug 13, 2012
An example input log file for Pig queries.
2012-08-08 07:50UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412211,"ih":"1","fr":"+919912345678","ml":6}
2012-08-08 07:50UTC|i|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412213}
2012-08-08 07:50UTC|b|{"to":"+919912345678","ts":1344412225,"ih":"1","fr":"+919912345678","tags":"PIG, Hadoop"}
2012-08-08 07:50UTC|b|{"to":"+919912345678","ts":1344412225,"ih":"1","fr":"+919912345678","tags":"Apache, Custom Loader"}
2012-08-08 07:51UTC|i|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412271}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412272,"ih":"1","fr":"+919912345678","userstate":"1","ml":2}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412283,"ih":"1","fr":"+919912345678", "ml":3}
2012-08-08 07:51UTC|m|{"to":"+919912345678","distinct_id":"UA--z9PpUBzCAAAA","ts":1344412285,"ih":"1","fr":"+919912345678","userstate":"1","ml":2}
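Each line in the log above appears to have three pipe-separated parts: a timestamp, a one-letter event type, and a JSON payload. The following is a rough Python sketch of how such a line could be split and projected onto a few named fields. It is not the SSPigLoader logic: `parse_line`, its `wanted` parameter, and the column names `date` and `type` are hypothetical names chosen for illustration.

```python
import json


def parse_line(line, wanted=("date", "type", "to")):
    """Split one 'date|type|json' log line and project the requested fields.

    Hypothetical helper for illustration only; not the SSPigLoader code.
    """
    # Split on the first two pipes only, so a '|' inside the JSON payload survives.
    date, event_type, payload = line.rstrip("\n").split("|", 2)
    record = json.loads(payload)
    # Expose the two leading columns under the assumed names 'date' and 'type'.
    record.update({"date": date, "type": event_type})
    # Not every event carries every key, so missing fields come back as None.
    return tuple(record.get(name) for name in wanted)


sample = ('2012-08-08 07:50UTC|i|{"to":"+919912345678",'
          '"distinct_id":"UA--z9PpUBzCAAAA","ts":1344412213}')
print(parse_line(sample))
```

Returning `None` for absent keys is one possible choice here, made because fields such as `ml` and `userstate` only show up on some event types in the sample log.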
View gist:3342190
Sureshs-MacBook-Pro:hadoop sureshsaggar$ pig -x local
grunt> REGISTER /Users/sureshsaggar/Documents/GITHubProjects/hadoop/SSPigLoader.jar;
grunt> REGISTER /Users/sureshsaggar/Documents/GITHubProjects/hadoop/jar/json-simple-1.1.jar;
grunt> data = LOAD '/Users/sureshsaggar/Documents/GITHubProjects/hadoop/logs/analytics.log' USING com.ss.analytics.pig.SSPigLoader('date type to') AS (date: chararray, type: chararray, to: chararray);
grunt> dump data;
....
....
....
2012-08-13 21:35:32,203 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
****file:/tmp/temp-265145508/tmp1111831199
View Calabash-Android-Device
Step 01: List the attached devices using adb:
/Users/sureshsaggar/android-sdks/platform-tools/adb devices
List of devices attached
222222222cb65fdf device
Step 02: Set ANDROID_HOME
export ANDROID_HOME=/Users/sureshsaggar/Downloads/android-sdk-macosx
echo $ANDROID_HOME
/Users/sureshsaggar/Downloads/android-sdk-macosx
View Calabash-Android-Emulator
Step 01: Verify that a suitable AVD exists:
Sureshs-MacBook-Pro:android-sdk-macosx sureshsaggar$ tools/android list avd
Available Android Virtual Devices:
Name: AndroidApp
Path: /Users/sureshsaggar/.android/avd/AndroidApp.avd
Target: Google APIs (Google Inc.)
Based on Android 2.3.3 (API level 10)
ABI: armeabi
Skin: HVGA
View Apache FlumeNG and HDFS Sink
# Directory:
# root@localhost:~/flume/apache-flume-1.4.0-SNAPSHOT
# Usage:
# bin/flume-ng agent --conf ./conf/ -f conf/flume-agents-tests.conf -Dflume.root.logger=DEBUG,console -n agent_test_exec_TO_hdfs
# ------------------------------------------------------------------------------------
# This workflow applies to a web server that runs a Flume agent and writes its
# data to HDFS. Here /tmp/ping.txt stands in for any log file.
# ------------------------------------------------------------------------------------
View Output from HDFS - Flume NG
root@hadoop2-prod:~# /usr/local/hadoop/bin/hadoop fs -cat /tmp/agent-webserver/FlumeData.1351688879208
64 bytes from www.google.com (72.15.18.10): icmp_req=275 ttl=48 time=111 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=276 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=277 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=278 ttl=48 time=112 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=279 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=280 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=281 ttl=48 time=114 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=282 ttl=48 time=111 ms
64 bytes from www.google.com (72.15.18.10): icmp_req=283 ttl=48 time=114 ms
sureshsaggar / Agent 01 - Using Exec Source & AVRO Sink
Created Nov 1, 2012
Query: How to save data in _raw_ format in HDFS using FlumeNG?
# Usage:
# bin/flume-ng agent --conf ./conf/ -f conf/flume-webserver-agents.conf -Dflume.root.logger=DEBUG,console -n agent_hikemon
# ------------------------------------------------------------------------------------
# Agent 01 - Using Exec Source & AVRO Sink
# ------------------------------------------------------------------------------------
agent_hikemon.sinks = avroSink-consolidator
agent_hikemon.sources = tailSource
agent_hikemon.channels = MemoryChannel-WebServer
agent_hikemon.channels.MemoryChannel-WebServer.type = memory
sureshsaggar / gist:3995476
Created Nov 1, 2012
HDFS Output with Noise/Formatting Issue
$ hadoop fs -cat /tmp/agent-mqtt/FlumeData.1351679289119
?64 bytes from www.google.com (114.121.18.9): icmp_req=4468 ttl=48 time=118 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4469 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4470 ttl=48 time=118 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4471 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4472 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4473 ttl=48 time=117 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4474 ttl=48 time=118 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4475 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4476 ttl=48 time=116 ms?64 bytes from www.google.com (114.121.18.9): icmp_req=4477 ttl=48 time=118 ms?G?\?C?a?????Ӷ