@sureshsaggar
sureshsaggar / delete_s3_bucket.py
Created June 3, 2014 04:03
Deleting Amazon S3 bucket using Boto
import boto
from boto.s3.connection import OrdinaryCallingFormat

# Placeholders: substitute real credentials before running.
(aws_access_key_id, aws_secret_access_key) = ('<aws_access_key_id>', '<aws_secret_access_key>')

def deleteBucket(aws_access_key_id, aws_secret_access_key, bname):
    # OrdinaryCallingFormat selects path-style request URLs (bucket name in the path, not the hostname).
    s3 = boto.connect_s3(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, calling_format=OrdinaryCallingFormat())
    print('# Permanently deleting bucket[%s]...' % bname)
    bucket = s3.get_bucket(bname)
    bucketListResultSet = bucket.list()
    # Removes every key in the bucket; the (now empty) bucket itself still exists after this call.
    return bucket.delete_keys([key.name for key in bucketListResultSet])
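The function above empties the bucket but does not remove it. A minimal sketch of the full sequence follows, assuming an object that behaves like a boto Bucket (list(), delete_keys() returning a result with an errors list, and delete()). The helper name empty_and_delete_bucket is hypothetical and not part of the gist.

```python
def empty_and_delete_bucket(bucket):
    """Delete every key in `bucket`, then the bucket itself.

    `bucket` is assumed to behave like a boto Bucket: list() yields objects
    with a `name`, delete_keys() returns a result carrying an `errors` list,
    and delete() removes the empty bucket.
    Returns the number of keys that were submitted for deletion.
    """
    key_names = [key.name for key in bucket.list()]
    if key_names:
        result = bucket.delete_keys(key_names)
        # Stop before deleting the bucket if any key could not be removed.
        if result.errors:
            raise RuntimeError('%d keys could not be deleted' % len(result.errors))
    bucket.delete()
    return len(key_names)
```

With a real connection this might be called as empty_and_delete_bucket(s3.get_bucket(bname)). Buckets with versioning enabled would also need their object versions removed, which this sketch does not attempt.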
sureshsaggar / Apache Pig - Convert bytearray(json) to bag or maps?
Created June 21, 2013 09:56
Apache Pig - Convert bytearray(json) to bag or maps?
Here X has a single column named 'metadata' of type bytearray, but its actual content is JSON (with keys sId and cId), as shown below:
grunt> describe X
X: {metadata: bytearray}
grunt> dump X
({"sId":"003_w","cId":"k"})
({"sId":"001_rf","cId":"r"})
({"sId":"001_rf","cId":"r"})
({"sId":"004_rf","cId":"r"})
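The preview does not show a Pig-side answer, so the following is only a plain-Python sketch of the conversion being asked for: each bytearray value is decoded and parsed into a map. The function name metadata_to_map is hypothetical, and UTF-8 encoding is an assumption.

```python
import json

def metadata_to_map(raw):
    """Decode one 'metadata' value (bytes or str holding JSON) into a dict."""
    if isinstance(raw, bytes):
        # Assumption: the bytearray column holds UTF-8 encoded JSON text.
        raw = raw.decode('utf-8')
    return json.loads(raw)

# Two of the rows from the dump above, as raw bytes.
rows = [b'{"sId":"003_w","cId":"k"}', b'{"sId":"001_rf","cId":"r"}']
maps = [metadata_to_map(row) for row in rows]
```

Each element of maps can then be indexed by key, for example maps[0]['sId'].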
sureshsaggar / mysqltail.py
Last active December 18, 2015 15:39
Analytics - Incremental tail for MySQL
'''
USAGE: ubuntu@mysql-ab1:~$ python mysqltail.py
'''
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)
import types
import sys
from datetime import date
import redis
import time
sureshsaggar / Hadoop distcp between hortonworks and cloudera
Last active December 18, 2015 04:48
Hadoop distcp between hortonworks and cloudera
hdfs@hadoop-prod-growthui:~$ hadoop distcp -i hdfs://hadoop-prod-master.vpc:8020/data/analytics/smsrecords hdfs://10.0.0.144:8020/data/analytics/smsrecords
13/06/07 07:18:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=true, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoop-prod-master.vpc:8020/data/analytics/smsrecords], targetPath=hdfs://10.0.0.144:8020/data/analytics/smsrecords}
13/06/07 07:18:22 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/06/07 07:18:23 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/06/07 07:18:26 ERROR tools.DistCp: Exception encountered
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "hadoop-prod-growthui.vpc/10.0.0.230"; destination host is: "ip-10-0-0-144.ap-southeast-1.comp
sureshsaggar / gist:5494401
Created May 1, 2013 09:02
Forecasting in R: Python API to analyze time series data in Redis
curl -X GET http://0.0.0.0:6600/rpy/linearregression
{
"stat": "pass",
"future": {
"c": -19581061.761599176,
"points": 4,
"m": 0.01434285714285654,
"predictions": {
"2013-05-05": 36307,
"2013-05-04": 35068,
sureshsaggar / Snippet from Redis
Created May 1, 2013 08:40
Forecasting in R: Python API to analyze time series data in Redis
redis 127.0.0.1:6379> hgetall linearregression
1) "1367373828" <<<< timestamp
2) "30860" <<<< visitors count
3) "1367473828"
4) "32860"
5) "1367273828"
6) "28860"
7) "1367073828"
8) "27060"
.....
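The hash above pairs Unix timestamps with visitor counts. As a rough illustration of the fit behind the /rpy/linearregression response, here is a pure-Python least-squares sketch. It stands in for the R-backed implementation, which the preview does not show, and the names fit_line and predict are hypothetical.

```python
def fit_line(series):
    """Least-squares fit of count = m * timestamp + c.

    `series` maps Unix timestamps to visitor counts; keys and values may be
    strings, as they are when read from a Redis hash.
    Returns (m, c, number_of_points).
    """
    points = [(float(ts), float(count)) for ts, count in series.items()]
    n = len(points)
    if n < 2:
        raise ValueError('need at least two points to fit a line')
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centre the data before summing to limit rounding error with large timestamps.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError('all timestamps are identical')
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c, n

def predict(m, c, timestamp):
    """Predicted visitor count at `timestamp`, rounded to an integer."""
    return int(round(m * timestamp + c))

# The four field/value pairs visible in the hgetall output above.
sample = {
    "1367373828": "30860",
    "1367473828": "32860",
    "1367273828": "28860",
    "1367073828": "27060",
}
m, c, points = fit_line(sample)
```

Fitted on these four pairs alone, m and c agree with the "m" and "c" values in the API response shown earlier to within floating-point rounding. The timestamps the service uses for its dated predictions are not visible in the preview, so predict is left without a worked call.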
sureshsaggar / gist:5493610
Created May 1, 2013 03:40
Forecasting in R: Developing Python API to analyze time series data in Redis
#!/usr/bin/env python
'''
Description: dashboard/* primarily powers the APIs required to support the growth dashboards.
@author: sureshsaggar
'''
from httpserver import *
from werkzeug.routing import *
import time
import redis
sureshsaggar / gist:5270339
Last active December 15, 2015 13:49
ERROR 1066: Unable to open iterator for alias log
hdfs@hadoop-prod-growthui:~$ cat /var/lib/hdfs/pig_1364556796933.log
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias log
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias log
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
sureshsaggar / Analytics - Join two collections in MongoDB
Last active December 10, 2015 14:49
Analytics - Join two collections in MongoDB
/*
 * Pending - Convert _id to uid or vice versa
 */
// Collection 01 - users
users_map = function() {
    // Simply emit the msisdn and 0 for the file length.
    // The file length will come from the other collection.
    emit(this._id, { msisdn: this.msisdn, file_length: 0 });
};
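The preview stops after the first mapper, so the rest of the join is not shown. The pattern it points to (two mappers emitting same-shaped values under a shared key, and a reducer merging them) can be sketched in plain Python as below. The second collection's field names (uid, length) and all function names other than users_map are assumptions for illustration only.

```python
from collections import defaultdict

def users_map(user):
    # Mirrors users_map above: carry the msisdn, leave file_length at 0.
    return user["_id"], {"msisdn": user["msisdn"], "file_length": 0}

def files_map(doc):
    # Hypothetical second mapper; the 'uid' and 'length' fields are assumptions.
    return doc["uid"], {"msisdn": None, "file_length": doc["length"]}

def reduce_values(values):
    # Merge all values emitted for one key: keep the msisdn, sum the lengths.
    merged = {"msisdn": None, "file_length": 0}
    for value in values:
        if value["msisdn"] is not None:
            merged["msisdn"] = value["msisdn"]
        merged["file_length"] += value["file_length"]
    return merged

def join(users, files):
    # Group emitted values by key, then reduce each group.
    grouped = defaultdict(list)
    emitted = [users_map(u) for u in users] + [files_map(f) for f in files]
    for key, value in emitted:
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}

users = [{"_id": 1, "msisdn": "100"}, {"_id": 2, "msisdn": "200"}]
files = [{"uid": 1, "length": 10}, {"uid": 1, "length": 5}]
joined = join(users, files)
```

Because both mappers emit the same value shape, the reducer can run over any mix of values for a key, which is the property MongoDB's map-reduce expects of a reduce function.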
sureshsaggar / Setup GeoIP with NginX & PHP
Last active June 26, 2022 20:35
Setup GeoIP with NginX & PHP
On my Ubuntu machine I located the GeoIP.dat file. If it is not available, download and install it first.
root@localhost:~# locate GeoIP.dat
/usr/share/GeoIP/GeoIP.dat
Open the Nginx configuration (/etc/nginx/nginx.conf) and add a "geoip_country <path to GeoIP.dat>;" line inside the "http" block. Example block:
http {
# SS - meant to find country code from the client IP