View delete_s3_bucket.py
import boto
from boto.s3.connection import OrdinaryCallingFormat

(aws_access_key_id, aws_secret_access_key) = ('<aws_access_key_id>', '<aws_secret_access_key>')

def deleteBucket(aws_access_key_id, aws_secret_access_key, bname):
    # Connect to S3 using the path-style (ordinary) calling format.
    s3 = boto.connect_s3(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, calling_format=OrdinaryCallingFormat())
    print '# Permanently deleting bucket[%s]...' % bname
    bucket = s3.get_bucket(bname)
    # A bucket must be emptied before it can be deleted, so delete
    # every key first, then remove the bucket itself.
    bucketListResultSet = bucket.list()
    bucket.delete_keys([key.name for key in bucketListResultSet])
    return s3.delete_bucket(bname)
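A minimal usage sketch (the bucket name below is hypothetical; the credentials are the placeholders defined above):

if __name__ == '__main__':
    # WARNING: this empties the bucket and then removes it for good.
    deleteBucket(aws_access_key_id, aws_secret_access_key, 'my-obsolete-bucket')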
View Apache Pig - Convert bytearray(json) to bag or maps?
Here X contains a single column named 'metadata' of type bytearray, but the actual content is JSON (with keys sId and cId), as shown below:
grunt> describe X
X: {metadata: bytearray}
grunt> dump X
({"sId":"003_w","cId":"k"})
({"sId":"001_rf","cId":"r"})
({"sId":"001_rf","cId":"r"})
({"sId":"004_rf","cId":"r"})
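One way to turn the bytearray into a map, sketched here assuming the elephant-bird jars are REGISTERed in the grunt session, is to cast it to chararray and pass it through the JsonStringToMap UDF:

grunt> DEFINE JsonStringToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();
grunt> Y = FOREACH X GENERATE JsonStringToMap((chararray) metadata) AS m;
grunt> Z = FOREACH Y GENERATE m#'sId' AS sId, m#'cId' AS cId;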
View mysqltail.py
'''
USAGE: ubuntu@mysql-ab1:~$ python mysqltail.py
'''
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)
import types
import sys
from datetime import date
import redis
import time
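The paste cuts off after the imports. A minimal sketch of the tail loop these imports suggest (the connection settings and server_id are assumed values, not the original code):

stream = BinLogStreamReader(
    connection_settings={'host': '127.0.0.1', 'port': 3306, 'user': 'repl', 'passwd': ''},  # assumed
    server_id=100,      # must be unique among replicas; assumed value
    only_events=[DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent],
    blocking=True,      # keep waiting for new events, like tail -f
    resume_stream=True)

for binlogevent in stream:
    for row in binlogevent.rows:
        print binlogevent.table, row  # Python 2, matching the era of this gist

stream.close()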
View Hadoop distcp between hortonworks and cloudera
hdfs@hadoop-prod-growthui:~$ hadoop distcp -i hdfs://hadoop-prod-master.vpc:8020/data/analytics/smsrecords hdfs://10.0.0.144:8020/data/analytics/smsrecords
13/06/07 07:18:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=true, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoop-prod-master.vpc:8020/data/analytics/smsrecords], targetPath=hdfs://10.0.0.144:8020/data/analytics/smsrecords}
13/06/07 07:18:22 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/06/07 07:18:23 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/06/07 07:18:26 ERROR tools.DistCp: Exception encountered
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "hadoop-prod-growthui.vpc/10.0.0.230"; destination host is: "ip-10-0-0-144.ap-southeast-1.comp
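The broken pipe here is the classic symptom of incompatible HDFS RPC versions between the two clusters (HDP on one side, CDH on the other). A common workaround, sketched below, is to read the source over the version-independent hftp protocol instead of hdfs (50070 is the usual NameNode HTTP port, verify for your cluster) and submit the job from the destination cluster, since hftp is read-only:

hdfs@hadoop-prod-growthui:~$ hadoop distcp -i hftp://hadoop-prod-master.vpc:50070/data/analytics/smsrecords hdfs://10.0.0.144:8020/data/analytics/smsrecords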
View gist:5494401
curl -X GET http://0.0.0.0:6600/rpy/linearregression
{
"stat": "pass",
"future": {
"c": -19581061.761599176,
"points": 4,
"m": 0.01434285714285654,
"predictions": {
"2013-05-05": 36307,
"2013-05-04": 35068,
View Snippet from Redis
redis 127.0.0.1:6379> hgetall linearregression
1) "1367373828" <<<< timestamp
2) "30860" <<<< visitors count
3) "1367473828"
4) "32860"
5) "1367273828"
6) "28860"
7) "1367073828"
8) "27060"
.....
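The m, c, and predictions in the JSON response above come from fitting a line to exactly these (timestamp, visitors) pairs. A sketch of that fit with redis-py (the standard least-squares closed form; the hash name is from the paste, the connection details are assumed):

import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379)
h = r.hgetall('linearregression')  # {epoch_timestamp: visitors_count, ...}
points = sorted((int(ts), int(v)) for ts, v in h.items())

# Closed-form least squares: slope m and intercept c of y = m*x + c.
n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)
m = float(n * sxy - sx * sy) / (n * sxx - sx * sx)
c = (sy - m * sx) / n

# A date's prediction is the line evaluated at that date's epoch timestamp.
t = points[-1][0] + 86400  # e.g. one day past the last sample
predicted = int(m * t + c)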
View gist:5493610
#!/usr/bin/env python
'''
Description: dashboard/* primarily powers the APIs required to support the growth dashboards.
@author: sureshsaggar
'''
from httpserver import *
from werkzeug.routing import *
import time
import redis
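The paste ends at the imports; a minimal sketch of how a route like /rpy/linearregression (hit by the curl call above) might be declared with werkzeug routing (the handler and its Redis read are illustrative, not the original gist):

url_map = Map([
    Rule('/rpy/linearregression', endpoint='linearregression'),
])

def on_linearregression(request):
    # Illustrative handler: return the raw (timestamp -> visitors) hash.
    r = redis.StrictRedis()
    return r.hgetall('linearregression')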
View gist:5270339
hdfs@hadoop-prod-growthui:~$ cat /var/lib/hdfs/pig_1364556796933.log
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias log
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias log
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
View Analytics - Join two collections in MongoDB
/*
 * Pending - Convert _id to uid or vice versa
 */
// Collection 01 - users
users_map = function() {
// Simply emit the msisdn and 0 for the file length.
// The file length will come from the other collection.
emit(this._id, { msisdn: this.msisdn, file_length: 0 });
}
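The second map and the shared reduce that complete the join might look like this (a sketch; the files collection and its uid/file_length fields are assumptions, per the pending _id/uid note above):

// Collection 02 - files (assumed collection and field names)
files_map = function() {
    // Emit under the same key as users_map so the reduce can merge them.
    emit(this.uid, { msisdn: null, file_length: this.file_length });
}

// Merge the msisdn from users with the file_length from files.
join_reduce = function(key, values) {
    var result = { msisdn: null, file_length: 0 };
    values.forEach(function(v) {
        if (v.msisdn !== null) result.msisdn = v.msisdn;
        result.file_length += v.file_length;
    });
    return result;
}

Running db.users.mapReduce(users_map, join_reduce, { out: { reduce: "joined" } }) and then the same call on the files collection folds both into one output collection.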
View Setup GeoIP with NginX & PHP
On my Ubuntu machine I located the GeoIP.dat file. If it is not available, download/install it first.
root@localhost:~# locate GeoIP.dat
/usr/share/GeoIP/GeoIP.dat
Open the Nginx configuration (/etc/nginx/nginx.conf) and add a "geoip_country <path to GeoIP.dat>;"
directive under the "http" block. Example block:
http {
# SS - meant to find country code from the client IP
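    # A sketch of the rest of the block, using the path found via locate above.
    geoip_country /usr/share/GeoIP/GeoIP.dat;

    # Sketch: inside the location that hands requests to PHP over FastCGI,
    # pass the looked-up code along so PHP sees $_SERVER['GEOIP_COUNTRY_CODE'].
    # fastcgi_param GEOIP_COUNTRY_CODE $geoip_country_code;
}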