
#!/usr/bin/python
import time
import datetime
import random
timestr = time.strftime("%Y%m%d-%H%M%S")
f = open('access_log_'+timestr+'.log','w')
ips=["123.221.14.56","16.180.70.237","10.182.189.79","218.193.16.244","198.122.118.164","114.214.178.92","233.192.62.103","244.157.45.12","81.73.150.239","237.43.24.118"]
referers=["-","http://www.casualcyclist.com","http://bestcyclingreviews.com/top_online_shops","http://bleater.com","http://searchengine.com"]
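The snippet above stops after the setup lists. Below is a minimal sketch of how a generator like this could finish. It assumes the Apache "combined" log format. The request paths, user agents, status mix, and start time are hypothetical placeholders and do not come from the original gist.

```python
import random
import time
from datetime import datetime, timedelta

# IPs and referers copied from the snippet above.
IPS = ["123.221.14.56", "16.180.70.237", "10.182.189.79", "218.193.16.244",
       "198.122.118.164", "114.214.178.92", "233.192.62.103", "244.157.45.12",
       "81.73.150.239", "237.43.24.118"]
REFERERS = ["-", "http://www.casualcyclist.com",
            "http://bestcyclingreviews.com/top_online_shops",
            "http://bleater.com", "http://searchengine.com"]
PATHS = ["/", "/index.html", "/products", "/cart"]  # hypothetical request paths
AGENTS = ["Mozilla/5.0 (X11; Linux x86_64)"]        # hypothetical user agent


def log_line(ts, rng):
    """Format one fake request in the Apache 'combined' log format."""
    return '%s - - [%s] "GET %s HTTP/1.1" %d %d "%s" "%s"' % (
        rng.choice(IPS),
        ts.strftime("%d/%b/%Y:%H:%M:%S +0000"),  # naive time, treated as UTC
        rng.choice(PATHS),
        rng.choice([200, 200, 200, 404]),        # hypothetical status mix
        rng.randint(200, 5000),                  # response size in bytes
        rng.choice(REFERERS),
        rng.choice(AGENTS),
    )


def write_log(n_lines, seed=None):
    """Write n_lines fake entries to access_log_<timestamp>.log and return the name."""
    rng = random.Random(seed)
    fname = "access_log_" + time.strftime("%Y%m%d-%H%M%S") + ".log"
    ts = datetime(2015, 1, 1)  # hypothetical start time
    with open(fname, "w") as f:  # 'with' closes the file even on error
        for _ in range(n_lines):
            ts += timedelta(seconds=rng.randint(1, 30))
            f.write(log_line(ts, rng) + "\n")
    return fname
```

Using a seeded `random.Random` instance instead of the module-level functions keeps runs reproducible, which helps when the log feeds a test pipeline.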
@lordjc
lordjc / 0_reuse_code.js
Last active August 29, 2015 14:16
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
#!/bin/bash
LINKDIR=/usr/bin
JHOME=/usr/java/jdk1.7.0_67-cloudera/
JREDIR=$JHOME/jre/bin
JDKDIR=$JHOME/bin
sudo alternatives --install $LINKDIR/java java $JREDIR/java 20000 \
--slave $LINKDIR/keytool keytool $JREDIR/keytool \
--slave $LINKDIR/orbd orbd $JREDIR/orbd \
#!/bin/sh
setup_brew () {
if [ ! -f "/usr/local/bin/brew" ]; then
/bin/bash -c "$(/usr/bin/curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
fi
}
setup_ipython () {
brew install readline
lordjc / getcsv.sh
Created March 21, 2014 19:47
Grab a file with sftp and put it on HDFS
#!/bin/bash
export tnow=$(date +"%Y-%m-%d")
export hnow=$(date +"%Y-%m-%d:%H")
export d_fname=ItemFulfillment-$tnow.csv
export h_fname=ItemFulfillment-$hnow.csv
export dirname=ItemFulfillment
export FTP_SERVER='IP'
lordjc / csv2metadata.py
Created March 21, 2014 19:45
Generate Avro schema and DDLs from CSV headers
#!/usr/bin/python
import csv
import sys
import argparse
from string import Template
import subprocess
debug = False
def output(hdfspath,data):
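csv2metadata.py is cut off before its logic appears. Below is a sketch of one way to turn CSV headers into a Hive DDL with `string.Template`, which the script already imports. It assumes every column is a STRING and that the target is an external table stored as Avro (`STORED AS AVRO` needs Hive 0.14 or later). The table name and HDFS location in the usage are hypothetical. The original `output(hdfspath, data)` helper is not reproduced.

```python
import csv
import io
import re
from string import Template

# Hypothetical DDL shape: external, Avro-backed table with all-STRING columns.
DDL = Template(
    "CREATE EXTERNAL TABLE $table (\n$columns\n)\n"
    "STORED AS AVRO\n"
    "LOCATION '$location';"
)


def clean_name(header):
    """Turn a CSV header into a lowercase identifier that is valid in Avro and Hive."""
    name = re.sub(r"[^0-9a-zA-Z_]", "_", header.strip()).lower()
    # Avro names may not start with a digit (or be empty), so prefix those.
    if not name or name[0].isdigit():
        name = "c_" + name
    return name


def ddl_from_csv(csv_text, table, location):
    """Read the header row of csv_text and return a CREATE TABLE statement."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Backticks let Hive accept names such as ones with a leading underscore.
    cols = ",\n".join("  `%s` STRING" % clean_name(h) for h in header)
    return DDL.substitute(table=table, columns=cols, location=location)
```

For example, `ddl_from_csv(open(path).read(), "item_fulfillment", "/data/ItemFulfillment")` would emit a statement whose column list mirrors the file's header row; both arguments here are placeholders.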
lordjc / csv2avro.py
Created March 21, 2014 19:43
Convert CSV to Avro
#!/usr/bin/python
import csv
import sys
import argparse
import io
def genSchema(coldict):
ss = """
{"namespace": "example.avro",
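genSchema(coldict) is cut off partway through its template string. Below is a sketch that builds the same kind of Avro record schema with the `json` module instead of a hand-written string, so quoting and commas always come out valid. It assumes a plain list of column names, since the shape of the original `coldict` is not shown. Every field is typed as a nullable string, and the record name `Row` is hypothetical.

```python
import json


def gen_schema(columns, name="Row", namespace="example.avro"):
    """Build an Avro record schema (returned as a JSON string) from column names.

    Every field is assumed to be a nullable string. "null" is listed first in
    the union so that a default of null is valid under the Avro spec.
    """
    schema = {
        "namespace": namespace,
        "type": "record",
        "name": name,
        "fields": [
            {"name": col, "type": ["null", "string"], "default": None}
            for col in columns
        ],
    }
    return json.dumps(schema, indent=2)
```

The ordering inside the union matters: Avro requires a field's default value to match the first branch of its union. Writing the records themselves needs an Avro library such as the `avro` package, which this sketch does not cover.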
lordjc / cleancsv.py
Created March 21, 2014 19:42
Clean a CSV of in-line newlines
#!/usr/bin/python
import csv
import sys
import argparse
import io
csv.field_size_limit(sys.maxsize)
parser = argparse.ArgumentParser(description='Clean csv of in-line newlines')
parser.add_argument('infile', help='Path to input CSV file')
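The core of a newline cleaner can rely on the `csv` module, which already understands quoted fields that span several lines. Below is a minimal sketch. It assumes that newlines inside a field should become a single space; that replacement is a guess and is not taken from the gist.

```python
import csv
import io
import re

# Any line-break style inside a field: CRLF, lone CR, or lone LF.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def clean_rows(infile, outfile, replacement=" "):
    """Copy CSV rows from infile to outfile, replacing newlines inside fields."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        writer.writerow([_NEWLINE.sub(replacement, field) for field in row])
```

Open both files with `newline=""`, as the `csv` module documentation recommends, so embedded newlines reach the reader intact: `with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst: clean_rows(src, dst)`.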
lordjc / sqoopSqlServerAndCreateHive.xml
Created March 21, 2014 17:41
Sqoop a table from SQL Server -> Avro on HDFS -> create schema -> create Hive table
<workflow-app name="sqoopSqlServerAndCreateHive" xmlns="uri:oozie:workflow:0.4">
<global>
<configuration>
<property>
<name>table</name>
<value></value>
</property>
<property>
<name></name>
<value></value>
#!/usr/bin/python
#
#pip install pyodbc
#Install the microsoft odbc driver for linux available here:
#http://www.microsoft.com/en-us/download/details.aspx?id=28160
import pyodbc
cnxn = pyodbc.connect('DRIVER=SQL Server Native Client 11.0;SERVER=<ip>;UID=<username>;PWD=<password>')
cursor = cnxn.cursor()
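Once `cnxn` is open, queries can pass values through `?` placeholders rather than formatting them into the SQL string. Below is a small sketch that works with any DB-API connection. It is exercised here against `sqlite3`, which uses the same `?` placeholder style as pyodbc. The table names in the test and usage are hypothetical.

```python
def fetch_rows(cnxn, sql, params=()):
    """Run a parameterized query on a DB-API connection and return all rows."""
    cursor = cnxn.cursor()
    try:
        # Values travel separately from the SQL text, so the driver handles
        # quoting and no user data is spliced into the statement.
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()
```

Against the SQL Server connection above, a call would look like `fetch_rows(cnxn, "SELECT * FROM some_table WHERE id = ?", (42,))`, where `some_table` is a placeholder. pyodbc returns `Row` objects rather than plain tuples, but they can be indexed like tuples.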