Nicholas White nickwhite917

@nickwhite917
nickwhite917 / Python_TSV_Num_Cols.py
Created March 13, 2017 14:25
Print number of columns in a tab-delimited file using Python
import glob, os
# Print each file's name and the number of tab-separated columns in its first line.
for file in glob.glob("your/path/here/*"):
    with open(file, 'r') as f:
        print("{}\t{}".format(os.path.split(file)[1], len(f.readline().split('\t'))))
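Splitting on a raw tab miscounts when a field is quoted and contains a tab. A minimal sketch of the same count using the standard `csv` module, assuming the file follows the usual quoting rules; the helper name `count_columns` is hypothetical and not part of the gist:

```python
import csv


def count_columns(path, delimiter='\t'):
    """Return the number of fields in the first row of a delimited file (0 if the file is empty)."""
    with open(path, 'r', newline='') as handle:
        # csv.reader handles quoted fields that contain the delimiter.
        first_row = next(csv.reader(handle, delimiter=delimiter), None)
    return 0 if first_row is None else len(first_row)
```

It can be dropped into the loop above in place of the `len(...split('\t'))` expression.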
nickwhite917 / Python_Quick_Select.py
Created March 7, 2017 17:50
Quick select using Python
"""
QuickSelect finds the kth smallest element of an array in expected linear time
(O(n) on average, O(n^2) in the worst case).
"""
import random
def Partition(a):
    """
    Usage: (left,pivot,right) = Partition(array)
    Partitions an array around a randomly chosen pivot such that
    left elements <= pivot <= right elements.
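The preview is cut off inside the docstring, so the rest of the gist is not shown here. The following is a minimal sketch of quickselect built on the partition contract described above, not the author's code; the names `partition` and `quick_select` and the 0-based `k` are assumptions:

```python
import random


def partition(a):
    """Split a non-empty list into (left, pivot, right) around a random pivot."""
    pivot_index = random.randrange(len(a))
    pivot = a[pivot_index]
    rest = a[:pivot_index] + a[pivot_index + 1:]
    left = [x for x in rest if x <= pivot]   # everything <= pivot, duplicates included
    right = [x for x in rest if x > pivot]   # everything strictly greater
    return left, pivot, right


def quick_select(a, k):
    """Return the k-th smallest element of a (k is 0-based). Does not modify a."""
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    while True:
        left, pivot, right = partition(a)
        if k < len(left):
            a = left                 # answer lies among the smaller elements
        elif k == len(left):
            return pivot             # exactly len(left) elements are <= pivot
        else:
            k -= len(left) + 1       # skip left and the pivot itself
            a = right
```

Each round discards the pivot, so the loop always terminates; the random pivot gives expected linear time, not a worst-case guarantee.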
nickwhite917 / Python_FlatFile_Join_Single_Column.py
Created March 2, 2017 21:30
Print the records that match in two flatfiles, keyed on a single column, using Python.
def join_files(filepath_a,col_a,delim_a,filepath_b,col_b,delim_b):
    result_set = []
    file_a = open(filepath_a,'r')
    file_b = open(filepath_b, 'r')
    lines_a = file_a.readlines()
    lines_b = file_b.readlines()
    for index_a, line_a in enumerate(lines_a):
        data_a = str(line_a.split(delim_a)[col_a])
        for index_b, line_b in enumerate(lines_b):
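The preview stops at the inner loop. A minimal sketch of how such a single-column join could be completed, assuming the intent is to collect every pair of lines whose key columns are equal; the return shape (a list of line pairs) and the blank-line handling are assumptions, not the author's code:

```python
def join_files(filepath_a, col_a, delim_a, filepath_b, col_b, delim_b):
    """Return (line_a, line_b) pairs whose key columns hold the same value."""
    # Read both files up front; the context managers close the handles.
    with open(filepath_a, 'r') as file_a:
        lines_a = [line.rstrip('\n') for line in file_a if line.strip()]
    with open(filepath_b, 'r') as file_b:
        lines_b = [line.rstrip('\n') for line in file_b if line.strip()]

    result_set = []
    for line_a in lines_a:
        key_a = line_a.split(delim_a)[col_a]
        for line_b in lines_b:
            if key_a == line_b.split(delim_b)[col_b]:
                result_set.append((line_a, line_b))
    return result_set
```

This compares every line of one file with every line of the other, so it suits small files; for larger inputs a dictionary keyed on the join column avoids the quadratic scan.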
nickwhite917 / Python_Hash_Join.py
Created February 28, 2017 16:08
Hash join implemented in Python.
from collections import defaultdict
def hashJoin(table1, index1, table2, index2):
    h = defaultdict(list)
    # hash phase
    for s in table1:
        h[s[index1]].append(s)
    # join phase
    return [(s, r) for r in table2 for s in h[r[index2]]]
# Here is my implementation of a nested loop join.
# It takes the two lists, along with two other lists that
# contain the indexes of the columns to join on.
# For example:
# If a[1] is to be joined to b[2] and a[2] to b[3], then the
# arguments would look like this: join(a,[1,2],b,[2,3])
listA = [["SomeString1", "A", "1"],
         ["SomeString2", "A", "2"],
         ["SomeString3", "B", "1"],
         ["SomeString4", "B", "2"]]
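The `join` function itself is not visible in the preview. A minimal sketch of a nested loop join matching the call shape described in the comment, `join(a,[1,2],b,[2,3])`; the body, the concatenated-row result, and the sample `list_b` are assumptions made for illustration:

```python
def join(table_a, cols_a, table_b, cols_b):
    """Nested loop join: pair rows whose key columns match position by position."""
    if len(cols_a) != len(cols_b):
        raise ValueError("both sides need the same number of join columns")
    result = []
    for row_a in table_a:
        key_a = [row_a[i] for i in cols_a]
        for row_b in table_b:
            if key_a == [row_b[j] for j in cols_b]:
                result.append(row_a + row_b)  # joined row: columns of a, then columns of b
    return result


list_a = [["SomeString1", "A", "1"],
          ["SomeString2", "A", "2"],
          ["SomeString3", "B", "1"],
          ["SomeString4", "B", "2"]]
# Hypothetical second table; columns 2 and 3 hold the join keys.
list_b = [["X", "Other1", "A", "1"],
          ["Y", "Other2", "B", "2"]]

joined = join(list_a, [1, 2], list_b, [2, 3])
```

Unlike the hash join above, this does no preprocessing and compares every pair of rows, which is simple but quadratic in the table sizes.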
nickwhite917 / C#_DAC_Execution_Plan_Launcher.cs
Created February 17, 2017 13:17
Execute a DAC Execution Plan from C# using SSH.NET
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Renci.SshNet;
namespace SSH_Net
{
    class Program
nickwhite917 / Sqoop_DB2_to_Hive.sh
Created February 15, 2017 19:59
Sqoop from DB2 to Hive.
# Example command to run: ./sqoop_warehouse_to_hive_db warehouse_table_name table_partition_column
echo "Running sqoop import on table: $1 with key $2. If this is not correct, exit now with CTRL-C - Nick"
sleep 5
# Quote the positional arguments so table or column names are not word-split by the shell.
sqoop import --connect jdbc:db2://server.host.name:port/DatabaseName --username user123 --password pass456 \
  --table "$1" --split-by "$2" --fields-terminated-by '\t' \
  --hive-overwrite --hive-import --hive-table "$1" --hive-database warehouse
nickwhite917 / Python_Utility_FTP_Sub_Directories.py
Created February 15, 2017 18:50
Add sub-directories to FTP site with Python
import sys
from ftplib import FTP
def initialize_ftp_connection(ftp_servers, server_name):
    try:
        conf = ftp_servers[server_name]
        ftp_conn = FTP(conf[0])
        ftp_conn.login(user=conf[1], passwd=conf[2])
        ftp_conn.cwd(conf[3])
        return ftp_conn
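The preview ends before the part that creates the sub-directories. A minimal sketch of that step, assuming the connection object behaves like `ftplib.FTP` (`cwd` raises `ftplib.error_perm` when a directory is missing, `mkd` creates one); the helper name `make_sub_directories` is hypothetical:

```python
from ftplib import error_perm


def make_sub_directories(ftp_conn, path):
    """Walk a relative path like 'a/b/c', creating each missing level, and end up inside it.

    Returns the list of directory names that had to be created.
    """
    created = []
    for part in [p for p in path.split('/') if p]:
        try:
            ftp_conn.cwd(part)       # directory already exists
        except error_perm:
            ftp_conn.mkd(part)       # create it, then step into it
            ftp_conn.cwd(part)
            created.append(part)
    return created
```

A server can also refuse `cwd` for reasons other than a missing directory (for example, permissions); in that case the `mkd` call raises `error_perm` itself and the error reaches the caller.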
nickwhite917 / Python_Fibonacci_With_Without_Memoization.py
Created February 14, 2017 03:33
Fibonacci with and without Memoization in Python
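def fib_w_memo(n, memo=None):
    # A mutable default ({}) would be shared by every call; create the cache on demand instead.
    if memo is None:
        memo = {}
    if n == 0:
        return 0
    if n == 1:
        return 1
    if n not in memo:
        memo[n] = fib_w_memo(n-1, memo) + fib_w_memo(n-2, memo)
    return memo[n]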
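The version without memoization is not visible in the preview. For comparison, a minimal sketch of the plain recursion next to a cached variant that uses the standard `functools.lru_cache` in place of a hand-rolled dictionary; both function names are hypothetical:

```python
from functools import lru_cache


def fib_plain(n):
    """Plain recursion: recomputes the same subproblems, so time grows exponentially with n."""
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recurrence, but each value of n is computed only once."""
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)
```

`fib_plain(30)` already makes over a million calls, while the cached version makes one call per distinct `n`.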
nickwhite917 / Python_Searching_Binary_Search.py
Created February 13, 2017 01:33
Binary Search in Python
def binary_search(ele, arr, min = 0, max = None):
    if max is None:
        max = len(arr) - 1
    # Integer division: "/" yields a float in Python 3, which cannot index a list.
    half = min + (max - min) // 2
    value = arr[half]
    if value == ele:
        return half
    if ele < value:
        max = half
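The preview ends before the recursive step. A minimal iterative sketch of binary search over a sorted list, written independently of the truncated code; the name `binary_search_iter` and the `-1` "not found" result are assumptions:

```python
def binary_search_iter(ele, arr):
    """Return an index of ele in the sorted list arr, or -1 if it is absent."""
    lo, hi = 0, len(arr) - 1          # inclusive bounds of the remaining search window
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] == ele:
            return mid
        if ele < arr[mid]:
            hi = mid - 1              # exclude mid so the window always shrinks
        else:
            lo = mid + 1
    return -1
```

Excluding `mid` on both branches is what guarantees termination; keeping it in the window can loop forever once the window holds two elements.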