Dan Barkhorn danielbarkhorn

  • New York, New York / Santa Clara, California
danielbarkhorn / data-cleaning.py
Created July 25, 2018 02:51
Quick solution to a potentially difficult data cleaning question.
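The comments at the top of this gist mention string-similarity ratios as the first idea considered for merging near-duplicate company names. A minimal sketch of that idea follows. It is not the author's code: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for a true Levenshtein ratio, and the helper names (`normalize`, `similarity`, `group_names`) and the 0.5 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and keep only letters and digits, so 'E*Trade' and 'E Trade' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a, b):
    """Similarity in [0, 1] between two normalized names (difflib ratio, not a true Levenshtein ratio)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_names(names, threshold=0.5):
    """Greedily put each name in the first group whose representative is similar enough."""
    groups = []  # each group is a list; group[0] acts as its representative
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    return groups


names = ["E Trade", "E*Trade", "E*Trade Financial", "Goldman Sachs"]
groups = group_names(names)
```

With this threshold the three E*Trade variants end up in one group and "Goldman Sachs" in another, but the result is sensitive to the threshold and to which name happens to become the representative. That fragility is one reason a lookup against an authoritative list of names, as the gist goes on to describe, can be the quicker route.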
# Tasked at work with cleaning user-typed company name data (over 5k 'unique' companies), I needed to programmatically combine
# entries like 'E Trade', 'E*Trade', and 'E*Trade Financial'. My initial thought was to use Levenshtein ratios and other
# computer-sciency heuristics.
# On second thought, the cleaned data was needed quickly, so I found an API that autocompletes company names and leveraged it.
import requests
import csv
import Levenshtein
from tqdm import tqdm
import pickle
import numpy as np
import consts
RHO_MATRICES = {tuple(np.linalg.matrix_power(np.array([[0,1],[2,3]]), t).dot(np.array([1,0])) % 5): t for t in range(0,24)}
RHO_MATRICES[(0,0)] = -1
def arrTransform1(arr):
    output = [[[0 for _ in range(64)] for _ in range(5)] for _ in range(5)]
    for i in range(5):
        for j in range(5):
danielbarkhorn / trie.py
Last active July 25, 2018 01:57
Simple Trie Python Implementation. Comment structure based off SKLearn.
class Trie:
    """
    Trie data structure implemented using a dictionary

    Parameters
    ----------
    dictionary : array
        An array of all the strings to be included in the trie

    Attributes