Daniel Molnar soobrosa

@soobrosa
soobrosa / homogenize.py
Last active December 23, 2015 21:39
homogenize text
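# input is a unicode string
import unicodedata


def homogenize(text):
    # decompose accented characters, drop the non-ASCII remainder, then lowercase;
    # decode() turns the bytes from encode() back into a str on Python 3
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
    return text.lower()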
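
A minimal usage sketch for the function above, assuming Python 3 (where `encode` returns bytes, hence the extra `decode`). The function is restated so the snippet runs on its own; the sample string is only an illustration and is not part of the gist.

```python
import unicodedata


def homogenize(text):
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding to ASCII with 'ignore' then drops the marks
    ascii_bytes = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    return ascii_bytes.decode('ascii').lower()


# illustrative input: a Hungarian phrase with accented vowels
sample = homogenize('Árvíztűrő Tükörfúrógép')
print(sample)
```

Characters with no ASCII decomposition are silently dropped rather than transliterated, which may or may not be what a given use case wants.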
@soobrosa
soobrosa / export_miso_history.py
Last active December 21, 2015 19:49
if you ever happen to export your miso history in python
# to export your miso history
# register an app at http://gomiso.com/oauth_clients
# gain your consumer_key and consumer_secret
# grab Gomiso Python from https://github.com/metabaron/Gomiso-Python
# and you're ready to go
#
# cc 2013 soobrosa
from gomiso import gomiso
import json
@soobrosa
soobrosa / waterlevel_etl.py
Last active December 18, 2015 07:38
light ETL to reformat webdata from http://www.hydroinfo.hu/Html/archivum/archiv_tabla.html to a TSV with date and value columns
fi = open ('vizallas.txt', 'r')
fo = open ('vizallas.tsv', 'w')
year = ''
for li in fi:
    # fixup days not existing in a given month
    it = li.strip().replace(' ',' ... ').split(' ')
    if len(it) < 2:
@soobrosa
soobrosa / Makefile
Created November 20, 2015 20:14
Is Yelp international?
source.downloaded:
	mkdir source
	cd source && { curl -O "https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/yelp_dataset_challenge_academic_dataset.zip" ; cd -; }

source.decompressed: source.downloaded
	unzip source/yelp_dataset_challenge_academic_dataset.zip
#
# one record pretty printed from each file composed of lines of JSONs
#
@soobrosa
soobrosa / difflibo.py
Created April 11, 2012 13:24
difflib chunk
import difflib
def neighbours(table, entry):
    neighbour_list = difflib.get_close_matches(entry, table)
    returns = {}
    for neighbour in neighbour_list:
        returns[neighbour] = difflib.SequenceMatcher(None, entry, neighbour).ratio()
    return returns
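
A short usage sketch for `neighbours`, restated here so it runs on its own. The word list and query are made up for illustration. Note that `difflib.get_close_matches` uses its defaults here (at most 3 matches, similarity cutoff 0.6), so the returned dict holds at most three entries.

```python
import difflib


def neighbours(table, entry):
    # map each close match of `entry` in `table` to its similarity ratio
    matches = difflib.get_close_matches(entry, table)
    return {m: difflib.SequenceMatcher(None, entry, m).ratio() for m in matches}


# hypothetical lookup table and a misspelled query
scores = neighbours(['apple', 'ape', 'peach', 'puppy'], 'appel')
for word, ratio in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(ratio, 2))
```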
@soobrosa
soobrosa / extract_db_from_twb.py
Last active August 29, 2015 14:06
Extract database connections from Tableau workbooks
# parses directories for .twb files
# and extracts database connections
# rough and dirty
import os
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.twb'):
            f = open(root + '/' + file)
from datetime import timedelta

def daterange(start_date, end_date):
    # yield each date from start_date up to, but not including, end_date
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
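
A small sketch of how `daterange` behaves, restated with the `timedelta` import it needs. The dates are arbitrary examples; the point is that the end date is excluded, in the same way as `range`.

```python
from datetime import date, timedelta


def daterange(start_date, end_date):
    # one date per day, starting at start_date and stopping before end_date
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)


# example span crossing a month boundary (2015 is not a leap year)
days = list(daterange(date(2015, 2, 27), date(2015, 3, 2)))
for d in days:
    print(d.isoformat())
```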
@soobrosa
soobrosa / gist:9888861
Last active August 29, 2015 13:57
keybase.md
### Keybase proof
I hereby claim:
* I am soobrosa on github.
* I am soobrosa (https://keybase.io/soobrosa) on keybase.
* I have a public key whose fingerprint is 19DA 1DE2 BEBD F91F 3EEA A264 30F5 64DE 4D6E 279F
To claim this, I am signing this object:
@soobrosa
soobrosa / gist:9483812
Created March 11, 2014 11:15
csv transpose
import csv
import sys
infile = sys.argv[1]
outfile = sys.argv[2]
with open(infile) as f:
    reader = csv.reader(f)
    cols = []
    for row in reader:
        cols.append(row)
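
The preview stops once the rows are collected. One possible way to finish the transpose is sketched below using `zip(*rows)`; this is an assumption about the approach, not the gist's actual remaining code, and the helper names `transpose_rows` and `transpose_csv` are hypothetical.

```python
import csv


def transpose_rows(rows):
    # zip(*rows) groups the i-th cell of every row, which is column i;
    # note that zip stops at the shortest row, so ragged input is truncated
    return [list(col) for col in zip(*rows)]


def transpose_csv(infile, outfile):
    # read all rows, then write the columns back out as rows
    with open(infile, newline='') as f:
        rows = list(csv.reader(f))
    with open(outfile, 'w', newline='') as f:
        csv.writer(f).writerows(transpose_rows(rows))
```

If rows can have different lengths, `itertools.zip_longest` with a fill value would keep the extra cells instead of dropping them.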