Daniel Molnar (soobrosa)

gist:9483812
import csv
import sys

infile = sys.argv[1]
outfile = sys.argv[2]

with open(infile) as f:
    reader = csv.reader(f)
    cols = []
    for row in reader:
        cols.append(row)
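The preview stops before outfile is touched; a minimal standalone sketch, assuming the intent is simply to read the rows and write them back out, might look like this (the newline='' argument is Python 3 style):

import csv
import sys

infile = sys.argv[1]
outfile = sys.argv[2]

# read every row, then write the collected rows to the output file
# (hypothetical completion of the truncated gist above)
with open(infile) as f:
    rows = list(csv.reader(f))

with open(outfile, 'w', newline='') as f:
    csv.writer(f).writerows(rows)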
gist:9888861
### Keybase proof
I hereby claim:
* I am soobrosa on github.
* I am soobrosa (https://keybase.io/soobrosa) on keybase.
* I have a public key whose fingerprint is 19DA 1DE2 BEBD F91F 3EEA A264 30F5 64DE 4D6E 279F
To claim this, I am signing this object:
gist:0c7922f9bb4cabe12204
from datetime import timedelta

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
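For reference, a quick usage example; like range(), the end date is excluded:

from datetime import date

# iterate day by day using daterange() defined above
for day in daterange(date(2015, 1, 1), date(2015, 1, 4)):
    print(day)  # 2015-01-01, 2015-01-02, 2015-01-03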
soobrosa / extract_db_from_twb.py
Last active Aug 29, 2015
Extract database connections from Tableau workbooks
# parses directories for .twb files
# and extracts database connections
# rough and dirty
import os

for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.twb'):
            f = open(root + '/' + file)
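The preview ends right after the workbook is opened. Since .twb workbooks are XML, one plausible continuation is to parse each file and dump the attributes of its connection elements; a sketch under that assumption (the exact attribute names, e.g. class, server, dbname, depend on the workbook):

# hypothetical continuation: parse each .twb as XML and
# print the attributes of every <connection> element
import os
import xml.etree.ElementTree as ET

for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.twb'):
            path = os.path.join(root, file)
            tree = ET.parse(path)
            for conn in tree.iter('connection'):
                print(path, conn.attrib)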
difflibo.py
import difflib

def neighbours(table, entry):
    neighbour_list = difflib.get_close_matches(entry, table)
    returns = {}
    for neighbour in neighbour_list:
        returns[neighbour] = difflib.SequenceMatcher(None, entry, neighbour).ratio()
    return returns
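A small usage example; get_close_matches() returns at most three candidates above its default 0.6 cutoff, and each score is the SequenceMatcher ratio between 0 and 1:

words = ['apple', 'apples', 'ape', 'banana']
print(neighbours(words, 'appel'))
# roughly: {'apple': 0.8, 'ape': 0.75, 'apples': 0.73}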
soobrosa / Makefile
Created Nov 20, 2015
Is Yelp international?
source.downloaded:
	mkdir source
	cd source && { curl -O "https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/yelp_dataset_challenge_academic_dataset.zip" ; cd -; }

source.decompressed: source.downloaded
	unzip source/yelp_dataset_challenge_academic_dataset.zip

#
# one record pretty printed from each file composed of lines of JSONs
#
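The trailing comment hints at the next target; a hedged sketch of that step, assuming each dataset file holds one JSON object per line as the comment says (this helper is hypothetical, not part of the original Makefile):

# pretty print the first record of a line-of-JSONs file
import json
import sys

with open(sys.argv[1]) as f:
    first_line = f.readline()
print(json.dumps(json.loads(first_line), indent=2))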
soobrosa / waterlevel_etl.py
Last active Dec 18, 2015
light ETL to reformat webdata from http://www.hydroinfo.hu/Html/archivum/archiv_tabla.html to a TSV with date and value columns
fi = open('vizallas.txt', 'r')
fo = open('vizallas.tsv', 'w')
year = ''
for li in fi:
    # fixup days not existing in a given month
    it = li.strip().replace(' ',' ... ').split(' ')
    if len(it) < 2:
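The preview cuts off mid-parse; only the target shape is clear from the description (date and value columns), so a purely hypothetical final step, with placeholder data standing in for the parsed readings, would be:

# hypothetical output step: one tab separated date/value line per reading
readings = [('2015-01-01', '123'), ('2015-01-02', '118')]  # placeholder data
with open('vizallas.tsv', 'w') as out:
    for date, value in readings:
        out.write(date + '\t' + value + '\n')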
soobrosa / export_miso_history.py
Last active Dec 21, 2015
if you ever happen to export your miso history in python
# to export your miso history
# register an app at http://gomiso.com/oauth_clients
# gain your consumer_key and consumer_secret
# grab Gomiso Python from https://github.com/metabaron/Gomiso-Python
# and you're ready to go
#
# cc 2013 soobrosa
from gomiso import gomiso
import json
soobrosa / homogenize.py
Last active Dec 23, 2015
homogenize text
# input is unicode
import unicodedata

def homogenize(text):
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    text = text.lower()
    return text
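A quick usage example: NFKD splits each accented letter into a base letter plus a combining mark, and the ASCII encode drops the marks.

# on Python 2 (unicode input, as the comment says) this prints:
#   arvizturo tukorfurogep
# on Python 3 the encode() step yields bytes, so the result is
#   b'arvizturo tukorfurogep'
print(homogenize(u'Árvíztűrő tükörfúrógép'))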
soobrosa / clean.py
Created Sep 25, 2013
clean text (needs homogenize)
def clean(sentence):
    stopchars = ['.', ',', '?', '!', '"', '-']
    gain = []
    sentence = sentence.lower()
    for char in stopchars:
        sentence = sentence.replace(char, ' ')
    words = sentence.split(' ')
    for word in words:
        if word != '':
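The preview truncates inside the last loop; a hypothetical completion, assuming the surviving words are collected into gain and returned, could look like:

def clean(sentence):
    # strip punctuation, lowercase, and return the non-empty words
    stopchars = ['.', ',', '?', '!', '"', '-']
    gain = []
    sentence = sentence.lower()
    for char in stopchars:
        sentence = sentence.replace(char, ' ')
    for word in sentence.split(' '):
        if word != '':
            gain.append(word)
    return gain  # hypothetical: the original may return a joined string instead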