Brendan O'Connor brendano

brendano /
Created Oct 10, 2008
python decorators to log all method calls, show call graphs in realtime too
# Written by Brendan O'Connor,,
# * Originally written Aug. 2005
# * Posted to on Oct. 2008
# Copyright (c) 2003-2006 Open Source Applications Foundation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
brendano / merged.csv
Created Oct 11, 2008
political bias algorithm analysis, scraping and comparison to - see
The Politico,-0.133333333333333,-0.069840595513546,,-0.0579919888228,-0.0156533209161,-0.0118276408031,-0.000672353189093,0.00899951990495
Right Wing Nut House,0.666666666666667,0.016997861495122,,-0.0114438419789,0.00923210186058,-0.000332659887795,-0.00357075698976,0.0194133595538
Chicago Tribune,0.0,0.011507686305562,,-0.00487815404818,0.0062502057793,0.00472616298604,-0.00370269426842,-0.00354255787188
City Journal,0.566666666666667,0.002719928640919,,-0.000318806368726,0.00147728337907,0.000218460777,-0.000500262448403,-0.00112420748062
National Enquirer,0.533333333333333,-0.008120760725041,,-0.00279469690892,-0.0018201000833,-0.00761346294708,0.00713945342214,-0.00165965873961
brendano /
Created Nov 7, 2008
xlsx2tsv: python command-line script to convert xlsx (Excel "OOXML") into tab-separated values
#!/usr/bin/env python
xlsx2tsv filename.xlsx [sheet number or name]
Parse a .xlsx (Excel OOXML, which is not OpenOffice) into tab-separated values.
If it has multiple sheets, need to give a sheet number or name.
Outputs honest-to-goodness tsv, no quoting or embedded \\n\\r\\t.
One reason I wrote this is because Mac Excel 2008 export to csv or tsv messes
up encodings, converting everything to something that's not utf8 (macroman
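The "no quoting or embedded \n\r\t" guarantee described above can be met by scrubbing each cell before joining. The sketch below is one way to do that, assuming whitespace control characters are replaced by a single space. How the original script handles them is not visible in this preview, and `clean_cell` and `to_tsv_line` are hypothetical names.

```python
import re

# Runs of tab, carriage return, or newline inside a cell.
_BAD = re.compile(r"[\t\r\n]+")


def clean_cell(value):
    """Return value as text with runs of tab/CR/LF collapsed to one space."""
    text = "" if value is None else str(value)
    return _BAD.sub(" ", text)


def to_tsv_line(row):
    """Join one row of cells into a single tab-separated line."""
    return "\t".join(clean_cell(v) for v in row) + "\n"
```

Because no cell can contain a tab or newline after cleaning, a reader can split on `\n` and `\t` without any quoting rules.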
brendano /
Created Nov 7, 2008
commandline set operations on files
#!/usr/bin/env python
""" set operations on files as lists. symlink this as:
* setdiff [-c] <set1> <set2> - set difference
* setand [-c] <set1> <set2> - set intersection
* setor [-c] <set1> <set2> - set union
-c means: give count of the result
Output order is randomish
We don't newline chomp, so it's a bug if your file doesn't end with a newline
Dash - for stdin (e.g. cut/awk/sed/grep)
Though in zsh, =(bla bla) syntax is superior: can do 2 pipeline inputs
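The three operations listed above map directly onto Python set operators. This is a minimal sketch, not the script's actual code. Unlike the script as described, it strips the trailing newline from each line, which sidesteps the missing-final-newline bug. `read_set` and `set_op` are hypothetical names.

```python
def read_set(lines):
    """Collect lines into a set, stripping the trailing newline from each."""
    return {line.rstrip("\n") for line in lines}


def set_op(name, lines1, lines2):
    """Apply setdiff, setand, or setor to two iterables of lines."""
    a, b = read_set(lines1), read_set(lines2)
    ops = {"setdiff": a - b, "setand": a & b, "setor": a | b}
    return ops[name]
```

With open file objects as inputs, `len(set_op("setand", f1, f2))` would give the count that `-c` reports.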
#!/usr/bin/env python
""" sorts lines (or tab-sep records) by md5. (e.g. for train/test splits).
optionally prepends with the md5 id too.
brendan o'connor - - """
import hashlib,sys,optparse
p = optparse.OptionParser()
p.add_option('-k', type='int', default=False)
p.add_option('-p', action='store_true')
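The rest of the script is not shown in this preview. Sorting by MD5 gives a shuffle that is stable across runs, which is why it suits train/test splits. The sketch below assumes `-k` selects a 1-based tab-separated field to hash and `-p` prepends the hash. Both readings are guesses from the docstring, and `md5_key` and `sort_by_md5` are hypothetical names.

```python
import hashlib


def md5_key(line, field=None):
    """Hex MD5 of the whole line, or of one 1-based tab-separated field."""
    text = line.rstrip("\n")
    if field is not None:
        text = text.split("\t")[field - 1]
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sort_by_md5(lines, field=None, prepend=False):
    """Return lines ordered by their MD5 key, optionally prefixed with it."""
    keyed = sorted((md5_key(line, field), line) for line in lines)
    if prepend:
        return [key + "\t" + line for key, line in keyed]
    return [line for _, line in keyed]
```

The same input always yields the same order regardless of how the lines were arranged beforehand.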
""" - Simple bindings to the AJAX Google Search API
(Just the JSON-over-HTTP bit of it, nothing to do with AJAX per se)
brendan o'connor - -"""
try:
    import json
except ImportError:
    # Python < 2.6 has no json module in the standard library
    import simplejson as json
import urllib, urllib2
brendano / gist:28439
Created Nov 24, 2008
pipe fiddling: (1) kill buffering (2) output redir kills stdout encoding, so force it
# Pipe-oriented I/O in Python. This is harder than it should be.
# (1) Kill stdout buffering. makes redirects and tee easier to use.
import os, sys
# Python 2 only: Python 3 refuses unbuffered (0) text-mode files.
if "<fdopen>" not in str(sys.stdout): sys.stdout = os.fdopen(1,'w',0)
# (2) Encoding madness. Note isn't available to us since we're using pipes.
import codecs
sys.stdout = codecs.EncodedFile(sys.stdout,'utf-8','utf-8','ignore')
# or this too .. sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
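The lines above are Python 2 idioms. On Python 3, steps (1) and (2) can be approximated by wrapping an unbuffered binary stream in a text layer with an explicit encoding. This is a sketch under that assumption, and `utf8_writer` and `unbuffered_utf8_stdout` are hypothetical names.

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream as UTF-8 text; unencodable characters are dropped.

    write_through=True hands every write straight to binary_stream
    instead of holding it in the text layer.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            errors="ignore", write_through=True)


def unbuffered_utf8_stdout():
    """Build a replacement for sys.stdout over an unbuffered view of its fd."""
    raw = open(sys.stdout.fileno(), "wb", buffering=0, closefd=False)
    return utf8_writer(raw)
```

Assigning `sys.stdout = unbuffered_utf8_stdout()` early in a script would then make output reach a pipe or `tee` as soon as it is written, encoded as UTF-8 whether or not stdout is a terminal.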
# I'm interested in safely handling potentially garbled input data, so want to protect stdin.
# You'd think this would work:
brendano / gist:39760
Created Dec 24, 2008
load the MNIST data set in R
# Load the MNIST digit recognition dataset into R
# assume you have all 4 files and gunzip'd them
# creates train$n, train$x, train$y and test$n, test$x, test$y
# e.g. train$x is a 60000 x 784 matrix, each row is one digit (28x28)
# call: show_digit(train$x[5,]) to see a digit.
# brendan o'connor - -
load_mnist <- function() {
load_image_file <- function(filename) {
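The R function bodies are cut off in this preview. For readers who want to see what the file parsing involves, here is a Python sketch of reading the IDX format used by the gunzipped MNIST files: a big-endian header (magic number 2051 for images, 2049 for labels) followed by unsigned bytes. It is not a translation of the R code, and `parse_idx_images` and `parse_idx_labels` are hypothetical names.

```python
import struct


def parse_idx_images(data):
    """Parse IDX image bytes into (n, rows, cols, pixel rows).

    Each pixel row is a flat list of rows*cols integers in 0..255,
    matching the one-digit-per-row layout described above.
    """
    magic, n, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file (magic %d)" % magic)
    size = rows * cols
    pixels = [list(data[16 + i * size:16 + (i + 1) * size])
              for i in range(n)]
    return n, rows, cols, pixels


def parse_idx_labels(data):
    """Parse IDX label bytes into a list of integer labels."""
    magic, n = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file (magic %d)" % magic)
    return list(data[8:8 + n])
```

For the real training images, `data` would be the full contents of the gunzipped file read in binary mode, giving 60000 rows of 784 values.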
View gist:59943
CSV from PostgreSQL, at least as far as I can tell. I'm sure it messes up embedded quotes and maybe embedded commas.
psql.csv() { psql -qAF , "$@" | egrep -v '^\([0-9]+ rows?\)$'; }
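The caveat about embedded quotes and commas arises because joining fields with a bare comma leaves no way to tell a delimiter from data. As a point of comparison, this sketch shows how Python's `csv` module quotes such values. The rows are made up for illustration and `rows_to_csv` is a hypothetical name.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting fields only where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows of strings."""
    return list(csv.reader(io.StringIO(text)))
```

A field containing a comma or a double quote is wrapped in quotes, with inner quotes doubled, so it survives a round trip through `csv_to_rows`.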
View tabsort
export TAB=$(echo -e "\t")
exec sort "-t$TAB" "$@"