Brendan O'Connor (brendano)

brendano / autolog.py
Created Oct 10, 2008
Python decorators to log all method calls and show call graphs in real time
# Written by Brendan O'Connor, brenocon@gmail.com, www.anyall.org
# * Originally written Aug. 2005
# * Posted to gist.github.com/16173 on Oct. 2008
# Copyright (c) 2003-2006 Open Source Applications Foundation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
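The autolog.py preview stops at its license header. Below is a minimal Python 3 sketch of the idea in the description, not the gist's actual code: `logged` and `log_all_methods` are hypothetical names, and printing an indented "->"/"<-" pair per call is one assumed way of showing the call graph as it happens.

```python
import functools
import inspect

_depth = 0  # current call nesting level; used to indent the live call graph


def logged(fn, log=print):
    """Wrap fn so every call and return is reported via `log`, indented by depth."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _depth
        log("  " * _depth + "-> " + fn.__qualname__)
        _depth += 1
        try:
            result = fn(*args, **kwargs)
        finally:
            _depth -= 1  # restore the depth even if fn raises
        log("  " * _depth + "<- " + fn.__qualname__)
        return result
    return wrapper


def log_all_methods(cls, log=print):
    """Class decorator: wrap each plain function defined on cls (dunders skipped)."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("__"):
            setattr(cls, name, logged(attr, log))
    return cls


@log_all_methods
class Calc:
    def add(self, a, b):
        return self.ident(a) + b

    def ident(self, x):
        return x


if __name__ == "__main__":
    Calc().add(1, 2)
```

Running it prints `-> Calc.add`, then `  -> Calc.ident` and `  <- Calc.ident` indented one level, then `<- Calc.add`. The module-level depth counter keeps the sketch short but is not thread-safe; a per-thread counter (e.g. `threading.local`) would be needed for concurrent code.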
brendano / merged.csv
Created Oct 11, 2008
Political bias algorithm analysis: scraping and comparison to skewz.com (see anyall.org/blog?p=189)
name,score_skewz,score_svd,url,v1,v2,v3,v4,v5
The Politico,-0.133333333333333,-0.069840595513546,politico.com,-0.0579919888228,-0.0156533209161,-0.0118276408031,-0.000672353189093,0.00899951990495
Right Wing Nut House,0.666666666666667,0.016997861495122,rightwingnuthouse.com,-0.0114438419789,0.00923210186058,-0.000332659887795,-0.00357075698976,0.0194133595538
Chicago Tribune,0.0,0.011507686305562,chicagotribune.com,-0.00487815404818,0.0062502057793,0.00472616298604,-0.00370269426842,-0.00354255787188
City Journal,0.566666666666667,0.002719928640919,city-journal.org,-0.000318806368726,0.00147728337907,0.000218460777,-0.000500262448403,-0.00112420748062
Time,-0.1,-0.01921486123282,time.com,-0.0206799675285,-0.00430661260867,-0.00335205354211,-0.00167995286891,-0.0152016073966
National Enquirer,0.533333333333333,-0.008120760725041,nationalenquirer.com,-0.00279469690892,-0.0018201000833,-0.00761346294708,0.00713945342214,-0.00165965873961
AlterNet,-0.633333333333333,-0.029834727529704,alternet.org,-0.0066
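The CSV pairs each outlet's skewz.com rating (score_skewz) with a score from the author's own analysis (score_svd, presumably derived from an SVD, with v1 to v5 as its components). One quick way to compare the two columns is a Pearson correlation. This is a sketch based on that reading of the header, not the analysis from the blog post; `compare_scores` is a hypothetical name, and rows with missing columns (like the truncated last row in this preview) are skipped.

```python
import csv
import io
import math
import sys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare_scores(csv_text):
    """Correlate score_skewz with score_svd, skipping rows with missing columns."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader fills absent trailing columns with None (its default restval).
        if any(v is None for v in row.values()):
            continue
        xs.append(float(row["score_skewz"]))
        ys.append(float(row["score_svd"]))
    return pearson(xs, ys)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: compare.py merged.csv", file=sys.stderr)
    else:
        with open(sys.argv[1], newline="") as f:
            print(compare_scores(f.read()))
```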
brendano / xlsx2tsv.py
Created Nov 7, 2008
xlsx2tsv: Python command-line script to convert xlsx (Excel "OOXML") into tab-separated values
#!/usr/bin/env python
"""
xlsx2tsv filename.xlsx [sheet number or name]
Parse a .xlsx (Excel OOXML, which is not OpenOffice) into tab-separated values.
If the workbook has multiple sheets, you need to give a sheet number or name.
Outputs honest-to-goodness tsv, no quoting or embedded \\n\\r\\t.
One reason I wrote this is because Mac Excel 2008 export to csv or tsv messes
up encodings, converting everything to something that's not utf8 (macroman
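An .xlsx file is a zip archive of XML parts, so the core of a converter like this can be sketched with the standard library alone. This is not the gist's code: `xlsx_rows` is a hypothetical helper, it accepts only a sheet number, and it assumes sheet N is stored as `xl/worksheets/sheetN.xml` (real workbooks map sheet names to parts through `xl/workbook.xml` and its relationships file, which this skips). Values are emitted as stored, so dates come out as Excel serial numbers.

```python
import re
import sys
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by the worksheet and shared-string parts.
M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def col_index(ref):
    """Zero-based column index from a cell reference, e.g. 'C7' -> 2."""
    n = 0
    for ch in re.match(r"[A-Z]+", ref).group(0):
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def tsv_safe(text):
    """The output has no quoting, so tabs and newlines become spaces."""
    return re.sub(r"[\t\r\n]", " ", text)


def xlsx_rows(path, sheet_number=1):
    """Yield one worksheet's rows as lists of strings (assumed sheetN.xml layout)."""
    with zipfile.ZipFile(path) as z:
        shared = []
        if "xl/sharedStrings.xml" in z.namelist():
            for si in ET.fromstring(z.read("xl/sharedStrings.xml")).iter(M + "si"):
                # Rich-text strings are split across several <t> runs; join them.
                shared.append("".join(t.text or "" for t in si.iter(M + "t")))
        sheet = ET.fromstring(z.read("xl/worksheets/sheet%d.xml" % sheet_number))
        for row in sheet.iter(M + "row"):
            cells = []
            for c in row.iter(M + "c"):
                ref = c.get("r")
                if ref:  # pad columns the file omitted because they were empty
                    cells.extend([""] * (col_index(ref) - len(cells)))
                v = c.find(M + "v")
                if c.get("t") == "s" and v is not None:
                    text = shared[int(v.text)]  # index into the shared-string table
                elif c.get("t") == "inlineStr":
                    text = "".join(t.text or "" for t in c.iter(M + "t"))
                else:
                    text = v.text if v is not None and v.text else ""
                cells.append(tsv_safe(text))
            yield cells


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: xlsx2tsv.py FILE.xlsx [SHEET_NUMBER]", file=sys.stderr)
    else:
        sheet_no = int(sys.argv[2]) if len(sys.argv) > 2 else 1
        for cells in xlsx_rows(sys.argv[1], sheet_no):
            print("\t".join(cells))
```

Writing UTF-8 explicitly (rather than trusting the platform's export path) is the point of the exercise; on a Python 3 terminal `print` uses the locale's encoding, so redirecting with `PYTHONIOENCODING=utf-8` may still be needed.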
brendano / setdiff.py
Created Nov 7, 2008
Command-line set operations on files
#!/usr/bin/env python
""" set operations on files as lists. symlink this as:
* setdiff [-c] <set1> <set2> - set difference
* setand [-c] <set1> <set2> - set intersection
* setor [-c] <set1> <set2> - set union
-c means: give count of the result
Output order is arbitrary.
We don't chomp newlines, so there's a bug if your file doesn't end with a newline.
Use a dash (-) for stdin (e.g. output piped from cut/awk/sed/grep).
In zsh, though, the =(...) syntax is superior: it can take two pipeline inputs.
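A rough Python 3 sketch of the symlink trick the docstring describes: one script picks its operation from the name it was invoked under. This is an illustration, not the gist's code; `OPS`, `read_set` and `main` are hypothetical names, and unlike the original it strips trailing newlines, so a missing final newline does no harm.

```python
import os
import sys

# One script, three behaviours: the operation is chosen by the name it runs
# under, so setdiff, setand and setor can all be symlinks to this file.
OPS = {
    "setdiff": lambda a, b: a - b,  # lines in set1 but not in set2
    "setand": lambda a, b: a & b,   # lines in both
    "setor": lambda a, b: a | b,    # lines in either
}


def read_set(path):
    """Read a file's lines as a set ('-' means stdin), with newlines stripped."""
    if path == "-":
        return {line.rstrip("\n") for line in sys.stdin}
    with open(path) as f:
        return {line.rstrip("\n") for line in f}


def main(argv):
    name = os.path.splitext(os.path.basename(argv[0]))[0]
    count = "-c" in argv[1:]
    files = [a for a in argv[1:] if a != "-c"]
    if name not in OPS or len(files) != 2:
        print("usage: setdiff|setand|setor [-c] <set1> <set2>", file=sys.stderr)
        return
    result = OPS[name](read_set(files[0]), read_set(files[1]))
    if count:
        print(len(result))
    else:
        for line in result:  # set order is arbitrary, as in the original
            print(line)


if __name__ == "__main__":
    main(sys.argv)
```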
brendano / md5sort.py
#!/usr/bin/env python
""" sorts lines (or tab-separated records) by md5 (e.g. for train/test splits).
optionally prepends the md5 id too.
brendan o'connor - anyall.org - gist.github.com/brendano """
import hashlib,sys,optparse
p = optparse.OptionParser()
p.add_option('-k', type='int', default=False)
p.add_option('-p', action='store_true')
opts,args=p.parse_args()
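The preview ends after option parsing. Here is a Python 3 sketch of the core idea: ordering records by an MD5 digest gives a shuffle that is reproducible across runs and machines, so taking the first N lines of the output yields a stable train/test split. Reading `-k` as a 1-based tab-separated key field and `-p` as "prepend the digest" is an assumption based on the docstring, and `md5_key`/`md5_sort` are hypothetical names.

```python
import hashlib


def md5_key(record, k=None):
    """Hex MD5 of the whole record, or of its k-th tab-separated field (1-based)."""
    text = record.split("\t")[k - 1] if k else record
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def md5_sort(records, k=None, prepend=False):
    """Sort records by digest; equal digests fall back to the record text."""
    keyed = sorted((md5_key(r, k), r) for r in records)
    return [d + "\t" + r if prepend else r for d, r in keyed]
```

Hooking it up to a pipe is one line, e.g. `print("\n".join(md5_sort(l.rstrip("\n") for l in sys.stdin)))`. Hashing a key field rather than the whole line keeps all records that share a key on the same side of the split.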
brendano / ajaxgoogle.py
"""ajaxgoogle.py - Simple bindings to the AJAX Google Search API
(Just the JSON-over-HTTP bit of it, nothing to do with AJAX per se)
http://code.google.com/apis/ajaxsearch/documentation/reference.html#_intro_fonje
brendan o'connor - gist.github.com/28405 - anyall.org"""
try:
import json
except ImportError:
import simplejson as json
import urllib, urllib2
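The preview stops at the imports. Below is a Python 3 sketch of the two halves of such a binding: building a query URL and pulling results out of the JSON reply. The endpoint, the `v=1.0` parameter and the `responseData`/`results`/`titleNoFormatting` field names are recalled from Google's old documentation and should be treated as assumptions; the API has long since been deprecated, so this illustrates the JSON-over-HTTP pattern rather than a working client. `search_url` and `parse_results` are hypothetical names.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the retired AJAX Search API (web search).
BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def search_url(query, start=0):
    """Build a GET URL; urlencode handles escaping (spaces become '+')."""
    return BASE_URL + "?" + urlencode({"v": "1.0", "q": query, "start": start})


def parse_results(body):
    """Return (url, title) pairs from a JSON response body (assumed shape)."""
    data = json.loads(body)
    results = (data.get("responseData") or {}).get("results", [])
    return [(r.get("url"), r.get("titleNoFormatting")) for r in results]
```

Fetching is then `urllib.request.urlopen(search_url(q)).read()`; keeping parsing separate from I/O makes it testable against saved responses.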
brendano / gist:28439
Created Nov 24, 2008
Pipe fiddling: (1) kill buffering; (2) output redirection kills stdout encoding, so force it
# Pipe-oriented I/O in Python. This is harder than it should be.
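# (1) Kill stdout buffering; this makes redirects and tee easier to use.
# (Python 2 only: Python 3 refuses unbuffered text I/O, so os.fdopen(1,'w',0) raises ValueError.)
import os, sys
if "<fdopen>" not in str(sys.stdout): sys.stdout = os.fdopen(1,'w',0)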
# (2) Encoding madness. Note codecs.open() isn't available to us since we're using pipes.
import codecs
sys.stdout = codecs.EncodedFile(sys.stdout,'utf-8','utf-8','ignore')
# or this too .. sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
# I'm interested in safely handling potentially garbled input data, so want to protect stdin.
# You'd think this would work:
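The gist's preview breaks off here, and its code targets Python 2. In Python 3.7 and later, both fixes can be sketched with `TextIOWrapper.reconfigure`, which changes the buffering, encoding and error handling of the standard streams in place. `make_pipe_friendly` is a hypothetical name, and using `errors="replace"` (rather than the `'ignore'` the gist passes for output) is a judgment call: replacement characters make garbled input visible instead of silently dropping it.

```python
import sys


def make_pipe_friendly(stdin=None, stdout=None):
    """Line-buffered UTF-8 output and garble-tolerant UTF-8 input for pipelines."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    # (1) Flush at every newline, so redirects and tee see output promptly.
    # (2) Always encode as UTF-8, whatever the locale says about redirected output.
    stdout.reconfigure(encoding="utf-8", errors="replace", line_buffering=True)
    # Undecodable input bytes become U+FFFD instead of raising UnicodeDecodeError.
    stdin.reconfigure(encoding="utf-8", errors="replace")
```

Call it once at startup, before reading anything: `reconfigure` cannot change the encoding of a stream that has already been read from.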
brendano / gist:39760
Created Dec 24, 2008
Load the MNIST data set in R
# Load the MNIST digit recognition dataset into R
# http://yann.lecun.com/exdb/mnist/
# assume you have all 4 files and gunzip'd them
# creates train$n, train$x, train$y and test$n, test$x, test$y
# e.g. train$x is a 60000 x 784 matrix, each row is one digit (28x28)
# call: show_digit(train$x[5,]) to see a digit.
# brendan o'connor - gist.github.com/39760 - anyall.org
load_mnist <- function() {
load_image_file <- function(filename) {
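The R loader is cut off in the preview. The MNIST files use the simple IDX format described on the page linked above: a big-endian magic number whose first two bytes are zero, third byte is the data type (0x08 for unsigned bytes) and fourth byte is the number of dimensions, followed by one big-endian 32-bit size per dimension and then the raw data. A dependency-free Python sketch of the same parse follows; `read_idx` and `show_digit` are hypothetical names, and only unsigned-byte files (which is what MNIST ships) are handled.

```python
import struct


def read_idx(path):
    """Parse an unsigned-byte IDX file and return (dims, rows).

    Image files: dims is (count, 28, 28) and each row is one flattened digit,
    a bytes object of 784 pixel values 0-255. Label files: dims is (count,)
    and rows is a list of int labels.
    """
    with open(path, "rb") as f:
        data = f.read()
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    dims = struct.unpack(">%dI" % ndim, data[4:4 + 4 * ndim])
    body = data[4 + 4 * ndim:]
    n = dims[0]
    if ndim == 1:
        return dims, list(body[:n])  # labels: ints 0-9
    row_len = 1
    for d in dims[1:]:
        row_len *= d  # 28 * 28 = 784 for MNIST images
    return dims, [body[i * row_len:(i + 1) * row_len] for i in range(n)]


def show_digit(row, width=28):
    """Crude text rendering of one image row, in the spirit of the R show_digit()."""
    for r in range(0, len(row), width):
        print("".join("#" if p > 127 else "." for p in row[r:r + width]))
```

For example, `dims, train_x = read_idx("train-images-idx3-ubyte")` followed by `show_digit(train_x[4])` mirrors `show_digit(train$x[5,])` (Python indexes from 0), assuming the files were gunzipped under their original names.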
brendano / gist:59943
CSV from PostgreSQL, at least as far as I can tell. I'm sure it messes up embedded quotes and maybe embedded commas.
psql.csv() { psql -qAF , "$@" | grep -Ev '^\([0-9]+ rows?\)$'; }
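As the note says, joining fields with bare commas breaks on values that contain commas or quotes. One workaround, sketched below on the assumption that psql is run with a tab separator (`psql -qA -F $'\t'`) and that no value contains a tab or newline, is to re-quote the output with Python's csv module. `tsv_to_csv` is a hypothetical name. Newer psql releases (PostgreSQL 12 and later) also offer a built-in `--csv` output format that does the quoting itself.

```python
import csv
import re

# psql's row-count footer, e.g. "(1 row)" or "(42 rows)".
FOOTER = re.compile(r"^\(\d+ rows?\)$")


def tsv_to_csv(lines, out):
    """Re-emit tab-separated psql output as properly quoted CSV."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        if not line or FOOTER.match(line):
            continue  # skip blank lines and the row-count footer
        writer.writerow(line.split("\t"))
```

Used as a filter: `tsv_to_csv(sys.stdin, sys.stdout)`. The csv writer quotes only fields that need it and doubles embedded quote characters.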
brendano / tabsort
#!/bin/bash
# sort with tab as the field delimiter; other args (e.g. -k2,2n) pass through to sort
exec sort -t $'\t' "$@"
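The same tab-delimited sort is easy to express in Python when the data is already in memory. This is a sketch, not a replacement for sort(1): `tab_sort` is a hypothetical name, only a single 1-based key field is supported, and strings compare by code point, which matches `LC_ALL=C sort` rather than locale-aware collation.

```python
def tab_sort(lines, field=1, numeric=False):
    """Sort tab-separated lines on one 1-based field, like tabsort -kN,N."""
    def key(line):
        parts = line.rstrip("\n").split("\t")
        value = parts[field - 1] if field <= len(parts) else ""
        if numeric:
            return float(value) if value else 0.0  # assumption: empty sorts as 0
        return value
    return sorted(lines, key=key)
```

Ties keep their input order because Python's sort is stable, whereas sort(1) breaks ties by comparing whole lines unless given `-s`.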