Brendan O'Connor brendano

#!/usr/bin/env ruby
# Parse a data table from pollster.com
# Get data via copy-and-paste from http://www.pollster.com/polls/us/08-us-pres-ge-mvo.php
# yields a messy tab-separated thingamajigger (i'm using firefox 3 on mac)
# This script normalizes, in an R-friendly way
require 'date'
def numclean(x)
x =~ /^-$/ ? "NA" : x.to_i
brendano / blogger.php
Created October 8, 2008 21:03
blogger -> wordpress redirect code
<?
/**
* *** THIS NEEDS TO BE EDITED FOR A NEW INSTALLATION ***
*
* this is supposed to be called as e.g.
* http://anyall.org/blog/blogger/http://socialscienceplusplus.blogspot.com/2008/10/mydebatesorg-and-potentially-coolest.html
* and then redirect to e.g.
* http://anyall.org/blog/2008/10/mydebatesorg-online-polling-and-potentially-the-coolest-question-corpus-ever/
brendano / get.rb
Created October 10, 2008 08:03
network connectivity troubleshooting...
load '~/.irbrc' # dotfiles.org/~brendano/.irbrc
require 'hpricot'
sites=[]
for url in [
"http://www.alexa.com/site/ds/top_sites?ts_mode=lang&lang=en"]
h = Hpricot open(url).read
sites += (h/'h3'/'a').map{|x| x['href']}
end
brendano / merged.csv
Created October 11, 2008 09:37
political bias algorithm analysis, scraping and comparison to skewz.com - see anyall.org/blog?p=189
name,score_skewz,score_svd,url,v1,v2,v3,v4,v5
The Politico,-0.133333333333333,-0.069840595513546,politico.com,-0.0579919888228,-0.0156533209161,-0.0118276408031,-0.000672353189093,0.00899951990495
Right Wing Nut House,0.666666666666667,0.016997861495122,rightwingnuthouse.com,-0.0114438419789,0.00923210186058,-0.000332659887795,-0.00357075698976,0.0194133595538
Chicago Tribune,0.0,0.011507686305562,chicagotribune.com,-0.00487815404818,0.0062502057793,0.00472616298604,-0.00370269426842,-0.00354255787188
City Journal,0.566666666666667,0.002719928640919,city-journal.org,-0.000318806368726,0.00147728337907,0.000218460777,-0.000500262448403,-0.00112420748062
Time,-0.1,-0.01921486123282,time.com,-0.0206799675285,-0.00430661260867,-0.00335205354211,-0.00167995286891,-0.0152016073966
National Enquirer,0.533333333333333,-0.008120760725041,nationalenquirer.com,-0.00279469690892,-0.0018201000833,-0.00761346294708,0.00713945342214,-0.00165965873961
AlterNet,-0.633333333333333,-0.029834727529704,alternet.org,-0.0066
brendano / gist:28439
Created November 24, 2008 10:33
pipe fiddling: (1) kill buffering (2) output redir kills stdout encoding, so force it
# Pipe-oriented I/O in Python. This is harder than it should be.
# (1) Kill stdout buffering. makes redirects and tee easier to use.
import os, sys
if "<fdopen>" not in str(sys.stdout): sys.stdout = os.fdopen(1,'w',0)
# (2) Encoding madness. Note codecs.open() isn't available to us since we're using pipes.
import codecs
sys.stdout = codecs.EncodedFile(sys.stdout,'utf-8','utf-8','ignore')
# or this too .. sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
# I'm interested in safely handling potentially garbled input data, so want to protect stdin.
# You'd think this would work:
"""ajaxgoogle.py - Simple bindings to the AJAX Google Search API
(Just the JSON-over-HTTP bit of it, nothing to do with AJAX per se)
http://code.google.com/apis/ajaxsearch/documentation/reference.html#_intro_fonje
brendan o'connor - gist.github.com/28405 - anyall.org"""
try:
import json
except ImportError:
import simplejson as json
import urllib, urllib2
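The pipe-fiddling snippet above (gist:28439) is Python 2 code: `os.fdopen(1,'w',0)` is rejected by Python 3, which does not allow unbuffered text streams. As a rough Python 3 counterpart, and only a sketch, the two steps (flush eagerly, force UTF-8 on output) can be combined in one wrapper. `utf8_writer` is a hypothetical helper name, not something from the gist.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so text is encoded as UTF-8 and flushed on each newline.

    Hypothetical helper for illustration; not part of the original gist.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",
        errors="ignore",       # silently drop characters that cannot be encoded
        newline="\n",          # write newlines as-is, no platform translation
        line_buffering=True,   # flush whenever a newline is written
    )
```

To apply the same settings to the real stdout on Python 3.7 or later, `sys.stdout.reconfigure(encoding="utf-8", errors="ignore", line_buffering=True)` should do the equivalent without replacing the stream object.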
brendano / tabsort
Created February 7, 2009 19:59
tabsort
#!/bin/bash
export TAB=$(echo -e "\t")
exec sort "-t$TAB" "$@"
CSV from PostgreSQL, at least as far as I can tell. I'm sure it messes up embedded quotes and maybe embedded commas.
psql.csv() { psql -qAF , "$@" | egrep -v '^\([0-9]+ rows?\)$'; }
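The worry about embedded quotes and commas is justified: unaligned output with a comma separator does no quoting at all. As a minimal sketch of what correctly quoted output looks like, Python's standard `csv` module can be used. `rows_to_csv` is a hypothetical helper, and the rows are assumed to come from some other source (for example a database driver), not from the shell function above.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of rows to CSV text with standard quoting.

    Hypothetical helper for illustration only.
    """
    buf = io.StringIO()
    # Default dialect: fields containing commas or quotes are wrapped in
    # double quotes, and embedded double quotes are doubled.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([["name", "note"], ["a,b", 'say "hi"']]), end="")
```

Newer PostgreSQL releases (12 and later) also ship a `psql --csv` output mode, which may make the filter unnecessary where it is available.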
brendano / "mailx" doesnt work
Created February 10, 2009 09:31
mail cgi for curl
#!/usr/bin/env ruby
puts "Content-Type: text/plain"
puts
subj = ENV['PATH_INFO'] || ""
subj.gsub!("'", '"')
msg = STDIN.read || ""
# system "env"
brendano / map.rb
Created February 12, 2009 22:44
commandline map
#!/usr/bin/env ruby
# like map(), except on shell pipelines
# one arg: the mapper
# transform input lines, via the mapper, into output lines
# mapper is eval'd within the input line string
#
# extract 2nd column
# cat file | map 'split[1]'
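
The preview of map.rb stops at its usage comments. As a sketch of the same idea in Python (not the author's Ruby implementation), the mapper can be an expression evaluated once per input line. Python cannot evaluate an expression "within" a string the way the comments describe, so this version assumes the current line is exposed as a variable named `line`. `map_lines` and the script name `map.py` are hypothetical.

```python
import sys


def map_lines(mapper, lines):
    """Evaluate the expression `mapper` once per input line and yield the results.

    The current line (without its trailing newline) is visible to the
    expression as `line`. Hypothetical helper, for illustration only.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        # eval runs arbitrary code: only use this with mappers you typed yourself.
        yield str(eval(mapper, {}, {"line": line}))


if __name__ == "__main__" and len(sys.argv) == 2:
    for out in map_lines(sys.argv[1], sys.stdin):
        print(out)
```

The "extract 2nd column" example would then read `cat file | python map.py 'line.split()[1]'`.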