
Aaron Schumacher ajschumacher

ajschumacher / make_completer.py
Created June 7, 2012 01:01
This code generates a file that gets a score of 0.61758 at http://www.kaggle.com/c/FacebookRecruiting/
import csv
r = csv.reader(open('train.csv', 'r'))
next(r)  # skip the header row (works in Python 2.6+ and Python 3)
edges = set()
#commutative_graph = dict()
for edge in r:
    edges.add((edge[0], edge[1]))
    # commutative_graph.setdefault(edge[0], set()).add(edge[1])
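The commented-out `commutative_graph` hints at using reversed edges. The sketch below shows one such "follow-back" heuristic: if A follows B but B does not follow A, suggest A to B. This is a hypothetical illustration; the gist preview is truncated, so it is not necessarily what make_completer.py actually does. The function name `reciprocal_suggestions`, the sample rows, and the header names are all made up for the example.

```python
import csv
import io


def reciprocal_suggestions(pairs):
    """Hypothetical follow-back heuristic (not the author's confirmed method).

    For every observed edge src -> dst whose reverse dst -> src is missing,
    suggest src as a new destination for dst.
    Returns {node: sorted list of suggested destinations}.
    """
    edges = set(pairs)
    suggestions = {}
    for src, dst in edges:
        if (dst, src) not in edges:
            suggestions.setdefault(dst, set()).add(src)
    # Sorting only makes the output deterministic; it is not a ranking.
    return {node: sorted(dests) for node, dests in suggestions.items()}


# Made-up sample shaped like a two-column edge list (header names assumed).
sample = "source_node,destination_node\n1,2\n2,1\n1,3\n4,3\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
print(reciprocal_suggestions((row[0], row[1]) for row in reader))
# prints {'3': ['1', '4']}
```

Here 1 and 2 already follow each other, so only node 3 gets suggestions.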
#!/usr/bin/env python
import sys
import csv
def MeanAveragePrecision(valid_filename, attempt_filename, at=10):
    at = int(at)
    valid = dict()
    for line in csv.DictReader(open(valid_filename, 'r')):
        valid.setdefault(line['source_node'], set()).update(line['destination_nodes'].split(" "))
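The scoring script above is cut off after loading the validation file. For reference, a minimal sketch of one common formulation of mean average precision at k (k=10 here, matching the `at=10` default) follows. The function names and the dict-based inputs are assumptions for illustration, not the rest of the author's script.

```python
def average_precision_at_k(actual, predicted, k=10):
    """AP@k for one source node.

    actual: set of true destination nodes.
    predicted: ranked list of guessed destination nodes.
    """
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    for i, p in enumerate(predicted[:k]):
        # Count each correct guess once, at the first rank where it appears.
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1.0)  # precision at this rank
    return score / min(len(actual), k)


def mean_average_precision_at_k(valid, attempt, k=10):
    """valid: {source: set of true destinations};
    attempt: {source: ranked list of guesses}. Missing sources score 0."""
    if not valid:
        return 0.0
    scores = [average_precision_at_k(valid[src], attempt.get(src, []), k)
              for src in valid]
    return sum(scores) / len(scores)


print(average_precision_at_k({"a", "b"}, ["a", "x", "b"]))
```

For `["a", "x", "b"]` against `{"a", "b"}`, the hits at ranks 1 and 3 contribute 1/1 and 2/3, and dividing by 2 gives 5/6 (about 0.833).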
ajschumacher / README.md
Last active December 14, 2015 18:09 — forked from mbostock/.block
NYC Subway Usage

This shows New York City subway usage based on turnstile data. Hovering over a day shows the number of entries through subway turnstiles on that day.

Notice the clear effects of Hurricanes Irene and Sandy.

More details and source code are available. The visualization is based (heavily) on Mike Bostock's excellent example.
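The README does not say how the daily totals were computed. As a hedged sketch, assuming each turnstile reports a running (cumulative) entry counter at intervals, per-day entries can be estimated by differencing consecutive readings per turnstile and summing by date. The input tuple layout, the `daily_entries` name, and the `max_delta` reset threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict


def daily_entries(readings, max_delta=10_000):
    """Estimate total entries per day from cumulative counter readings.

    readings: iterable of (turnstile_id, date, cumulative_entries), assumed
    sorted by turnstile and then by time. Each difference between consecutive
    readings of one turnstile is credited to the date of the later reading.
    max_delta is a hypothetical cutoff for discarding counter resets.
    """
    totals = defaultdict(int)
    last = {}
    for turnstile, date, count in readings:
        if turnstile in last:
            delta = count - last[turnstile]
            # Skip negative or implausibly large jumps (resets, rollovers).
            if 0 <= delta <= max_delta:
                totals[date] += delta
        last[turnstile] = count
    return dict(totals)


# Made-up readings for two turnstiles; B's counter resets on the last reading.
readings = [
    ("A", "2012-10-28", 100), ("A", "2012-10-28", 150), ("A", "2012-10-29", 180),
    ("B", "2012-10-28", 5), ("B", "2012-10-29", 25), ("B", "2012-10-29", 3),
]
print(daily_entries(readings))
# prints {'2012-10-28': 50, '2012-10-29': 50}
```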

ajschumacher / README.md
Created April 25, 2013 15:49
NYC schools
ajschumacher / README.md
Last active December 16, 2015 23:08
intense brainstorming

oh nothing, nothing...

ajschumacher / movie_reviews.py
Created July 16, 2013 02:54
another solution to this munging problem
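movie_reviews = dict()  # maps product id -> list of review texts
product_id = None  # avoids shadowing the built-in id()
with open('movies.txt.small') as f:
    for line in f:
        # split on the first colon only; the review text may contain colons
        key, sep, value = line.partition(':')
        if sep:
            value = value.strip()
            if key == 'product/productId':
                product_id = value
            elif key == 'review/text':
                movie_reviews.setdefault(product_id, []).append(value)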
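Yet another way to attack the same munging problem is to first parse each review into a dict and then group. This sketch assumes reviews are separated by blank lines and every field line looks like `key: value`; those assumptions, the helper names, and the sample data are illustrative, not taken from the original gist.

```python
import io
from collections import defaultdict


def parse_records(lines):
    """Yield one dict per review, assuming blank-line-separated records
    made of 'key: value' lines."""
    record = {}
    for line in lines:
        key, sep, value = line.partition(':')
        if sep:
            record[key.strip()] = value.strip()
        elif not line.strip() and record:
            yield record  # blank line ends the current record
            record = {}
    if record:
        yield record  # last record may not be followed by a blank line


def reviews_by_product(lines):
    """Group review texts by product id."""
    grouped = defaultdict(list)
    for rec in parse_records(lines):
        if 'product/productId' in rec and 'review/text' in rec:
            grouped[rec['product/productId']].append(rec['review/text'])
    return dict(grouped)


# Made-up sample; the product ids are placeholders.
sample = """product/productId: B001
review/score: 5.0
review/text: Great film: loved it.

product/productId: B001
review/text: Too long.

product/productId: B002
review/text: Fine.
"""
print(reviews_by_product(io.StringIO(sample)))
# prints {'B001': ['Great film: loved it.', 'Too long.'], 'B002': ['Fine.']}
```

Keeping each review as a dict makes it easy to pull out other fields (such as the score) later without changing the parsing loop.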
ajschumacher / logo.R
Created September 5, 2013 01:32
hacky stat prog DC logo in R
set.seed(54)
n <- rnorm(3000, 0.5, 0.1)
par(mar = c(1, 1, 3, 1))
# uniform background, plus extra uniform mass and a truncated normal bump near 0.5
hist(c(runif(10000), runif(1900, 0.35, 0.65), n[abs(n - 0.5) < 0.15]),
     xlim = c(0, 1), ylim = c(0, 600), breaks = 51, ylab = "",
     main = "Statistical Programming DC", col = "#C9242D",
     axes = FALSE, cex.main = 2)
ajschumacher / wrapup3.md
Last active December 29, 2015 12:19
DC Hack and Tell Round 3: Hack... to the Future!
ajschumacher / wrapup4.md
Created December 20, 2013 02:18
DC Hack and Tell Round 4: The Christmas Invasion
ajschumacher / mapping.R
Last active August 29, 2015 13:58
figuring out state key names, or other problem
test <- data.frame(state=c("Florida", "Virginia", "Texas", "California",
                           "Georgia", "North Carolina", "New York",
                           "Missouri", "Illinois", "Maryland", "Pennsylvania",
                           "Tennessee", "Colorado", "Washington", "Arizona",
                           "Ohio", "Wisconsin", "District of Columbia",
                           "Michigan", "New Jersey", "Utah", "Louisiana",
                           "Minnesota", "Alabama", "Kansas", "Massachusetts",
                           "Indiana", "South Carolina", "Arkansas", "Oregon",
                           "Nevada", "Kentucky", "Nebraska", "(not set)",
                           "Connecticut", "New Mexico", "Oklahoma",