@jsonbecker
jsonbecker / cleanURL.py
Created February 19, 2014 15:40
A quick way to clean up a URL: strip common tracking parameters and resolve proxy links to get the proper permanent link.
#!/usr/bin/python
import requests
import sys
from re import search
from subprocess import check_output
url = check_output('pbpaste')
r = requests.get(url)
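The preview stops after the request is made, so the cleaning step itself is not shown. As a minimal sketch of what stripping tracking parameters could look like (not the gist's actual code), the function below drops query parameters from a URL using only the standard library. The name `strip_tracking` and the list of tracking parameters are assumptions made for illustration. In the gist's flow, the input would presumably be `r.url`, which `requests` sets to the final URL after redirects.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the gist's own list is not visible in the preview.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return url without the query parameters listed above (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with the filtered query string; everything else is unchanged.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking("https://example.com/post?id=7&utm_source=feed&utm_medium=rss"))
```

Filtering by parameter name leaves the path and any meaningful query parameters untouched, which matters for sites that encode the article ID in the query string.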
@jsonbecker
jsonbecker / gist:10a7210a10292b208d91
Last active August 29, 2015 14:22
providencefirefighters.R
install.packages('rvest')
library(rvest)
library(dplyr)
page <- read_html('http://app.providencejournal.com/topics/special-reports/tables/firefigher-salaries-fy2014/firefighter-salaries.htm')  # html() is deprecated in newer rvest releases
data <- page %>%
html_node('table') %>%
html_table() %>%
.[c(-1,-2,-3),] %>%
as.data.frame
## Current pattern
function(myList){
do_stuff_with(myList$attribute1, myList$attribute2)
do_stuff_with(myList$attribute3, myList$attribute4)
}
## Desired pattern
function(myList){
do_stuff_with(attribute1, attribute2)
do_stuff_with(attribute3, attribute4)
@jsonbecker
jsonbecker / WPFootnotesToMarkdown.py
Last active December 9, 2015 23:08
WP-Footnotes to Markdown footnotes.
from sys import argv
import re
name, file_path = argv
p = re.compile(r"[\s]\(\((.*?[)]{0,1})\)\)[\s]{0,1}")
# The tricky part here is to match all text between (()), including as many as
# one set of (), which may even terminate ))). The {0,1} captures as many as
# one ). The trailing space is there because I often surrounded the (()) with
# a space to make it clear in the WordPress editor.
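The preview ends before the pattern is applied. The following is a minimal sketch, under stated assumptions, of how the same regular expression could drive the conversion: each `((...))` note becomes a numbered `[^n]` marker and the note text is appended as a definition at the end. The function name `convert_footnotes` and the output layout are assumptions, not the gist's actual code, and the `[^n]` syntax is the footnote extension supported by several Markdown flavors rather than core Markdown.

```python
import re

# Same pattern as in the gist preview above.
FOOTNOTE = re.compile(r"[\s]\(\((.*?[)]{0,1})\)\)[\s]{0,1}")


def convert_footnotes(text):
    """Replace ((note)) spans with [^n] markers and append the definitions."""
    notes = []

    def replace(match):
        notes.append(match.group(1))
        whole = match.group(0)
        # The pattern may swallow one trailing whitespace character; put it back.
        tail = whole[-1] if whole[-1].isspace() else ""
        return "[^{}]{}".format(len(notes), tail)

    body = FOOTNOTE.sub(replace, text)
    definitions = "\n".join(
        "[^{}]: {}".format(number, note) for number, note in enumerate(notes, 1)
    )
    return body + ("\n\n" + definitions if notes else "")


print(convert_footnotes("Cats purr ((see (Smith))) often."))
```

Because `.` does not match a newline by default, a note that spans several lines would be left untouched by this sketch.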
@jsonbecker
jsonbecker / arghslow.R
Created January 29, 2013 19:24
If this doesn't motivate you to learn data.table...
modal_person_attribute <- function(df, attribute){
# df: rbind of all person tables from all years
# attribute: vector name to calculate the modal value
# Calculate the number of instances an attribute is associated with an id
mode <- do.call(rbind,
tapply(as.character(df[[attribute]]), df$sasid,
function(x) data.frame(attribute=rle(x)$values,
counts=rle(x)$lengths)))
names(mode) <- c(as.character(attribute), 'counts')
# Clean up
@jsonbecker
jsonbecker / modalSDPdt.R
Created January 31, 2013 01:46
SDP business rules to resolve student attributes.
modal_person_attribute <- function(df, attribute){
# df: rbind of all person tables from all years
# attribute: vector name to calculate the modal value
# Calculate the number of instances an attribute is associated with an id
dt <- data.table(df)
mode <- dt[, rle(as.character(.SD[[attribute]])), by=sasid]
setnames(mode, c('sasid', 'counts', as.character(attribute)))
setkeyv(mode, c('sasid', 'counts'))
# Only include attributes with the maximum values. This is equivalent to the
# mode with two records when there is a tie.
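The rest of the function is cut off in the preview. As a rough illustration of the idea in the comments, here is a Python sketch that run-length encodes each person's values and keeps every value whose run length equals the maximum, so ties yield more than one record. It mirrors `rle()` in counting consecutive runs rather than total occurrences. The function name `modal_runs` and the `(person_id, value)` input shape are assumptions for illustration and are not taken from the gist.

```python
from collections import defaultdict
from itertools import groupby


def modal_runs(records):
    """records: iterable of (person_id, value) pairs in table order (assumed shape)."""
    by_id = defaultdict(list)
    for person_id, value in records:
        by_id[person_id].append(str(value))

    result = {}
    for person_id, values in by_id.items():
        # Run-length encode: consecutive equal values collapse into (value, count).
        runs = [(value, len(list(group))) for value, group in groupby(values)]
        longest = max(count for _, count in runs)
        # Keep every value that reaches the maximum, so ties return several values.
        result[person_id] = [value for value, count in runs if count == longest]
    return result


rows = [(1, "A"), (1, "A"), (1, "B"), (2, "C"), (2, "D")]
print(modal_runs(rows))
```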
@jsonbecker
jsonbecker / cutoff_matrix.R
Created October 17, 2013 19:17
Quick function to calculate common statistics when evaluating a predictor of binary outcomes.
cutoff_matrix <- function(df, classifier, outcome, breaks=seq(1,0,-.01)){
results <- data.frame(cutoff=vector(mode='numeric', length=length(breaks)),
true_pos=vector(mode='numeric', length=length(breaks)),
true_neg=vector(mode='numeric', length=length(breaks)),
false_pos=vector(mode='numeric', length=length(breaks)),
false_neg=vector(mode='numeric', length=length(breaks)))
for(i in seq(1, length(breaks))){
value <- breaks[i]
j <- data.frame(table(df[[classifier]]>value, df[[outcome]]))
results[i,1] <- value
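The loop body is truncated in the preview. A minimal Python sketch of the same idea follows: for each cutoff, a case is predicted positive when its score is strictly greater than the cutoff (matching the `>` comparison above), and the four confusion-matrix counts are tallied. The plain-list inputs and dictionary rows are assumptions chosen to keep the example dependency-free.

```python
def cutoff_matrix(scores, outcomes, breaks=None):
    """Count true/false positives and negatives at each cutoff (illustrative sketch)."""
    if breaks is None:
        # Same grid as seq(1, 0, -.01): 1.00, 0.99, ..., 0.00
        breaks = [step / 100 for step in range(100, -1, -1)]

    rows = []
    for cutoff in breaks:
        row = {"cutoff": cutoff, "true_pos": 0, "true_neg": 0,
               "false_pos": 0, "false_neg": 0}
        for score, outcome in zip(scores, outcomes):
            predicted = score > cutoff
            if predicted and outcome:
                row["true_pos"] += 1
            elif predicted and not outcome:
                row["false_pos"] += 1
            elif not predicted and outcome:
                row["false_neg"] += 1
            else:
                row["true_neg"] += 1
        rows.append(row)
    return rows


print(cutoff_matrix([0.9, 0.4, 0.6, 0.2], [1, 0, 0, 1], breaks=[0.5]))
```

Rates such as sensitivity or specificity can then be derived from each row, for example `true_pos / (true_pos + false_neg)` when the denominator is nonzero.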
@jsonbecker
jsonbecker / link_list_templates.html
Last active December 28, 2015 17:59
How to modify your `articles.html` and `index.html` template for "Linked List" style posts.
@jsonbecker
jsonbecker / photoDate.py
Last active January 2, 2016 00:38
A few versions of getting date data from photos.
import sys
import os, shutil, time
import subprocess
import os.path
import exifread
from datetime import datetime
def photoDateSIPS(f):
"Return the date/time on which the given photo was taken."
@jsonbecker
jsonbecker / timing.R
Created September 19, 2016 20:00
A small bit of code to log how long something took
# Ever want to log how long something takes? I know I do. timing()
# isn't about benchmarking, it's about getting feedback from long-running
# tasks, especially if they are scheduled in production. Anyway, I love
# this function because it's a simple wrapper that can go around anything.
# I use it inside our tools at Allovue when extracting data to get things
# like this:
#> accounts <- extract_data(config_file$accounts)
# Starting queries/accounts.sql at: Mon Sep 19 15:27:34 2016
# Completed at: Mon Sep 19 15:28:14 2016
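The R function itself is not shown in the preview. For readers who want the same feedback in Python, one possible analogue is a context manager that logs a start line and a completion line around any block. This is a sketch under assumptions, not a port of the gist: the `label` and `log` parameters are hypothetical, and the message wording simply imitates the sample output above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timing(label, log=print):
    """Log when a block starts and finishes (illustrative, not the gist's code)."""
    log("Starting {} at: {}".format(label, time.ctime()))
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so the log always gets closed out.
        log("Completed at: {}".format(time.ctime()))


with timing("queries/accounts.sql"):
    total = sum(range(1000))  # stand-in for a long-running extract
```

Passing a different `log` callable, such as a logger's `info` method, sends the same messages to a log file instead of standard output.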