@soodoku
soodoku / state_abbrev_fips.txt
Created April 12, 2016 21:03
US State Abbreviations to FIPS crosswalk
1 AL
2 AK
4 AZ
5 AR
6 CA
8 CO
9 CT
10 DE
11 DC
12 FL
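A crosswalk like this is usually loaded into a lookup table. The following is a minimal Python sketch, assuming each line holds a FIPS code followed by a two-letter abbreviation separated by whitespace, as in the preview above. The function name `parse_crosswalk` and the inline `SAMPLE` string are illustrative and not part of the gist. FIPS state codes are conventionally written as two digits, so the sketch zero-pads them.

```python
def parse_crosswalk(text):
    """Map two-letter state abbreviations to zero-padded FIPS codes."""
    lookup = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        fips, abbrev = parts
        lookup[abbrev] = fips.zfill(2)
    return lookup


# The ten rows shown in the preview (assumed layout: "<fips> <abbrev>").
SAMPLE = """1 AL
2 AK
4 AZ
5 AR
6 CA
8 CO
9 CT
10 DE
11 DC
12 FL
"""

abbrev_to_fips = parse_crosswalk(SAMPLE)
print(abbrev_to_fips["CA"])
```

Keeping the codes as strings rather than integers preserves the leading zero, which matters when joining against files that store FIPS codes as text.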
@soodoku
soodoku / state_various.csv
Last active April 12, 2016 22:31
US State name, 2 letter code, Alphabetical number, Census Region, ICPSR, ICPSR 2
state,code,num,census,icpsr,icpsr2
Alabama,AL,1,South,41,41 AL ALABAMA
Alaska,AK,2,West,81,81 AK ALASKA
Arizona,AZ,3,West,61,61 AZ ARIZONA
Arkansas,AR,4,South,42,42 AR ARKANSAS
California,CA,5,West,71,71 CA CALIFORNIA
Colorado,CO,6,West,62,62 CO COLORADO
Connecticut,CT,7,Northeast,1,01 CT CONNECTICUT
Delaware,DE,8,South,11,11 DE DELAWARE
District of Columbia,DC,9,Northeast,55,55 DC DISTRICT OF COLUMBIA
@soodoku
soodoku / missing.R
Last active April 22, 2016 11:53
plotting missing
# Load libs
library(ggplot2)
# Simulate correlated data
R <- matrix(c(1, .80, .80, 1), nrow = 2)
U = t(chol(R))
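The preview stops after the Cholesky step, so the rest of the script is not shown. As a hedged illustration of the general technique (not the author's remaining code), the Python sketch below draws pairs of independent standard normals and multiplies them by the lower-triangular Cholesky factor of the 2x2 correlation matrix, which is what `t(chol(R))` yields in R. The function names `correlated_pairs` and `pearson`, the sample size, and the seed are assumptions made for the example.

```python
import math
import random


def correlated_pairs(rho, n, seed=0):
    """Draw n pairs of standard normals with correlation rho."""
    rng = random.Random(seed)
    # Lower-triangular Cholesky factor of [[1, rho], [rho, 1]]:
    # [[1, 0], [rho, sqrt(1 - rho^2)]]
    l21 = rho
    l22 = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, l21 * z1 + l22 * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


sample = correlated_pairs(0.80, 10000)
print(round(pearson(sample), 2))
```

The sample correlation should land close to the target of 0.80, with the gap shrinking as `n` grows.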
@soodoku
soodoku / scrape_wisconsin_ads.py
Last active September 14, 2016 13:21
Get text from Wisconsin ad pdfs using pyPdf
'''
Text from Searchable pdfs
Scrape Text off Wisconsin Ads pdfs
Uses pyPdf to get text from searchable pdfs. The script is tailored for getting data
from Wisconsin Political Ads Database: http://wiscadproject.wisc.edu/Storyboards.
@author: Gaurav Sood
Created on November 02, 2011
@soodoku
soodoku / rent control
Last active October 18, 2016 01:46
Rent Control
# Read the data (keep text columns as character so strsplit() works)
sf <- read.csv("sf_tenants.csv", stringsAsFactors = FALSE)
# Recode
sf$market_rates <- gsub("%", "", sapply(strsplit(sf$market_rate, " / "), "[", 2)) # market_rate
sf$rent_control_rates <- gsub("%", "", sapply(strsplit(sf$rent_control, " / "), "[", 2)) # rent_control
# Ratio of rent_control rate to market rate
sf$ratio <- as.numeric(sf$rent_control_rates)/as.numeric(sf$market_rates)
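The recode step takes the part of each cell after " / ", strips the percent sign, and divides one rate by the other. The same logic is sketched below in Python. The contents of sf_tenants.csv are not shown in the gist, so the cell values, the `rows` list, and the helper name `second_field_as_rate` are hypothetical and serve only to illustrate the parsing.

```python
def second_field_as_rate(cell):
    """Return the percentage after ' / ' as a float, or None if absent."""
    parts = cell.split(" / ")
    if len(parts) < 2:
        return None
    return float(parts[1].replace("%", ""))


# Hypothetical rows; the real columns are market_rate and rent_control.
rows = [
    {"market_rate": "120 / 60%", "rent_control": "45 / 30%"},
    {"market_rate": "80 / 40%", "rent_control": "10 / 5%"},
]

for row in rows:
    market = second_field_as_rate(row["market_rate"])
    controlled = second_field_as_rate(row["rent_control"])
    # Ratio of the rent-control rate to the market rate, as in the R code.
    row["ratio"] = controlled / market

print([row["ratio"] for row in rows])
```

Returning `None` for cells without a " / " separator makes missing values explicit. The R version produces `NA` in the same situation.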
@soodoku
soodoku / text_classifier.R
Last active December 15, 2016 17:44
Basic Text Classifier
"
Basic Text Classifier
- Takes a csv with a text column, and column of labels
- Splits into train and test
- Preprocesses text using tm/bag-of-words, 1/2-order Markov
- Uses SVM and Lasso
@author: Gaurav Sood
"
@johnmyleswhite
johnmyleswhite / median.R
Created February 23, 2017 00:40
R's Medians as a Rabbit Hole of Type Promotions and Function Indirection
> median(FALSE)
[1] FALSE
> median(c(TRUE, FALSE))
[1] 0.5
> median(c(TRUE, FALSE, TRUE))
[1] TRUE
> f <- factor(c('a', 'b', 'c'), levels = c('a', 'b', 'c'), ordered = TRUE)
@bearloga
bearloga / classify.R
Created September 10, 2015 19:27
A function which launches a Shiny app for hand coding (manually classifying) data.
#' Manual classification of observations
#'
#' \code{classify} launches a Shiny app to manually classify a subset of observations.
#'
#' @param x A character vector.
#' @param btn_labels A character vector of length 2 corresponding to 0 and 1.
#' @return A vector of 0/1 for each element in \code{x}.
#' @export
#' @examples \dontrun{
#' foo <- sprintf('%s (%.2f miles per gallon)', rownames(mtcars), mtcars$mpg)
@devin-petersohn
devin-petersohn / a_pandas_on_ray_blogpost_01.ipynb
Last active October 14, 2018 19:14
Pandas on Ray - Lessons learned Blog Post. Also introduces Modin, a project for unifying the APIs of computing engines.