View tp_trelli.r
devtools::install_github("abresler/gdeltr2")
devtools::install_github("hafen/trelliscopejs")
library(gdeltr2)
library(dplyr)
asb_ocr <- "Brooklyn Nets"
gkg_codes <-
  get_codes_gkg_themes()
imageweb_codes <- get_gdelt_codebook_ft_api(code_book = "imageweb")
@soodoku
soodoku / rent control
Last active Oct 18, 2016
Rent Control
View rent control
# Read the data (keep strings as character so strsplit() works)
sf <- read.csv("sf_tenants.csv", stringsAsFactors = FALSE)
# Recode: keep the percentage after " / " and drop the "%" sign
sf$market_rates <- gsub("%", "", sapply(strsplit(sf$market_rate, " / "), "[", 2)) # market rate
sf$rent_control_rates <- gsub("%", "", sapply(strsplit(sf$rent_control, " / "), "[", 2)) # rent control
# Ratio of rent control vs. market rate
sf$ratio <- as.numeric(sf$rent_control_rates)/as.numeric(sf$market_rates)
soodoku / Liberal Regex Pattern for All URLs
Created Jul 11, 2016 — forked from gruber/Liberal Regex Pattern for All URLs
Liberal, Accurate Regex Pattern for Matching All URLs
View Liberal Regex Pattern for All URLs
The regex patterns in this gist are intended to match any URLs,
including "mailto:foo@example.com", "x-whatever://foo", etc. For a
pattern that attempts only to match web URLs (http, https), see:
https://gist.github.com/gruber/8891611
# Single-line version of pattern:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
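A minimal sketch of applying the single-line pattern in R, assuming R 4.0 or later (for raw string literals) and PCRE matching via `perl = TRUE`. The sample text and the variable names are illustrative and not part of the original gist.

```r
# The single-line pattern, copied verbatim into a raw string (R >= 4.0),
# so no backslash needs to be doubled.
url_pattern <- r"((?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])))"

# Illustrative input (an assumption, not from the gist)
text <- "Write to mailto:foo@example.com or see http://example.com/a_(b) today."

# gregexpr() finds every match; regmatches() extracts the matched substrings.
# perl = TRUE is needed for (?i), \b, \w and \d as used in the pattern.
matches <- regmatches(text, gregexpr(url_pattern, text, perl = TRUE))[[1]]
print(matches)
```

The pattern excludes trailing punctuation, so the final period of a sentence is not swallowed into a URL that ends it.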
soodoku / missing.R
Last active Apr 22, 2016
plotting missing
View missing.R
# Load libs
library(ggplot2)
# Simulate correlated data
R <- matrix(c(1, .80, .80, 1), nrow = 2)  # target correlation matrix
U <- t(chol(R))                           # lower-triangular Cholesky factor
soodoku / state_various.csv
Last active Apr 12, 2016
US State name, 2 letter code, Alphabetical number, Census Region, ICPSR, ICPSR 2
View state_various.csv
state code num census icpsr icpsr2
Alabama AL 1 South 41 41 AL ALABAMA
Alaska AK 2 West 81 81 AK ALASKA
Arizona AZ 3 West 61 61 AZ ARIZONA
Arkansas AR 4 South 42 42 AR ARKANSAS
California CA 5 West 71 71 CA CALIFORNIA
Colorado CO 6 West 62 62 CO COLORADO
Connecticut CT 7 Northeast 1 01 CT CONNECTICUT
Delaware DE 8 South 11 11 DE DELAWARE
District of Columbia DC 9 Northeast 55 55 DC DISTRICT OF COLUMBIA
soodoku / state_abbrev_fips.txt
Created Apr 12, 2016
US State Abbreviations to FIPS crosswalk
View state_abbrev_fips.txt
1 AL
2 AK
4 AZ
5 AR
6 CA
8 CO
9 CT
10 DE
11 DC
12 FL
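A short sketch of how a crosswalk like this might be used in R. It assumes the file is whitespace-separated with the FIPS code first and the abbreviation second, as in the preview above. The column names `fips` and `abbrev` are hypothetical, and only the first few rows are inlined to keep the example self-contained.

```r
# First rows of the crosswalk, inlined for illustration; in practice one
# would pass the file path to read.table() instead of `text =`.
crosswalk <- read.table(
  text = "1 AL
2 AK
4 AZ
5 AR
6 CA",
  col.names = c("fips", "abbrev"),  # hypothetical column names
  stringsAsFactors = FALSE
)

# FIPS state codes are conventionally written as two digits, so zero-pad
# them and build a named lookup vector keyed by abbreviation.
fips_by_abbrev <- setNames(sprintf("%02d", crosswalk$fips), crosswalk$abbrev)

fips_by_abbrev[c("CA", "AK")]
```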
soodoku / cong.csv
Last active Nov 22, 2015
Educational Qualifications of Members of the 111th Congress
View cong.csv
Name District Education Science Law
Jeff Sessions (R) AL-Senate B.A., Huntingdon College; J.D. University of Alabama School of Law 1
Richard Shelby (R) AL-Senate B.A., University of Alabama; J.D. University of Alabama School of Law 1
Jo Bonner (R) AL-1 B.A. Journalism, University of Alabama 0
Bobby Bright (D) AL-2 B.A. Political Science, Auburn University; M.S. Criminal Justice, Troy State University; J.D. Thomas Goode Jones School of Law 1
Mike Rogers (R) AL-3 B.A., Political Science; M.P.A., Jackson State University; J.D. Birmingham School of Law 1
Robert Aderholt (R) AL-4 B.A., Political Science/History, Birmingham Southern College; J.D., Samford University 1
Parker Griffith (D) AL-5 B.S.; M.D., Louisiana State University 0
Spencer Bachus (R) AL-6 B.A., Auburn University; J.D., University of Alabama 1
Artur Davis (D) AL-7 B.A., Government, Harvard University; J.D., Harvard University School of Law 1
soodoku / count.md
Last active Sep 20, 2015
Making it Count: Counting Women On the Street
View count.md

Making 'em Count: Counting Women On the Street

Proposal for a crowd-sourced study:

The purpose: to estimate the proportion of men among the people on the streets.

Some priors: the proportion varies by time of day and by place. The proportion of women out on the city's streets likely declines at night, tragic as the reasons for that are. The proportion of men is also likely greater around office complexes than on residential streets. The aim is to get data from a diverse set of places and from a range of times.
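One way the collected tallies might be summarized, sketched in R under stated assumptions: each row is one observation session, the column names (`place`, `time`, `men`, `women`) are hypothetical, and the counts are made up purely for illustration. The interval is a simple normal approximation, not a method the proposal specifies.

```r
# Made-up tallies for illustration only; not real data.
counts <- data.frame(
  place = c("office", "office", "residential", "residential"),
  time  = c("day", "night", "day", "night"),
  men   = c(60, 45, 40, 30),
  women = c(40, 15, 40, 10)
)

# Proportion of men in each place-by-time cell
counts$n        <- counts$men + counts$women
counts$prop_men <- counts$men / counts$n

# Normal-approximation 95% confidence interval for each proportion
se <- sqrt(counts$prop_men * (1 - counts$prop_men) / counts$n)
counts$lower <- counts$prop_men - 1.96 * se
counts$upper <- counts$prop_men + 1.96 * se

print(counts)
```

Keeping place and time as separate columns makes it straightforward to compare cells later, for example day against night within the same kind of street.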

soodoku / server_installs
Last active Aug 30, 2015
Basic R related installs for Initializing Scrapers on Digital Ocean Ubuntu
View server_installs
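# Refresh the package index first, then upgrade installed packages
sudo apt-get update
sudo apt-get upgrade
sudo aptitude install emacs24
sudo aptitude install r-base
sudo aptitude install libcurl4-openssl-dev
sudo aptitude install libxml2-dev
# Quote the pattern so the shell does not expand it against local files
sudo apt-get install 'openjdk-7-*'
R CMD javareconf -e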
soodoku / Distributed Coding.md
Last active Aug 29, 2015
Reducing Costs for Producing Training Data and Implementing Semi-Automated Systems
View Distributed Coding.md

The goal is to make it easier to produce distributed Human Intelligence Tasks (HITs, nomenclature courtesy of Amazon). HITs include the production of training data, the general class of recognition problems (such as image recognition tasks) that humans can do with very little error and that machines are still somewhat bad at, surveys (where the source of data is the human being surveyed), etc.

The general idea traces its ancestry to CAPTCHA, which was developed to solve two problems at the same time -- provide a way for websites to distinguish between humans and bots, and help OCR written (or heard) material. But it differs from CAPTCHA in three ways. First, our goal is not to solve two problems at once. Thus, unlike current CAPTCHA systems, which make it as hard as possible for humans to get the answer right, we want to invert that logic -- make it as easy as possible for humans to get the answer right. Second, we want to build it for tasks other than recognition tasks. Third, we plan to attach it to a payment architectu