
@soodoku
soodoku / missing.R
Last active April 22, 2016 11:53
plotting missing
# Load libs
library(ggplot2)
# Simulate correlated data
R = matrix(cbind(1,.80, .80,1), nrow=2)
U = t(chol(R))
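The preview stops right after the Cholesky factor is computed. As a rough sketch of how such a factor is typically used to simulate correlated data, here is the same idea in plain Python rather than R. The function names, the sample size, and the seed are illustrative assumptions and do not come from the gist.

```python
import math
import random


def cholesky_2x2(r):
    """Lower-triangular factor L of [[1, r], [r, 1]], so that L L^T equals that matrix.

    This plays the role of t(chol(R)) in the R snippet (assumed equivalent for the 2x2 case).
    """
    return [[1.0, 0.0], [r, math.sqrt(1.0 - r * r)]]


def simulate_correlated(n, r, seed=0):
    """Draw n pairs with correlation near r by multiplying independent normals by L."""
    rng = random.Random(seed)
    L = cholesky_2x2(r)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(L[0][0] * z1 + L[0][1] * z2)
        ys.append(L[1][0] * z1 + L[1][1] * z2)
    return xs, ys


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical run: 20,000 draws with a target correlation of 0.80, as in R above.
xs, ys = simulate_correlated(20000, 0.80, seed=1)
sample_r = pearson(xs, ys)
print(round(sample_r, 2))
```

With this many draws the sample correlation should land close to the target of 0.80, though the exact value depends on the seed.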
@soodoku
soodoku / Liberal Regex Pattern for All URLs
Created July 11, 2016 02:21 — forked from gruber/Liberal Regex Pattern for All URLs
Liberal, Accurate Regex Pattern for Matching All URLs
The regex patterns in this gist are intended to match any URLs,
including "mailto:foo@example.com", "x-whatever://foo", etc. For a
pattern that attempts only to match web URLs (http, https), see:
https://gist.github.com/gruber/8891611
# Single-line version of pattern:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
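A minimal sketch of applying the single-line pattern with Python's `re` module. The pattern is copied verbatim from above; the sample sentence and variable names are made up for illustration. `finditer` is used instead of `findall` because the pattern contains several capture groups.

```python
import re

# Single-line pattern copied verbatim from the gist above.
URL_PATTERN = re.compile(
    r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""
)

# Hypothetical input mixing a mailto: URL, a web URL, and a bare www host.
text = "Write to mailto:foo@example.com, or see http://example.com/path and www.example.org."

# group(0) is the whole match; trailing punctuation is excluded by the final character class.
urls = [m.group(0) for m in URL_PATTERN.finditer(text)]
print(urls)
```

Note that the trailing comma and full stop in the sample sentence are not part of the matches, which is the behaviour the last character class in the pattern is there to provide.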
@soodoku
soodoku / scrape_wisconsin_ads.py
Last active September 14, 2016 13:21
Get text from Wisconsin ad pdfs using pyPdf
'''
Text from Searchable pdfs
Scrape Text off Wisconsin Ads pdfs
Uses pyPdf to get text from searchable pdfs. The script is tailored for getting data
from Wisconsin Political Ads Database: http://wiscadproject.wisc.edu/Storyboards.
@author: Gaurav Sood
Created on November 02, 2011
@soodoku
soodoku / rent control
Last active October 18, 2016 01:46
Rent Control
# Read the data
sf <- read.csv("sf_tenants.csv")
# Recode
sf$market_rates <- gsub("%", "", sapply(strsplit(sf$market_rate, " / "), "[", 2)) # market_rate
sf$rent_control_rates <- gsub("%", "", sapply(strsplit(sf$rent_control, " / "), "[", 2)) # rent_control
# Ratio of rent_control vs. market_rate
sf$ratio <- as.numeric(sf$rent_control_rates)/as.numeric(sf$market_rates)
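The recode above splits each cell on " / ", keeps the second piece, strips the percent sign, and then divides the rent-control figure by the market figure. The same steps can be sketched in Python as follows. The cell format (`"count / percent%"`) and the sample rows are assumptions made for illustration, since `sf_tenants.csv` is not shown.

```python
def second_rate(cell):
    """Return the number after ' / ' with the '%' removed, e.g. '120 / 40%' -> 40.0."""
    parts = cell.split(" / ")
    return float(parts[1].replace("%", ""))


# Hypothetical rows; the real file's values and exact layout are not known.
rows = [
    {"market_rate": "120 / 40%", "rent_control": "60 / 20%"},
    {"market_rate": "30 / 10%", "rent_control": "90 / 30%"},
]

# Ratio of the rent-control percentage to the market-rate percentage, row by row.
ratios = [second_rate(r["rent_control"]) / second_rate(r["market_rate"]) for r in rows]
print(ratios)
```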
@soodoku
soodoku / text_classifier.R
Last active December 15, 2016 17:44
Basic Text Classifier
"
Basic Text Classifier
- Takes a csv with a text column, and column of labels
- Splits into train and test
- Preprocesses text using tm/bag-of-words, 1/2-order Markov
- Uses SVM and Lasso
@author: Gaurav Sood
"
devtools::install_github("abresler/gdeltr2")
devtools::install_github("hafen/trelliscopejs")
library(gdeltr2)
library(dplyr)
asb_ocr <- "Brooklyn Nets"
gkg_codes <-
get_codes_gkg_themes()
imageweb_codes <- get_gdelt_codebook_ft_api(code_book = "imageweb")
@soodoku
soodoku / approval.csv
Created October 28, 2017 21:42
Approval Data on Few Politicians
Last,First,N,Mode,Date Start,Date End,TotApp,TotDis,TotDK,DemApp,DemDis,DemDK,RepApp,RepDis,RepDK,IndApp,IndDis,IndDK,LibApp,LibDis,LibDK,ModApp,ModDis,ModDK,ConApp,ConDis,ConDK,Source,Link,,,,,,,,,
Bachmann,Michelle,928,Phone,07/15/11,07/17/11,29,45,26,16,58,26,45,27,28,28,47,24,15,64,21,17,53,30,50,24,26,PPP,http://www.publicpolicypolling.com/pdf/2011/PPP_Release_National_720925.pdf,,,,,,,,,
Bachmann,Michelle,700,Phone,10/07/11,10/10/11,27,56,17,15,68,17,43,36,21,23,64,13,11,77,12,14,62,23,49,34,17,PPP,http://www.publicpolicypolling.com/pdf/2011/PPP_Release_US_1011513.pdf,,,,,,,,,
Bachmann,Michelle,700,Phone,12/16/11,12/18/11,30,54,16,22,65,13,41,39,19,26,59,16,21,71,8,20,67,12,42,34,23,PPP,http://www.publicpolicypolling.com/pdf/2011/PPP_Release_National_1220925.pdf,,,,,,,,,
Bloomberg,Michael,707,Phone,11/19/10,11/21/10,19,38,44,24,30,46,12,48,40,19,37,44,28,29,43,24,28,48,9,53,39,PPP,http://publicpolicypolling.blogspot.com/2010/11/americans-not-impressed-with-bloomberg.html,,,,,,,,,
Bloomberg,Michael,700,P
@soodoku
soodoku / fun.md
Last active October 30, 2017 18:01
Fun Math

Fun Math

Fun Fact 1

  • 1 - 1 + 1 - 1 + 1 - 1 + .... = 1/2
  • Proof (1) by Luigi Grandi:

S = 1 - 1 + 1 - 1 + 1 - 1 + ...
1 - S = 1 - (1 - 1 + 1 - 1 + 1 - 1 + ...)
= 1 - 1 + 1 - 1 + 1 - 1 + ...
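The preview cuts off before the last step. One way to finish the manipulation is sketched below. The series does not converge in the ordinary sense, so the rearrangement is formal; the value 1/2 is what Cesàro summation (the limit of the averages of the partial sums) assigns to it. The symbols $s_k$ and $\sigma_n$ are introduced here and are not in the original note.

```latex
\begin{align*}
S &= 1 - 1 + 1 - 1 + \cdots \\
1 - S &= 1 - (1 - 1 + 1 - 1 + \cdots) = 1 - 1 + 1 - 1 + \cdots = S \\
2S &= 1 \quad\Longrightarrow\quad S = \tfrac{1}{2} \\[6pt]
s_k &= \sum_{j=0}^{k-1} (-1)^j \in \{1, 0, 1, 0, \dots\}, \qquad
\sigma_n = \frac{1}{n}\sum_{k=1}^{n} s_k \;\longrightarrow\; \tfrac{1}{2}
\quad (n \to \infty)
\end{align*}
```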

# Output = http://gbytes.gsood.com/2013/11/02/the-fairest-of-them-all/
# Uses cces_recode.R here: https://github.com/soodoku/in-n-out/scripts/cces_recode.R
# Plotting the fairest of all media
library(lattice)
png("fairmedia.png")
dotplot(t(t(table(droplevels(cces06$fairmedia[cces06$fairmedia != "Don't know"]), cces06$pid3[cces06$fairmedia != "Don't know"]))/colSums(table(droplevels(cces06$v2112[cces06$fairmedia != "Don't know"]), cces06$pid3[cces06$fairmedia != "Don't know"]))),
main = "Which network do you think provides the \n fairest coverage of national news?",
State Position Name @Twitter Handle Party Affiliation Boris Shor's Score
AZ Senator David Bradley @Bradley4AZ Democrat -1.253
AZ Senator Katie Hobbs @katiehobbs Democrat -1.684
AZ Senator Ed Ableser @SenatorAbleser Democrat -1.606
AZ Senator Barbara McGuire @SenBarbMcGuire Democrat -0.672
AZ Senator Steve Farley @SteveFarleyAZ Democrat -1.413
AZ Senator Adam Driggs @AdamDriggs Republican 0.738
AZ Senator Bob Worsley @bob_worsley Republican 0.46
AZ Senator Kelli Ward @kelliwardaz Republican 1.144
AZ Senator Nancy Barto @NancyBarto Republican 0.996