Some things journalists may want to consider:
1. Anecdotes can mislead. People seeing yet another episodic story on crime may infer that crime is increasing.
So report numbers where trustworthy numerical data are available.
2. But numbers need to be reported carefully. Most people, when reading news, do not do back-of-the-envelope calculations to interpret data correctly.
So ill-reported numbers can mislead.
3. Rules for numbers:
a. Distinguish % changes from changes in %. The former looks more impressive when the base rate is low. The latter is generally a better way to report things. If confused, report the values at t1 and t2.
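To make the distinction in rule (a) concrete, here is a minimal sketch that computes both quantities for the same pair of rates. The function name and the example rates are hypothetical, chosen only for illustration.

```python
def describe_change(t1, t2):
    """Return both ways of reporting a change between two rates given in %."""
    point_change = t2 - t1                    # change in %, in percentage points
    relative_change = 100 * (t2 - t1) / t1    # % change, relative to the t1 base
    return point_change, relative_change


# Hypothetical rates: 1% at t1 and 2% at t2 (a low base rate).
points, relative = describe_change(1.0, 2.0)
print(f"{points:.1f} percentage points; {relative:.0f}% change")
```

With a low base rate, the same one-point move reads as a 100% change. With a base of 50%, a one-point move is only a 2% change, which is why reporting t1 and t2 directly is the safest fallback.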
soodoku / Hillary_Clinton
Last active August 29, 2015 14:17
Calculating Hillary's Missing Emails
Note:
55000/(365*4) ~ 37.7 emails per day. That seems a touch low for a Secretary of State.
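The arithmetic in the note can be checked with a short sketch. The variable names are illustrative, and the four-year span ignores leap days, as the note does.

```python
TOTAL_EMAILS = 55_000   # figure quoted in the note
DAYS = 365 * 4          # four years, ignoring leap days (as in the note)

emails_per_day = TOTAL_EMAILS / DAYS
print(round(emails_per_day, 1))
```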
Caveats:
1. Clinton may have used more than one private server.
2. Clinton may have sent emails from other servers to unofficial accounts of other State Department employees.
Lower bound for missing emails from Clinton:
Take a small weighted random sample (weighting seniority more) of top State Department employees.
soodoku / capitol_speech.py
Last active August 29, 2015 14:17
Get Congressional Speech Data Via CapitolWords API
'''
Gets Congressional speech text, arranged by speaker.
Produces a csv (capitolwords.csv) with the following columns:
speaker_state,speaker_raw,speaker_first,congress,title,origin_url,number,id,volume,chamber,session,speaker_last,
pages,speaker_party,date,bills,bioguide_id,order,speaking,capitolwords_url
Uses the Sunlight Foundation library: http://python-sunlight.readthedocs.org/en/latest/
'''
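The output step described in the docstring can be sketched with the standard library alone. This is not the gist's code: it makes no API calls and assumes the speech records have already been fetched as dictionaries. `write_speeches` and the sample record are hypothetical.

```python
import csv
import io

# Column order as listed in the docstring above.
COLUMNS = [
    "speaker_state", "speaker_raw", "speaker_first", "congress", "title",
    "origin_url", "number", "id", "volume", "chamber", "session",
    "speaker_last", "pages", "speaker_party", "date", "bills",
    "bioguide_id", "order", "speaking", "capitolwords_url",
]


def write_speeches(records, out):
    """Write already-fetched speech records (dicts) as CSV rows."""
    # Missing keys become empty cells; unexpected keys are dropped.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Hypothetical record, for illustration only.
buf = io.StringIO()
write_speeches(
    [{"speaker_last": "Doe", "speaker_party": "I", "speaking": "Example text."}],
    buf,
)
print(buf.getvalue())
```

To produce capitolwords.csv, pass a file opened with `open("capitolwords.csv", "w", newline="")` instead of the in-memory buffer.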
soodoku / salvage_csv.py
Last active August 29, 2015 14:20
Salvage Corrupted CSV
'''
What does it do?
Goes through a corrupted csv sequentially and outputs rows that are clean.
Also outputs total n and total corrupted n.
@author: Gaurav Sood
Run: python salvage_csv.py input_csv output_csv
'''
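A minimal sketch of the sequential pass described above follows. The script's own definition of a "clean" row is not shown in the preview, so this sketch assumes one: a row is clean if it parses and has the same number of fields as the first parseable row. The function name `salvage` is hypothetical.

```python
import csv


def salvage(lines, expected_fields=None):
    """Return (clean_rows, total, corrupted) for an iterable of raw CSV lines.

    Assumption: a row is clean if it parses without error and has as many
    fields as the first parseable, non-empty row.
    """
    clean, total, corrupted = [], 0, 0
    for line in lines:
        total += 1
        try:
            # Parse each line on its own so one bad row cannot swallow the next.
            row = next(csv.reader([line], strict=True))
        except (csv.Error, StopIteration):
            corrupted += 1
            continue
        if not row:
            corrupted += 1          # blank line
            continue
        if expected_fields is None:
            expected_fields = len(row)
        if len(row) == expected_fields:
            clean.append(row)
        else:
            corrupted += 1
    return clean, total, corrupted


sample = ["a,b,c", "1,2,3", "4,5", '6,"7,8', "9,10,11"]
rows, n_total, n_corrupted = salvage(sample)
print(n_total, n_corrupted)
```

Parsing line by line means quoted fields that legitimately span several lines are counted as corrupted; that trade-off keeps a single unterminated quote from consuming the rest of the file.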
soodoku / prop_weights.R
Created May 31, 2015 22:52
Weighting datasets by propensity scores (~YouGov Method for Sampling)
"
Weighting by Propensity Scores
Last Edited: 5/31/2015
Task Outline:
1. Two datasets:
dataset 1: large pop. representative sample
dataset 2: convenience sample
2. Create weights for dataset 2 so that its marginals are close to dataset 1 on some vars.
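The idea in step 2 can be sketched in Python for the simplest case of a single categorical variable. This is an illustration under stated assumptions, not the gist's R code and not YouGov's procedure. The propensity of belonging to dataset 1 is estimated from cell counts in the stacked data (equivalent to a saturated model on that one variable), and each dataset 2 unit is weighted by the odds p/(1-p). The function name and the toy data are hypothetical.

```python
from collections import Counter


def propensity_weights(reference, convenience):
    """Odds weights for convenience-sample units on one categorical variable.

    `reference` and `convenience` are lists of category labels.
    """
    n_ref = Counter(reference)
    n_conv = Counter(convenience)
    weights = []
    for x in convenience:
        # Estimated P(unit is in the reference sample | category x).
        p = n_ref[x] / (n_ref[x] + n_conv[x])
        weights.append(p / (1 - p))   # odds of being a reference unit
    return weights


# Hypothetical data: the convenience sample over-represents "young".
reference = ["young"] * 50 + ["old"] * 50
convenience = ["young"] * 80 + ["old"] * 20

w = propensity_weights(reference, convenience)
young_share = sum(wi for wi, x in zip(w, convenience) if x == "young") / sum(w)
print(round(young_share, 3))
```

With several covariates, the cell-count estimate would typically be replaced by a fitted model such as logistic regression, and the weighted marginals would then be close to, rather than exactly equal to, those of dataset 1.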
soodoku / server_installs
Last active August 30, 2015 23:40
Basic R related installs for Initializing Scrapers on Digital Ocean Ubuntu
apt-get update
apt-get upgrade
sudo aptitude install emacs24
sudo aptitude install r-base
sudo aptitude install libcurl4-openssl-dev
sudo aptitude install libxml2-dev
apt-get install openjdk-7-*
R CMD javareconf -e
tslumley / redpeak.R
Last active September 9, 2015 14:02
# Draws three filled triangles (NA separates the polygons), scaled by w and offset by s = c(x, y).
redpeak=function(s,w){
x=c(0,0,1,NA, 1,2,2,NA,0.5,1,1.5)
y=c(0,1,1,NA,1,1,0,NA,0,0.5,0)
polygon(x*w+s[1],y*w+s[2],col=c("black","navyblue","#98332f"))
}
soodoku / basic_sentiment_analysis.py
Last active November 14, 2015 05:51
Basic sentiment analysis with AFINN or custom word database
'''
Basic Sentiment Analysis
Builds on:
https://finnaarupnielsen.wordpress.com/2011/06/20/simplest-sentiment-analysis-in-python-with-af/
Utilizes AFINN or a custom sentiment db
Example Snippets at end from: https://code.google.com/p/sentana/wiki/ExampleSentiments
'''
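A word-list approach of the kind described above can be sketched as follows. The lexicon is a toy stand-in with illustrative values (AFINN itself assigns integer valences from -5 to +5), and dividing by the square root of the word count is one possible normalization, assumed here rather than taken from the script.

```python
import math
import re

# Hypothetical toy lexicon; swap in AFINN or a custom word database.
TOY_LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3}


def sentiment(text, lexicon=TOY_LEXICON):
    """Sum word valences and normalize by sqrt(number of words)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon.get(word, 0) for word in words]   # unknown words score 0
    if not scores:
        return 0.0
    return sum(scores) / math.sqrt(len(scores))


print(sentiment("A great day"))
print(sentiment("good good bad terrible"))
```

Positive values indicate net positive wording, negative values the opposite. Text with no recognizable words scores 0.0.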
soodoku / cong.csv
Last active November 22, 2015 20:29
Educational Qualifications of Members of the 111th Congress
Name District Education Science Law
Jeff Sessions (R) AL-Senate B.A., Huntingdon College; J.D. University of Alabama School of Law 1
Richard Shelby (R) AL-Senate B.A., University of Alabama; J.D. University of Alabama School of Law 1
Jo Bonner (R) AL-1 B.A. Journalism, University of Alabama 0
Bobby Bright (D) AL-2 B.A. Political Science, Auburn University; M.S. Criminal Justice, Troy State University; J.D. Thomas Goode Jones School of Law 1
Mike Rogers (R) AL-3 B.A., Political Science; M.P.A., Jackson State University; J.D. Birmingham School of Law 1
Robert Aderholt (R) AL-4 B.A., Political Science/History, Birmingham Southern College; J.D., Samford University 1
Parker Griffith (D) AL-5 B.S.; M.D., Louisiana State University 0
Spencer Bachus (R) AL-6 B.A., Auburn University; J.D., University of Alabama 1
Artur Davis (D) AL-7 B.A., Government, Harvard University; J.D., Harvard University School of Law 1
carlislerainey / mass-shootings.R
Last active December 4, 2015 12:51
Downloads and plots data on mass shootings from Mother Jones (http://www.motherjones.com/politics/2012/12/mass-shootings-mother-jones-full-data)
# load packages
library(magrittr)
library(googlesheets)
library(lubridate)
library(dplyr)
library(stringr)
library(tidyr)
library(ggplot2)
library(maps)