#rstats-ing all the things

Andrew Heiss andrewheiss
@andrewheiss
andrewheiss / topic_means_stdized.R
Created March 14, 2014 01:26
topic_means_stdized.R
topic.means.stdized <- structure(list(publication = structure(c(1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L), class = c("ordered", "factor"), .Label = c("Al-Ahram English",
"Daily News Egypt", "Egypt Independent")), topic = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L,
11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L,
16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L,
publication        topic  proportion            label
Al-Ahram English   X0     0.00147919351749627   Muslim Brotherhood and politics
Daily News Egypt   X0     0.00160048693182287   Muslim Brotherhood and politics
Egypt Independent  X0     0.00192031955068086   Muslim Brotherhood and politics
Al-Ahram English   X1     0.000498077623239853  Morsi and the media
Daily News Egypt   X1     0.00143305568570697   Morsi and the media
Egypt Independent  X1     0.00306886669105318   Morsi and the media
Al-Ahram English   X2     0.00396006588845241   Miscellaneous
Daily News Egypt   X2     0.000382149378307192  Miscellaneous
Egypt Independent  X2     0.000657784733240396  Miscellaneous
---
generate data:
help: load and process all data from corpora
export articles:
help: export articles from SQLite databases and stem and n-gram them
dependencies:
- ./Python/export_to_mallet.py
- ./Corpora/egypt_independent.db
- ./Corpora/ahram.db
- ./Corpora/dne.db
# Creating all the topic models, graphs, and tables for this project is
# unfortunately a convoluted process, since there are so many moving parts
# (Python 2, Python 3, MALLET, and R). This Makefile automates all those steps
# so you don't have to. Hooray.
# Here's a general outline of the process:
# 1. export_articles: Export articles from SQLite databases into individual
# plain text files (using Python 3) and stem the articles and find
# significant bigrams (using NLTK with Python 2)
# 3. model: Create topic models (using MALLET through R)
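The first step in the outline above (exporting articles from a SQLite database into individual plain text files) could look roughly like the sketch below. This is not the project's actual `export_to_mallet.py`: the function name `export_articles`, the table name `articles`, and the columns `id` and `content` are all hypothetical stand-ins, since the real schema is not shown here. Stemming and bigram detection are left out.

```python
import sqlite3
from pathlib import Path


def export_articles(db_path, out_dir, table="articles"):
    """Write each row of a (hypothetical) articles table to its own text file.

    Returns the list of files written, in order of article id.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # Table and column names are assumptions, not the project's real schema.
        rows = conn.execute(f"SELECT id, content FROM {table} ORDER BY id")
        for article_id, content in rows:
            path = out / f"{article_id}.txt"
            path.write_text(content, encoding="utf-8")
            written.append(path)
    finally:
        conn.close()
    return written
```

Calling `export_articles("./Corpora/dne.db", "./articles_txt")` would then produce one `.txt` file per article, provided the database really has a table and columns with those names. One file per document is a convenient layout for tools that import a whole directory at once.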
@andrewheiss
andrewheiss / visualization.do
Created September 11, 2014 17:32
Basic data visualization
* Super useful site for more complicated plots:
* http://www.survey-design.com.au/Usergraphs.html
* Load data
use "/Users/andrew/Teaching/Data visualization/Practice/export1.dta", clear
*--------------------------------------
*-------------------
* Univariate plots
@andrewheiss
andrewheiss / fancy_logit.do
Created October 21, 2014 18:25
Fancy logit in Stata
*------------------------------------------------
* Logistic regression done well
*
* Andrew Heiss (andrew.heiss@duke.edu)
* October 21, 2014
*------------------------------------------------
* Load data
use "http://www.ats.ucla.edu/stat/data/hsbdemo", clear
@andrewheiss
andrewheiss / mtable_stargazer.R
Created November 4, 2014 15:04
mtable and stargazer
library(memisc)
library(stargazer)
library(pander)
lm0 <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
lm1 <- lm(sr ~ dpi + ddpi, data = LifeCycleSavings)
lm2 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
mtable123 <- mtable("Model 1"=lm0,"Model 2"=lm1,"Model 3"=lm2, summary.stats=c("sigma","R-squared","F","p","N"))
pander(mtable123)
@andrewheiss
andrewheiss / rachel-books-2014.R
Last active August 29, 2015 14:11
Rachel's reading report, 2014
# See report here: http://www.heissatopia.com/2014/12/rachels-2014-reading-report.html
# Libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(lubridate)
library(stringr)
library(httr)
library(grid)
@andrewheiss
andrewheiss / harry_potter_aggression.R
Created March 24, 2015 15:28
Aggressive characters in Harry Potter
#----------------
# Load packages
#----------------
library(RCurl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(grid)
library(Cairo)
@andrewheiss
andrewheiss / hp_download.py
Last active August 29, 2015 14:17
Scrape Harry Potter
#!/usr/bin/env python3
# --------------
# Load modules
# --------------
from bs4 import BeautifulSoup
from collections import defaultdict
from random import choice
from time import sleep
import urllib.request
import csv