Alex Hanna alexhanna

@alexhanna
alexhanna / social-science-programming.md
Last active March 14, 2024 11:05
Notes on social science programming principles
  1. Code and Data for the Social Sciences: A Practitioner’s Guide, Gentzkow and Shapiro.
  2. Good enough practices in scientific computing, Wilson et al.
  3. Best Practices for Scientific Computing, Wilson et al.
  4. Principled Data Processing, Patrick Ball.
  5. The Plain Person’s Guide to Plain Text Social Science, Healy.
  6. Avoiding technical debt in social science research, Toor.
@alexhanna
alexhanna / launch-cliff-gcp.sh
Created November 9, 2020 01:37
Code to get CLIFF working on a GCP instance after installing Tomcat8 using GCP Deployment Manager
#!/bin/sh
## This is copy-pasta from the original Medialab script with some mods
## https://raw.githubusercontent.com/mediacloud/cliff-docker/master/launch.sh
echo "Getting CLIFF..."
echo " downloading Cliff WAR file from GitHub"
wget https://github.com/mitmedialab/CLIFF/releases/download/v2.6.1/cliff-2.6.1.war
sudo mv cliff-2.6.1.war /var/lib/tomcat8/webapps/
echo " done (copied to /var/lib/tomcat8/webapps/)"
@alexhanna
alexhanna / split_ln.py
Last active February 19, 2020 03:43
Script for splitting Lexis-Nexis files. Adapted from an original from Neal Caren.
#!/usr/bin/env python
# encoding: utf-8
"""
split_ln.py
Created by Neal Caren on 2012-05-14.
neal.caren@unc.edu
Edited by Alex Hanna on 2015-01-29
alex.hanna@gmail.com
@alexhanna
alexhanna / 20_newsgroups.R
Last active November 17, 2017 20:26
20 newsgroups classification with R
## FILE: Classifying 20 Newsgroups Dataset
## For presentation with Computational Sociology course at Duke.
## AUTHOR: Alex Hanna (ahanna@ssc.wisc.edu)
## DATE: October 14, 2015
## load the RTextTools package
## Documentation of this package is available at
## https://cran.r-project.org/web/packages/RTextTools/RTextTools.pdf
library(RTextTools)
@alexhanna
alexhanna / sample2013.sql
Created October 28, 2017 14:30
Sample Hive example
insert overwrite local directory '/scratch.1/sample2013_1'
row format delimited
fields terminated by "\t"
select id_str, created_at, regexp_replace(text, "[ \t\r\n]+", " "), user.id_str, regexp_replace(user.name, "[ \t\r\n]+", " "), user.screen_name, retweeted_status.id_str, retweeted_status.created_at, regexp_replace(retweeted_status.text, "[ \t\r\n]+", " "), retweeted_status.user.id_str, regexp_replace(retweeted_status.user.name, "[ \t\r\n]+", " "), retweeted_status.user.screen_name
from gh_rc TABLESAMPLE (10 PERCENT)
WHERE year = 2013 and month = 1;
insert overwrite local directory '/scratch.1/sample2013_2'
row format delimited
fields terminated by "\t"
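The Hive query above wraps each free-text column in `regexp_replace(..., "[ \t\r\n]+", " ")` so that embedded tabs and newlines cannot break the tab-delimited output. A minimal Python sketch of the same normalization follows, for anyone post-processing such fields outside Hive. The helper names `flatten_field` and `to_tsv_row` are hypothetical, and mapping a missing value to an empty string is an assumption of this sketch, not something the query does.

```python
import re

# Same character class as the Hive regexp_replace calls: runs of spaces,
# tabs, carriage returns, and newlines.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def flatten_field(value):
    """Collapse each whitespace run to a single space.

    Assumption for this sketch: a missing value (None) becomes "".
    """
    if value is None:
        return ""
    return WHITESPACE_RUN.sub(" ", value)


def to_tsv_row(fields):
    """Join already-flattened fields with tabs, one record per line."""
    return "\t".join(flatten_field(f) for f in fields)


# Hypothetical record: an id, a tweet with a newline, a name with a tab,
# and a missing retweet field.
row = to_tsv_row(["123", "line one\nline two", "tab\there", None])
print(repr(row))
```

Because every field is flattened before joining, the number of tabs in a row always equals the number of columns minus one, which keeps downstream parsers aligned.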
@alexhanna
alexhanna / schema.sql
Last active August 29, 2017 07:02
Creating Twitter Hive schema.
SET hive.exec.compress.output=true;
SET mapred.max.split.size=256000000;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
CREATE EXTERNAL TABLE gh_raw (
id BIGINT,
created_at STRING,
#!/usr/bin/env python
# encoding: utf-8
"""
Module for parsing Proquest data.
Only tested on limited bits of the Proquest Ethnic Newswire.
Based loosely off a script by Neal Caren (neal.caren@unc.edu)
Alex Hanna, alex.hanna@gmail.com
2017-05-04
"""
@alexhanna
alexhanna / CallForTACCT490.md
Last active August 10, 2016 19:06
Call for TA: CCT 490 (Social Data Analytics)


The Institute of Communication, Culture, Information and Technology at the University of Toronto Mississauga is looking for a teaching assistant for CCT 490 -- Social Data Analytics -- for Fall 2016, taught by Professor Alex Hanna. The course will cover basics of data collection, processing, and analysis for social trace data, such as Twitter and Facebook messages.

The position is for 40 hours a week for the Fall 2016 term, and will involve grading assignments, assisting in labs, and invigilating exams. The position is represented by CUPE 3902, Unit 3.

Applicants must have proficiency in the Python programming language. Knowledge of other programming languages is a plus but not required. Experience with analysis of social media data is preferred. Applicants must live in the Toronto area and be able to travel to the Mississauga campus at least once a week.

To apply, please send a resume or CV to alex.hanna@utoronto.ca, with a short cover letter. The deadline fo

@alexhanna
alexhanna / rupaulModelFit.R
Last active December 15, 2015 09:19
Model fit with residuals
t.cox2_ph <- coxph(t.surv ~ (Age + PlusSize + PuertoRico + Wins + Highs + Lows + Lipsyncs + CompLeft +
Wins*CompLeft + Highs*CompLeft + Lows*CompLeft + Lipsyncs*CompLeft) + cluster(ID), df)
t.cox3s <- coxph(t.surv ~ (Age + PlusSize + PuertoRico + Wins + Highs + Lows + LipsyncWithoutOut + CompLeft) + cluster(ID), df)
model.df <- data.frame(ID = integer(0), Residuals = double(0), Model = character(0))
model.list <- list(c2 = t.cox2, c2ph = t.cox2_ph, c3 = t.cox3, c3s = t.cox3s)
for (i in seq_along(model.list)) {
name <- names(model.list)[i]
cMod <- model.list[[i]]
@alexhanna
alexhanna / polClassify.R
Created March 21, 2013 00:43
Political classifier, largely adapted from Machine Learning for Hackers.
# File-Name: polClassify.R
# Edited: 2013-03-20
# Orig.Author: Drew Conway (drew.conway@nyu.edu)
#
# Packages Used: tm, ggplot2
#
# All source code is copyright (c) 2012, under the Simplified BSD License.
# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php