
Alex Hanna (alexhanna)

alexhanna / launch-cliff-gcp.sh
Created Nov 9, 2020
Code to get CLIFF working on a GCP instance after installing Tomcat8 using GCP Deployment Manager
#!/bin/sh
## Adapted from the original Media Lab launch script, with some modifications:
## https://raw.githubusercontent.com/mediacloud/cliff-docker/master/launch.sh
echo "Getting CLIFF..."
echo " downloading CLIFF WAR file from GitHub"
wget https://github.com/mitmedialab/CLIFF/releases/download/v2.6.1/cliff-2.6.1.war
## With Tomcat's default autoDeploy setting, a WAR dropped into webapps/ is deployed automatically
sudo mv cliff-2.6.1.war /var/lib/tomcat8/webapps/
echo " done (moved to /var/lib/tomcat8/webapps/)"
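Once the WAR is in place, Tomcat serves the app under a context path derived from the WAR file name (here `/cliff-2.6.1`). The sketch below shows one way to build a request against the running instance. It assumes Tomcat 8 is on its default port 8080 and that CLIFF exposes a `/parse/text` endpoint taking a `q` parameter; both are assumptions to check against your install, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the gist): Tomcat listens on its default port 8080,
# and CLIFF answers GET /parse/text?q=... with JSON.
DEFAULT_BASE = "http://localhost:8080"
WAR_NAME = "cliff-2.6.1.war"


def context_path(war_filename):
    """Tomcat names a webapp's context path after its WAR file, minus '.war'."""
    name = war_filename.rsplit("/", 1)[-1]
    if name.endswith(".war"):
        name = name[: -len(".war")]
    return "/" + name


def parse_text_url(text, base=DEFAULT_BASE, war_filename=WAR_NAME):
    """Build a URL for the (assumed) /parse/text endpoint, URL-encoding the text."""
    query = urllib.parse.urlencode({"q": text})
    return base + context_path(war_filename) + "/parse/text?" + query


def parse_text(text, **kwargs):
    """Send text to a running CLIFF instance and decode the JSON reply."""
    with urllib.request.urlopen(parse_text_url(text, **kwargs), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `parse_text_url("Protests in Cairo")` yields `http://localhost:8080/cliff-2.6.1/parse/text?q=Protests+in+Cairo`; keeping URL construction separate from the network call makes it easy to check the path before Tomcat has finished deploying.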
alexhanna / sample2013.sql
Created Oct 28, 2017
Sample Hive example
-- Roughly 10% block sample of January 2013 tweets; regexp_replace collapses whitespace so each tweet stays on one tab-separated line.
insert overwrite local directory '/scratch.1/sample2013_1'
row format delimited
fields terminated by "\t"
select id_str, created_at, regexp_replace(text, "[ \t\r\n]+", " "),
  user.id_str, regexp_replace(user.name, "[ \t\r\n]+", " "), user.screen_name,
  retweeted_status.id_str, retweeted_status.created_at,
  regexp_replace(retweeted_status.text, "[ \t\r\n]+", " "),
  retweeted_status.user.id_str, regexp_replace(retweeted_status.user.name, "[ \t\r\n]+", " "),
  retweeted_status.user.screen_name
from gh_rc TABLESAMPLE (10 PERCENT)
WHERE year = 2013 and month = 1;
insert overwrite local directory '/scratch.1/sample2013_2'
row format delimited
fields terminated by "\t"
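The `regexp_replace(..., "[ \t\r\n]+", " ")` calls matter because the output is tab-delimited: a tab or newline inside a tweet would otherwise split one record across fields or lines. A minimal Python sketch of the same normalization, useful when preparing or checking such files outside Hive, follows. It assumes Hive's default text null marker `\N` for missing values (for example, tweets that are not retweets); the function names are hypothetical.

```python
import re

# Same character class the Hive query passes to regexp_replace:
# any run of spaces, tabs, carriage returns, or newlines.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

# Assumption: missing values are written as Hive's default null marker, \N.
NULL_MARKER = "\\N"


def flatten(value):
    """Collapse whitespace runs to a single space; map None to the null marker."""
    if value is None:
        return NULL_MARKER
    return WHITESPACE_RUN.sub(" ", value)


def to_tsv_row(fields):
    """Join flattened fields with tabs so each record stays on one line."""
    return "\t".join(flatten(f) for f in fields)
```

Because every embedded tab and newline is replaced before joining, splitting a line of the output on `"\t"` recovers exactly one value per selected column.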
alexhanna / social-science-programming.md
Last active Aug 11, 2022
Notes on social science programming principles
File: mod_split_proquest.py
#!/usr/bin/env python
# encoding: utf-8
"""
Module for parsing ProQuest data.
Only tested on small samples of the ProQuest Ethnic Newswire.
Loosely based on a script by Neal Caren (neal.caren@unc.edu)
Alex Hanna, alex.hanna@gmail.com
2017-05-04
"""
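The docstring only names the task, so here is a hedged sketch of what splitting and parsing a ProQuest text export might look like. It assumes that documents are separated by a line made only of underscores and that metadata appears as `Field name: value` lines, with body text continuing on the lines after a field such as `Full text:`. These format details are assumptions, not taken from the original module, and the function names are hypothetical.

```python
import re

# Assumption: a separator line of 10+ underscores between documents.
SEPARATOR = re.compile(r"^_{10,}\s*$", re.MULTILINE)
# Assumption: metadata lines look like "Field name: value".
FIELD = re.compile(r"^([A-Z][A-Za-z ]{0,40}):\s?(.*)$")


def split_documents(raw):
    """Split one export file into non-empty document chunks."""
    return [chunk.strip() for chunk in SEPARATOR.split(raw) if chunk.strip()]


def parse_fields(doc):
    """Collect 'Field: value' lines; non-field lines extend the previous field."""
    fields = {}
    current = None
    for line in doc.splitlines():
        match = FIELD.match(line)
        if match:
            current = match.group(1).strip()
            fields[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # Continuation line, e.g. article body under "Full text:".
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields
```

Body lines that happen to start with a capitalized word and a colon would be mistaken for new fields, so the field pattern likely needs tightening against a real export.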
alexhanna / CallForTACCT490.md
Last active Aug 10, 2016
Call for TA: CCT 490 (Social Data Analytics)

Call for TA: CCT 490 (Social Data Analytics)

The Institute of Communication, Culture, Information and Technology at the University of Toronto Mississauga is looking for a teaching assistant for CCT 490 (Social Data Analytics) in Fall 2016, taught by Professor Alex Hanna. The course will cover the basics of data collection, processing, and analysis for social trace data, such as Twitter and Facebook messages.

The position is for 40 hours a week for the Fall 2016 term, and will involve grading assignments, assisting in labs, and invigilating exams. The position is represented by CUPE 3902, Unit 3.

Applicants must be proficient in the Python programming language. Knowledge of other programming languages is a plus but not required. Experience with analysis of social media data is preferred. Applicants must live in the Toronto area and be able to travel to the Mississauga campus at least once a week.

To apply, please send a resume or CV to alex.hanna@utoronto.ca, with a short cover letter. The deadline fo

alexhanna / 20_newsgroups.R
Last active Nov 17, 2017
20 newsgroups classification with R
## FILE: Classifying the 20 Newsgroups Dataset
## For a presentation to the Computational Sociology course at Duke.
## AUTHOR: Alex Hanna (ahanna@ssc.wisc.edu)
## DATE: October 14, 2015
## load the RTextTools package
## Documentation of this package is available at
## https://cran.r-project.org/web/packages/RTextTools/RTextTools.pdf
library(RTextTools)
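RTextTools wraps several supervised learners (such as SVM and maximum entropy) behind one document-term-matrix workflow. To make the underlying train-then-predict idea concrete without extra dependencies, the sketch below swaps in a different, simpler technique: a multinomial Naive Bayes classifier with add-alpha smoothing, written with the Python standard library. It is not what RTextTools does internally, and the tokenizer and function names are hypothetical choices.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes."""
    return TOKEN.findall(text.lower())


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter(labels)        # label -> number of training documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts,
            "vocab": vocab, "alpha": alpha, "n_docs": len(labels)}


def predict(model, text):
    """Return the label with the highest log posterior for one document."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, n_label_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_label_docs / model["n_docs"])  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            score += math.log((counts[token] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids underflow when multiplying many small word probabilities, which matters for newsgroup posts of realistic length.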
alexhanna / split_ln.py
Last active Feb 19, 2020
Script for splitting LexisNexis files. Adapted from an original script by Neal Caren.
#!/usr/bin/env python
# encoding: utf-8
"""
split_ln.py
Created by Neal Caren on 2012-05-14.
neal.caren@unc.edu
Edited by Alex Hanna on 2015-01-29
alex.hanna@gmail.com
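The preview stops at the header, so as a hedged illustration of the splitting step, the sketch below cuts a LexisNexis text export into one article per chunk. It assumes each article begins with a header line such as `3 of 250 DOCUMENTS`, possibly indented; that pattern is an assumption about the export format rather than something shown in the snippet, and `split_ln` here is a hypothetical function, not the original script's code.

```python
import re

# Assumption: each article starts with a (possibly indented) header line
# like "3 of 250 DOCUMENTS". Adjust the pattern if your export differs.
DOC_HEADER = re.compile(r"^[ \t]*(\d+) of (\d+) DOCUMENTS[ \t]*$", re.MULTILINE)


def split_ln(raw):
    """Return (index, text) pairs, one per article, in file order."""
    matches = list(DOC_HEADER.finditer(raw))
    articles = []
    for i, match in enumerate(matches):
        start = match.end()
        # Each article runs until the next header, or to the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        articles.append((int(match.group(1)), raw[start:end].strip()))
    return articles
```

Keeping the document index from the header makes it easy to check that no articles were dropped (the indices should run from 1 to the stated total).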
File: REU Ad - 2015-01-22
National Science Foundation Research Experience for Undergraduates (REU)
“Constructing and Validating an Automated Coding System for Protest Events in Electronic News Sources.”
Principal Investigators: Pamela Oliver, Professor (oliver@ssc.wisc.edu); Chaeyoon Lim, Associate Professor
(clim@ssc.wisc.edu); Alex Hanna (graduate student).
This opportunity is for undergraduates interested in social science or media studies to provide research assistance
on a part-time basis during the spring 2015 semester. Depending on schedules and the flow of work, there may be opportunities to
continue during the summer. REU participants will be paid a stipend of $100 a week with an expectation of 10
hours a week of research assistance, meeting attendance, and background reading. Depending on student needs
and interest, we will consider students who wish to work 5-15 hours a week on the project, with proportional
File: sentimentClassifier.py
from __future__ import division
import csv, logging, math, os.path
import pickle, random, re, string
import datetime, time
import numpy as np
import pandas as pd
import scipy as sp
## metrics
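The import list ends at a `## metrics` section that the preview cuts off. As a hedged sketch of the kind of evaluation such a section usually holds (not the author's actual code), here are per-class precision, recall, and F1 in plain Python 3; the function name and the `positive` label argument are hypothetical.

```python
from collections import Counter


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for one class; 0.0 where a ratio is undefined."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count (is gold positive?, is prediction positive?) pairs.
    pairs = Counter((g == positive, p == positive) for g, p in zip(gold, predicted))
    tp = pairs[(True, True)]   # true positives
    fp = pairs[(False, True)]  # false positives
    fn = pairs[(True, False)]  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Under Python 3, `/` already performs true division, so the `from __future__ import division` line in the original is only needed on Python 2.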
File: gradient-color.R
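library(ggplot2)
library(wesanderson)
## df.p is defined elsewhere: a long-format data frame with Margin, variable, Class, and value columns
p <- ggplot(df.p, aes(x = Margin, y = factor(variable), fill = Class, alpha = value))
## wes_palette(name, n) is the current wesanderson API; older releases used wes.palette(2, "Royal1")
p <- p + theme_bw() + geom_tile(color = NA, width = 0.005) +
  scale_fill_manual(values = wes_palette("Royal1", 2), labels = c("False Positives", "True Positives"))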
p <- p + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
p <- p + theme(axis.text.y = element_text(size = 7)) + ylab("Feature")
ggsave(filename = "../img/linearsvc_no-fs_top100_fp-v-tp_20140916.png", plot = p, width = 16, height = 9)
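The gradient effect in this plot comes from mapping `value` to `alpha`: ggplot2's continuous alpha scale rescales the data linearly onto an opacity range, which defaults to 0.1 to 1. The Python sketch below mirrors that linear rescaling so the resulting opacities can be inspected outside R. The zero-range behavior (returning the midpoint) is a choice made in this sketch, and the function name is hypothetical.

```python
def rescale(values, to=(0.1, 1.0)):
    """Linearly map values onto the range `to`, like a continuous alpha scale."""
    lo, hi = min(values), max(values)
    out_lo, out_hi = to
    if hi == lo:
        # All inputs identical: this sketch places them at the midpoint of `to`.
        return [(out_lo + out_hi) / 2 for _ in values]
    return [out_lo + (v - lo) / (hi - lo) * (out_hi - out_lo) for v in values]
```

With the default range, `rescale([0, 5, 10])` gives opacities of about 0.1, 0.55, and 1.0, so the smallest values stay faintly visible rather than disappearing entirely.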