Andy Halterman ahalterman

ahalterman /
Created Mar 13, 2018
Event Data in 30 Lines of Python
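import json

import spacy

nlp = spacy.load("en_core_web_lg")
# Load the scraped articles and keep only the body text of each one
with open("scraped.json", "r") as f:
    news = json.load(f)
news = [i['body'] for i in news]
# Run the full spaCy pipeline over every article
processed_docs = list(nlp.pipe(news))
# Verb lemmas and direct-object lemmas that together signal an event
verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]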
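The preview above stops before the matching step. One way the two lists could be used is sketched below, under an assumption: an event is flagged whenever a token tagged as a direct object (`dobj`) has a lemma in `dobj_list` and its syntactic head has a lemma in `verb_list`. The function name `find_events` is hypothetical and does not come from the gist. It only reads the `lemma_`, `dep_`, and `head` attributes that spaCy tokens provide, so the demo uses hand-built stand-in tokens instead of loading a model.

```python
from types import SimpleNamespace

verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]


def find_events(doc):
    """Return (verb lemma, object lemma) pairs found in a parsed document.

    `doc` is any iterable of tokens exposing `lemma_`, `dep_`, and `head`,
    as spaCy tokens do. Hypothetical helper, not part of the original gist.
    """
    events = []
    for token in doc:
        is_listed_object = token.dep_ == "dobj" and token.lemma_ in dobj_list
        if is_listed_object and token.head.lemma_ in verb_list:
            events.append((token.head.lemma_, token.lemma_))
    return events


# Hand-built stand-in for the parse of "Rebels launched an offensive".
verb = SimpleNamespace(lemma_="launch", dep_="ROOT", head=None)
verb.head = verb  # the root token is its own head, as in spaCy
subj = SimpleNamespace(lemma_="rebel", dep_="nsubj", head=verb)
obj = SimpleNamespace(lemma_="offensive", dep_="dobj", head=verb)
det = SimpleNamespace(lemma_="an", dep_="det", head=obj)

demo_events = find_events([subj, verb, det, obj])
print(demo_events)
```

With real data, the same function could be applied to each entry of `processed_docs`.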
ahalterman /
Created Mar 4, 2018
Managing machine learning experiments
# many lines omitted above
def make_log(experiment_dir, X_train, X_test, Y_test, model, hist, custom_model):
    now =
    now = now.strftime("%Y-%m-%d %H:%M:%S")
    # get last commit hash
    commit = subprocess.check_output(['git', 'rev-parse', 'HEAD']).strip()
    # get precision and recall at a range of cutpoints
    cutoffs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
    precrecs = [precision_recall(X_test, Y_test, model, i) for i in cutoffs]
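The `precision_recall` helper is defined in the omitted part of the gist, so its implementation is not shown here. As a rough sketch of what scoring at a range of cutpoints could look like, the hypothetical function below (`precision_recall_at`, not the gist's function) takes true labels and predicted probabilities directly, rather than a model and test set, and treats a prediction as positive when its probability is at or above the cutoff. The labels and probabilities are made-up illustration values.

```python
def precision_recall_at(y_true, y_prob, cutoff):
    """Precision and recall when probabilities >= cutoff count as positive.

    Hypothetical stand-in for the gist's `precision_recall`; the signature
    is an assumption made for this sketch.
    """
    y_pred = [p >= cutoff for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    # Report 0.0 rather than dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


cutoffs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
# Made-up labels and predicted probabilities, for illustration only
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.35, 0.05, 0.02]
precrecs = [precision_recall_at(y_true, y_prob, c) for c in cutoffs]

for cutoff, (precision, recall) in zip(cutoffs, precrecs):
    print(f"cutoff={cutoff:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Logging the whole list alongside the commit hash makes it possible to compare the precision and recall trade-off across experiment runs, not just a single threshold.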
ahalterman /
Created Apr 30, 2017
DW scraper for event data tutorial
from __future__ import unicode_literals
from bs4 import BeautifulSoup
import requests
import json
import re
import datetime
from pymongo import MongoClient
connection = MongoClient()
View gist:7717afde4f391fc2e99c
### Keybase proof
I hereby claim:
* I am ahalterman on github.
* I am ahalt on keybase.
* I have a public key whose fingerprint is 5CEE CCB1 B548 682B B988 B999 952E E2B4 950D A417
To claim this, I am signing this object:
ahalterman / Brazil-GKG.Rmd
Last active Apr 22, 2020
R Markdown for Brazilian Protest Themes in GKG Post
Brazilian Protest Themes in the Global Knowledge Graph
Andrew Halterman
Caerus Analytics
January 10, 2014
The Global Knowledge Graph, in the words of Kalev Leetaru, aims to "connect every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day." Because GKG takes the form of a network with entities and themes as nodes and co-mentions as edges, the obvious way to work with it is as a network graph using tools from social network analysis. Kalev's work on Iran shows the remarkable ability of automated community detection algorithms to cluster people according to th
ahalterman / gdelt.ortho.r
Created Aug 30, 2013
Script for reproducing map of August 29, 2013 GDELT coverage. Orthogonal map projection, centered on Cairo.
# Author: Andrew Halterman. 30 August 2013
# R script for reproducing map of August 29, 2013 GDELT coverage
# Orthogonal map projection, centered on Cairo
# This assumes that you have your GDELT data stored in a SQLite database.
# For instructions on setting up SQLite and dplyr, see
ahalterman / subset.domestic.r
Last active Dec 18, 2015
Subsetting GDELT for domestic events using R. I'm looking at domestic activities coded by GDELT, including protests. This is my walkthrough of how I subset only events occurring inside Georgia between 1979 and 2012 in the GDELT reduced dataset. 1. If you just use the Python script to subset the full (reduced) dataset, you end up with only events …
# Example for subsetting domestic events in Georgia from the GDELT reduced dataset.
# Read in the python output file.
GEO.ALL <- read.table("./R/GDELT/",sep="\t", header=TRUE)
# The header=T command didn't work, so fix that:
names(GEO.ALL) <- c("Day","Actor1Code","Actor2Code","EventCode","QuadCategory","GoldsteinScale",
# To keep our subsetting function manageable, prep the GEO.ALL dataframe by substringing the first