Andy Halterman ahalterman
@ahalterman
ahalterman / spacy_events.py
Created March 13, 2018 20:21
Event Data in 30 Lines of Python
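import json

import spacy

# Load the large English model and the scraped articles.
nlp = spacy.load("en_core_web_lg")
with open("scraped.json", "r") as f:
    news = json.load(f)
# Keep only the article bodies and parse each one.
news = [i['body'] for i in news]
processed_docs = list(nlp.pipe(news))
# Verbs and direct objects to look for.
verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]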
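The preview stops before any matching logic, so the following is only a sketch of one way the two lists could be combined: keep a token when it is a direct object from `dobj_list` whose head verb is in `verb_list`. The `find_events` function, the `Token` stand-in, and the toy parse are hypothetical and not taken from the gist. The stand-in mimics the `lemma_`, `dep_`, and `head` attributes of a spaCy token so the example runs without a model, and it assumes the English models label direct objects as `dobj`.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token: only the attributes read below.
Token = namedtuple("Token", ["text", "lemma_", "dep_", "head"])

verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]


def find_events(doc, verbs=verb_list, dobjs=dobj_list):
    """Return (verb lemma, object lemma) pairs for matching direct objects."""
    events = []
    for token in doc:
        # Only direct objects whose lemma is in the object list qualify.
        if token.dep_ != "dobj" or token.lemma_ not in dobjs:
            continue
        head = token.head
        # The governing verb must also be in the verb list.
        if head is not None and head.lemma_ in verbs:
            events.append((head.lemma_, token.lemma_))
    return events


# Hand-built parse of "Rebels launched an offensive", for illustration only.
launched = Token("launched", "launch", "ROOT", None)
offensive = Token("offensive", "offensive", "dobj", launched)
toy_doc = [
    Token("Rebels", "rebel", "nsubj", launched),
    launched,
    Token("an", "an", "det", offensive),
    offensive,
]

print(find_events(toy_doc))  # [('launch', 'offensive')]
```

With real spaCy output, each item of `processed_docs` could be passed in place of `toy_doc`, since iterating a parsed document yields tokens with the same three attributes.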
ahalterman / Brazil-GKG.Rmd
Last active April 22, 2020 01:27
R Markdown for Brazilian Protest Themes in GKG Post
Brazilian Protest Themes in the Global Knowledge Graph
========================================================
Andrew Halterman
Caerus Analytics
January 10, 2014
The Global Knowledge Graph, in the [words of Kalev Leetaru](http://gdeltblog.wordpress.com/2013/10/27/announcing-the-debut-of-the-gdelt-global-knowledge-graph/), aims to "connect every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day." Because GKG takes the form of a network with entities and themes as nodes and co-mentions as edges, the obvious way to work with it is as a network graph using tools from social network analysis. Kalev's [work on Iran](http://www.foreignpolicy.com/articles/2013/11/26/the_tehran_connection_big_data_iran) shows the remarkable ability of automated community detection algorithms to cluster people according to th
ahalterman / dw_scraper.py
Created April 30, 2017 22:47
DW scraper for event data tutorial
from __future__ import unicode_literals
from bs4 import BeautifulSoup
import requests
import json
import re
import datetime
from pymongo import MongoClient
connection = MongoClient()
ahalterman / event_model_snippet.py
Created March 4, 2018 14:32
Managing machine learning experiments
# many lines omitted above
def make_log(experiment_dir, X_train, X_test, Y_test, model, hist, custom_model):
    now = datetime.datetime.now()
    now = now.strftime("%Y-%m-%d %H:%M:%S")
    # get last commit hash
    commit = subprocess.check_output(['git', 'rev-parse', 'HEAD']).strip()
    # get precision and recall at a range of cutpoints
    cutoffs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
    precrecs = [precision_recall(X_test, Y_test, model, i) for i in cutoffs]
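The `precision_recall` helper is not shown in the preview. As a rough illustration of what scoring at a range of cutpoints involves, here is a minimal sketch that works directly on predicted probabilities rather than on a model. The name `precision_recall_at`, its signature, and the sample scores and labels are assumptions made for this example, not the gist's code.

```python
def precision_recall_at(probs, labels, cutoff):
    """Precision and recall when scores at or above `cutoff` count as positive."""
    preds = [p >= cutoff for p in probs]
    tp = sum(1 for pred, y in zip(preds, labels) if pred and y == 1)
    fp = sum(1 for pred, y in zip(preds, labels) if pred and y == 0)
    fn = sum(1 for pred, y in zip(preds, labels) if not pred and y == 1)
    # Report 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


cutoffs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]

# Made-up predicted probabilities and true labels, for illustration only.
probs = [0.02, 0.15, 0.35, 0.55, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 1]

precrecs = [precision_recall_at(probs, labels, c) for c in cutoffs]
for cutoff, (precision, recall) in zip(cutoffs, precrecs):
    print(f"cutoff={cutoff:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Logging the whole curve rather than a single threshold makes it easier to compare experiments whose best operating points differ.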
ahalterman / gdelt.ortho.r
Created August 30, 2013 20:14
Script for reproducing map of August 29, 2013 GDELT coverage. Orthogonal map projection, centered on Cairo.
# Author: Andrew Halterman. 30 August 2013
# R script for reproducing map of August 29, 2013 GDELT coverage
# Orthogonal map projection, centered on Cairo
# This assumes that you have your GDELT data stored in a SQLite database.
# For instructions on setting up SQLite and dplyr, see http://gdeltblog.wordpress.com/2013/08/29/subsetting-and-aggregating-gdelt-using-dplyr-and-sqlite/
library(dplyr)
library(RSQLite)
library(RSQLite.extfuns)
ahalterman / subset.domestic.r
Last active December 18, 2015 15:48
Subsetting GDELT for domestic events using R. I'm looking at domestic activities coded by GDELT, including protests. This is my walkthrough of how I subset only events occurring inside Georgia between 1979 and 2012 in the GDELT reduced dataset. 1. If you just use the Python script to subset the full (reduced) dataset, you end up with only events …
# Example for subsetting domestic events in Georgia from the GDELT reduced dataset.
# Read in the Python output file.
GEO.ALL <- read.table("./R/GDELT/GEO.ALL.select.outfile.txt", sep="\t", header=TRUE)
# The header=TRUE argument didn't work, so set the column names manually:
names(GEO.ALL) <- c("Day","Actor1Code","Actor2Code","EventCode","QuadCategory","GoldsteinScale",
                    "Actor1Geo_Lat","Actor1Geo_Long","Actor2Geo_Lat","Actor2Geo_Long","ActionGeo_Lat","ActionGeo_Long")
# To keep our subsetting function manageable, prep the GEO.ALL dataframe by substringing the first
### Keybase proof
I hereby claim:
* I am ahalterman on github.
* I am ahalt (https://keybase.io/ahalt) on keybase.
* I have a public key whose fingerprint is 5CEE CCB1 B548 682B B988 B999 952E E2B4 950D A417
To claim this, I am signing this object: