import json
import spacy

nlp = spacy.load("en_core_web_lg")

# Load the scraped articles and keep only the body text of each.
with open("scraped.json", "r") as f:
    news = json.load(f)
news = [i["body"] for i in news]

# Run the spaCy pipeline over every article.
processed_docs = list(nlp.pipe(news))

# Trigger lemmas for event detection: verbs and their direct objects.
verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]
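The gist stops before showing how these lists are used, but they are presumably matched against spaCy's dependency parse. A minimal sketch of how that pairing might work (the `find_event_sentences` helper and its matching logic are assumptions, not taken from the original gist):

```python
def find_event_sentences(doc, verbs, dobjs):
    """Hypothetical helper: return sentences where a trigger verb
    governs a trigger direct object, e.g. "launched an offensive"."""
    hits = []
    for sent in doc.sents:
        for token in sent:
            # A direct object whose lemma is in dobj_list, headed by
            # a verb whose lemma is in verb_list.
            if (token.dep_ == "dobj"
                    and token.lemma_ in dobjs
                    and token.head.lemma_ in verbs):
                hits.append(sent.text)
    return hits

event_sents = [find_event_sentences(doc, verb_list, dobj_list)
               for doc in processed_docs]
```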
# many lines omitted above

def make_log(experiment_dir, X_train, X_test, Y_test, model, hist, custom_model):
    now = datetime.datetime.now()
    now = now.strftime("%Y-%m-%d %H:%M:%S")
    # get the last commit hash so the run can be tied back to the exact code
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()
    # get precision and recall at a range of cutpoints
    cutoffs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
    precrecs = [precision_recall(X_test, Y_test, model, i) for i in cutoffs]
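`precision_recall` is defined in the omitted part of the file. A plausible reconstruction for a binary classifier with probabilistic outputs might look like the following; the signature matches the call above, but the body is an assumption:

```python
import numpy as np

def precision_recall(X, Y, model, cutoff):
    """Hypothetical reconstruction: threshold predicted probabilities
    at `cutoff` and compute precision and recall against true labels."""
    Y = np.asarray(Y)
    probs = np.asarray(model.predict(X)).ravel()
    preds = (probs >= cutoff).astype(int)
    tp = np.sum((preds == 1) & (Y == 1))
    fp = np.sum((preds == 1) & (Y == 0))
    fn = np.sum((preds == 0) & (Y == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```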
from __future__ import unicode_literals
from bs4 import BeautifulSoup
import requests
import json
import re
import datetime
from pymongo import MongoClient

# Connect to the local MongoDB instance (default host and port).
connection = MongoClient()
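The rest of the scraper is omitted. As a sketch of how these pieces typically fit together, a page could be fetched, parsed, and stored through the connection above; the database and collection names, URL handling, and parsing logic here are illustrative assumptions, not the original scraper's:

```python
db = connection.news_scrape   # hypothetical database name
collection = db.stories       # hypothetical collection name

def scrape_page(url):
    """Fetch a page, pull out the title and body text, timestamp it,
    and store the result in MongoDB."""
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    story = {
        "url": url,
        "title": soup.title.string if soup.title else None,
        "body": " ".join(p.get_text() for p in soup.find_all("p")),
        "date_scraped": datetime.datetime.utcnow(),
    }
    collection.insert_one(story)
```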
### Keybase proof

I hereby claim:

* I am ahalterman on github.
* I am ahalt (https://keybase.io/ahalt) on keybase.
* I have a public key whose fingerprint is 5CEE CCB1 B548 682B B988 B999 952E E2B4 950D A417

To claim this, I am signing this object:
Brazilian Protest Themes in the Global Knowledge Graph
========================================================

Andrew Halterman
Caerus Analytics
January 10, 2014

The Global Knowledge Graph, in the [words of Kalev Leetaru](http://gdeltblog.wordpress.com/2013/10/27/announcing-the-debut-of-the-gdelt-global-knowledge-graph/), aims to "connect every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day." Because the GKG takes the form of a network, with entities and themes as nodes and co-mentions as edges, the natural way to work with it is as a network graph, using tools from social network analysis. Kalev's [work on Iran](http://www.foreignpolicy.com/articles/2013/11/26/the_tehran_connection_big_data_iran) shows the remarkable ability of automated community detection algorithms to cluster people according to th
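As a concrete illustration of that network framing, here is a minimal sketch of building a co-mention graph and clustering it with a community detection algorithm. The records and field layout are toy assumptions for illustration; the real GKG file format differs:

```python
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Each record lists the entities/themes mentioned together in one
# article (illustrative data, not real GKG rows).
records = [
    ["protest", "Sao Paulo", "bus fares"],
    ["protest", "Rio de Janeiro", "World Cup"],
    ["World Cup", "FIFA", "Rio de Janeiro"],
]

G = nx.Graph()
for record in records:
    # Co-mention: every pair of items appearing in the same article
    # gets an edge, weighted by how often the pair co-occurs.
    for a, b in itertools.combinations(record, 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Cluster the graph into communities by modularity maximization.
for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))
```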
# Author: Andrew Halterman. 30 August 2013
# R script for reproducing a map of August 29, 2013 GDELT coverage
# Orthographic map projection, centered on Cairo
# This assumes that you have your GDELT data stored in a SQLite database.
# For instructions on setting up SQLite and dplyr, see http://gdeltblog.wordpress.com/2013/08/29/subsetting-and-aggregating-gdelt-using-dplyr-and-sqlite/
library(dplyr)
library(RSQLite)
library(RSQLite.extfuns)
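The querying code itself is cut off in this gist. For readers working outside R, the pull-from-SQLite step the script relies on might look like this in Python; the database path and table name are assumptions, while the column names are standard GDELT fields:

```python
import sqlite3

# Assumed setup: GDELT events loaded into a table named "events"
# inside a local SQLite file.
conn = sqlite3.connect("gdelt.sqlite")
query = """
    SELECT ActionGeo_Lat, ActionGeo_Long
    FROM events
    WHERE SQLDATE = 20130829
      AND ActionGeo_Lat IS NOT NULL
"""
coords = conn.execute(query).fetchall()
conn.close()
```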
# Example for subsetting domestic events in Georgia from the GDELT reduced dataset.
# Read in the Python output file.
GEO.ALL <- read.table("./R/GDELT/GEO.ALL.select.outfile.txt", sep="\t", header=TRUE)
# The header=TRUE argument didn't pick up usable column names, so set them manually:
names(GEO.ALL) <- c("Day", "Actor1Code", "Actor2Code", "EventCode", "QuadCategory", "GoldsteinScale",
                    "Actor1Geo_Lat", "Actor1Geo_Long", "Actor2Geo_Lat", "Actor2Geo_Long", "ActionGeo_Lat", "ActionGeo_Long")
# To keep our subsetting function manageable, prep the GEO.ALL dataframe by substringing the first
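The gist is cut off mid-step, but the usual GDELT move at this point is to take the first three characters of each actor code (the country prefix) and keep events where both actors are Georgian. A sketch of that logic in Python/pandas, under that assumption:

```python
import pandas as pd

cols = ["Day", "Actor1Code", "Actor2Code", "EventCode", "QuadCategory",
        "GoldsteinScale", "Actor1Geo_Lat", "Actor1Geo_Long",
        "Actor2Geo_Lat", "Actor2Geo_Long", "ActionGeo_Lat", "ActionGeo_Long"]
geo_all = pd.read_table("./R/GDELT/GEO.ALL.select.outfile.txt", names=cols)

# The first three characters of an actor code are its country prefix;
# a "domestic" event is one where both actors are coded GEO (Georgia).
a1 = geo_all["Actor1Code"].astype(str).str[:3]
a2 = geo_all["Actor2Code"].astype(str).str[:3]
domestic = geo_all[(a1 == "GEO") & (a2 == "GEO")]
```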