import json
import spacy

nlp = spacy.load("en_core_web_lg")

# Load the scraped articles and keep only the body text of each.
with open("scraped.json", "r") as f:
    news = json.load(f)
news = [i["body"] for i in news]

# Run the spaCy pipeline over every article.
processed_docs = list(nlp.pipe(news))

# Trigger lemmas for event detection: verbs and their direct objects.
verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]
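The gist stops before showing how these lists are used, but they are presumably matched against spaCy's dependency parse. A minimal sketch of how that pairing might work (the `find_event_sentences` helper and its matching logic are assumptions, not taken from the original gist):

```python
def find_event_sentences(doc, verbs, dobjs):
    """Hypothetical helper: return sentences where a trigger verb
    governs a trigger direct object, e.g. "launched an offensive"."""
    hits = []
    for sent in doc.sents:
        for token in sent:
            # A direct object whose lemma is in dobj_list, headed by
            # a verb whose lemma is in verb_list.
            if (token.dep_ == "dobj"
                    and token.lemma_ in dobjs
                    and token.head.lemma_ in verbs):
                hits.append(sent.text)
    return hits

event_sents = [find_event_sentences(doc, verb_list, dobj_list)
               for doc in processed_docs]
```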
# many lines omitted above

def make_log(experiment_dir, X_train, X_test, Y_test, model, hist, custom_model):
    now = datetime.datetime.now()
    now = now.strftime("%Y-%m-%d %H:%M:%S")
    # get the last commit hash so the run can be tied back to the exact code
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).strip()
    # get precision and recall at a range of cutpoints
    cutoffs = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
    precrecs = [precision_recall(X_test, Y_test, model, i) for i in cutoffs]
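`precision_recall` is defined in the omitted part of the file. A plausible reconstruction for a binary classifier with probabilistic outputs might look like the following; the signature matches the call above, but the body is an assumption:

```python
import numpy as np

def precision_recall(X, Y, model, cutoff):
    """Hypothetical reconstruction: threshold predicted probabilities
    at `cutoff` and compute precision and recall against true labels."""
    Y = np.asarray(Y)
    probs = np.asarray(model.predict(X)).ravel()
    preds = (probs >= cutoff).astype(int)
    tp = np.sum((preds == 1) & (Y == 1))
    fp = np.sum((preds == 1) & (Y == 0))
    fn = np.sum((preds == 0) & (Y == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```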
from __future__ import unicode_literals
from bs4 import BeautifulSoup
import requests
import json
import re
import datetime
from pymongo import MongoClient

# Connect to the local MongoDB instance (default host and port).
connection = MongoClient()
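The rest of the scraper is omitted. As a sketch of how these pieces typically fit together, a page could be fetched, parsed, and stored through the connection above; the database and collection names, URL handling, and parsing logic here are illustrative assumptions, not the original scraper's:

```python
db = connection.news_scrape   # hypothetical database name
collection = db.stories       # hypothetical collection name

def scrape_page(url):
    """Fetch a page, pull out the title and body text, timestamp it,
    and store the result in MongoDB."""
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    story = {
        "url": url,
        "title": soup.title.string if soup.title else None,
        "body": " ".join(p.get_text() for p in soup.find_all("p")),
        "date_scraped": datetime.datetime.utcnow(),
    }
    collection.insert_one(story)
```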
### Keybase proof

I hereby claim:

* I am ahalterman on github.
* I am ahalt (https://keybase.io/ahalt) on keybase.
* I have a public key whose fingerprint is 5CEE CCB1 B548 682B B988 B999 952E E2B4 950D A417

To claim this, I am signing this object:
Brazilian Protest Themes in the Global Knowledge Graph
========================================================

Andrew Halterman
Caerus Analytics
January 10, 2014

The Global Knowledge Graph, in the [words of Kalev Leetaru](http://gdeltblog.wordpress.com/2013/10/27/announcing-the-debut-of-the-gdelt-global-knowledge-graph/), aims to "connect every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day." Because the GKG takes the form of a network, with entities and themes as nodes and co-mentions as edges, the natural way to work with it is as a network graph, using tools from social network analysis. Kalev's [work on Iran](http://www.foreignpolicy.com/articles/2013/11/26/the_tehran_connection_big_data_iran) shows the remarkable ability of automated community detection algorithms to cluster people according to th
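As a concrete illustration of that network framing, here is a minimal sketch of building a co-mention graph and clustering it with a community detection algorithm. The records and field layout are toy assumptions for illustration; the real GKG file format differs:

```python
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Each record lists the entities/themes mentioned together in one
# article (illustrative data, not real GKG rows).
records = [
    ["protest", "Sao Paulo", "bus fares"],
    ["protest", "Rio de Janeiro", "World Cup"],
    ["World Cup", "FIFA", "Rio de Janeiro"],
]

G = nx.Graph()
for record in records:
    # Co-mention: every pair of items appearing in the same article
    # gets an edge, weighted by how often the pair co-occurs.
    for a, b in itertools.combinations(record, 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Cluster the graph into communities by modularity maximization.
for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))
```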
# Author: Andrew Halterman. 30 August 2013
# R script for reproducing a map of August 29, 2013 GDELT coverage
# Orthographic map projection, centered on Cairo
# This assumes that you have your GDELT data stored in a SQLite database.
# For instructions on setting up SQLite and dplyr, see http://gdeltblog.wordpress.com/2013/08/29/subsetting-and-aggregating-gdelt-using-dplyr-and-sqlite/
library(dplyr)
library(RSQLite)
library(RSQLite.extfuns)
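The querying code itself is cut off in this gist. For readers working outside R, the pull-from-SQLite step the script relies on might look like this in Python; the database path and table name are assumptions, while the column names are standard GDELT fields:

```python
import sqlite3

# Assumed setup: GDELT events loaded into a table named "events"
# inside a local SQLite file.
conn = sqlite3.connect("gdelt.sqlite")
query = """
    SELECT ActionGeo_Lat, ActionGeo_Long
    FROM events
    WHERE SQLDATE = 20130829
      AND ActionGeo_Lat IS NOT NULL
"""
coords = conn.execute(query).fetchall()
conn.close()
```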
# Example for subsetting domestic events in Georgia from the GDELT reduced dataset.
# Read in the Python output file.
GEO.ALL <- read.table("./R/GDELT/GEO.ALL.select.outfile.txt", sep="\t", header=TRUE)
# The header=TRUE argument didn't pick up usable column names, so set them manually:
names(GEO.ALL) <- c("Day", "Actor1Code", "Actor2Code", "EventCode", "QuadCategory", "GoldsteinScale",
                    "Actor1Geo_Lat", "Actor1Geo_Long", "Actor2Geo_Lat", "Actor2Geo_Long", "ActionGeo_Lat", "ActionGeo_Long")
# To keep our subsetting function manageable, prep the GEO.ALL dataframe by substringing the first
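The gist is cut off mid-step, but the usual GDELT move at this point is to take the first three characters of each actor code (the country prefix) and keep events where both actors are Georgian. A sketch of that logic in Python/pandas, under that assumption:

```python
import pandas as pd

cols = ["Day", "Actor1Code", "Actor2Code", "EventCode", "QuadCategory",
        "GoldsteinScale", "Actor1Geo_Lat", "Actor1Geo_Long",
        "Actor2Geo_Lat", "Actor2Geo_Long", "ActionGeo_Lat", "ActionGeo_Long"]
geo_all = pd.read_table("./R/GDELT/GEO.ALL.select.outfile.txt", names=cols)

# The first three characters of an actor code are its country prefix;
# a "domestic" event is one where both actors are coded GEO (Georgia).
a1 = geo_all["Actor1Code"].astype(str).str[:3]
a2 = geo_all["Actor2Code"].astype(str).str[:3]
domestic = geo_all[(a1 == "GEO") & (a2 == "GEO")]
```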