
@johnb30
johnb30 / download_phox_s3.py
Created June 12, 2017 15:36
Simple script to download Phoenix data from the Amazon S3 bucket to a local directory.
import os
from boto.s3.connection import S3Connection
# Replace the placeholders with your AWS access key and secret key
conn = S3Connection('<aws access key>', '<aws secret key>')
mybucket = conn.get_bucket('oeda')
# Local directory to save the downloaded files to; change it to suit your machine
directory = 'phox/'
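The preview above uses the legacy `boto` package. A minimal sketch of the same download with `boto3` (a swapped-in library, not the gist's code) pages through the bucket listing and saves each object locally. It assumes boto3 is installed and that credentials come from the standard AWS configuration rather than being hard-coded. The function names, the prefix, and the example key are hypothetical, and whether the `oeda` bucket is still readable is not confirmed here.

```python
import os


def local_path_for(key, directory):
    """Map an S3 key such as 'prefix/file.zip' to directory/file.zip.

    Flattening to the basename is an assumption; nested keys with the
    same file name would overwrite each other.
    """
    return os.path.join(directory, os.path.basename(key))


def download_prefix(bucket, prefix, directory):
    """Download every object under `prefix` in `bucket` into `directory`."""
    import boto3  # assumption: boto3 installed; credentials from env/config

    os.makedirs(directory, exist_ok=True)
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    downloaded = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "folder" placeholder keys
            target = local_path_for(key, directory)
            s3.download_file(bucket, key, target)
            downloaded.append(target)
    return downloaded
```

Calling `download_prefix("oeda", prefix, "phox/")` with the appropriate key prefix would then mirror the matching files into `phox/`.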
johnb30 / data.csv
Created December 8, 2013 23:25
Attempting to fit a negative binomial model with clustered SEs
cowcode1 cowcode2 year pol_rel terrorCounts terrorCounts2 rivalry jointDem1 logcapratio historyl1 historyl2 coldwar1 conflict1 conflict2 contiguity war1 war2 dyadid
2 20 1968 1 0 0 0 1 1.226615 3.583797 .6981347 0 1 0 1 0 0 2020
2 20 1969 1 0 0 0 1 1.238537 3.020913 .6981347 0 1 0 1 0 0 2020
2 20 1970 1 0 0 0 1 1.175389 3.044998 1.206968 0 1 0 1 0 0 2020
2 20 1971 1 0 0 0 1 1.142114 3.286908 1.015231 0 1 0 1 0 0 2020
2 20 1972 1 0 0 0 1 1.116908 3.17847 .9593502 0 1 0 1 0 0 2020
2 20 1973 1 0 0 0 1 1.101906 3.143146 .7777947 0 1 0 1 0 0 2020
2 20 1974 1 0 0 0 1 1.080219 3.207786 .7667958 0 0 0 1 0 0 2020
2 20 1975 1 0 0 0 1 1.058303 3.413949 .6981347 0 0 0 1 0 0 2020
2 20 1976 1 0 0 0 1 1.060871 3.493439 .6412689 0 0 0 1 0 0 2020
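For reference, the model being fit is presumably the standard NB2 negative binomial regression, and the cluster-robust (sandwich) variance below is the usual definition. Treating `terrorCounts` as the outcome and `dyadid` as the cluster variable is an assumption based on the column names; the gist does not say which columns play those roles.

```latex
\begin{aligned}
y_i &\sim \operatorname{NegBin}(\mu_i, \alpha), \qquad
\log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad
\operatorname{Var}(y_i \mid \mathbf{x}_i) = \mu_i + \alpha\,\mu_i^{2}, \\
\hat{V}_{\text{cluster}} &= \hat{A}^{-1}\Bigl(\sum_{g=1}^{G} \hat{\mathbf{s}}_g \hat{\mathbf{s}}_g^{\top}\Bigr)\hat{A}^{-1}, \qquad
\hat{\mathbf{s}}_g = \sum_{i \in g} \frac{\partial \ell_i(\hat{\theta})}{\partial \theta}, \qquad
\hat{A} = -\sum_{i} \frac{\partial^{2} \ell_i(\hat{\theta})}{\partial \theta \, \partial \theta^{\top}}.
\end{aligned}
```

Here i indexes dyad-year rows, g indexes the G clusters, ℓ_i is row i's log-likelihood contribution, and θ = (β, α). Statistical packages commonly scale this estimate by a small-sample factor such as G/(G - 1).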
johnb30 / Root_events_analysis.ipynb
Created October 28, 2013 14:29
Analysis of root vs. non-root events from GDELT.
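The notebook itself does not render here. As a rough sketch of the kind of tally such an analysis might start from, the function below counts root and non-root events per group in tab-delimited GDELT rows, using the `IsRootEvent` flag. The column positions are parameters because they differ between GDELT releases and should be checked against the codebook; the function name and the example rows are hypothetical.

```python
from collections import defaultdict


def root_counts(lines, root_col, group_col, sep="\t"):
    """Tally root vs non-root events per group.

    root_col: hypothetical index of the IsRootEvent flag ("1" for root events).
    group_col: hypothetical index of the column to group by (e.g. an event code).
    Returns {group: [n_root, n_non_root]}.
    """
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if max(root_col, group_col) >= len(fields):
            continue  # skip malformed or truncated rows
        slot = 0 if fields[root_col] == "1" else 1
        counts[fields[group_col]][slot] += 1
    return dict(counts)
```

Passing an open file as `lines` streams through a daily file without loading it into memory.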
johnb30 / gdelt_md5check.sh
Created October 27, 2013 03:08
Bash script for OS X that checks the MD5 hashes of downloaded zip files against a list of known hashes.
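#!/bin/bash
# Usage: gdelt_md5check.sh <path prefix, e.g. data/> <file listing known MD5 hashes>
for file in "$1"*
do
    # md5 -q prints only the 32-character hex digest (OS X/BSD md5)
    genhash=$(md5 -q "$file")
    if grep -q "$genhash" "$2"; then
        echo "Found hash for $file"
    else
        echo "Did not find hash for $file"
    fi
done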
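A cross-platform sketch of the same check in Python, using `hashlib` instead of the OS X `md5` command. Unlike the `grep` in the script, it parses the hash list into a set of 32-character hex digests, so a hash only counts if it appears as a whole token. The function names and the default `*.zip` pattern are illustrative assumptions.

```python
import hashlib
import re
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def known_hashes(hash_file):
    """Collect every 32-character hex token in the hash list, lowercased."""
    text = Path(hash_file).read_text()
    return {h.lower() for h in re.findall(r"\b[0-9a-fA-F]{32}\b", text)}


def check_md5s(directory, hash_file, pattern="*.zip"):
    """Map each matching file name to True if its MD5 appears in hash_file."""
    known = known_hashes(hash_file)
    return {
        p.name: md5_of(p) in known
        for p in sorted(Path(directory).glob(pattern))
        if p.is_file()
    }
```

For example, `check_md5s("data/", "md5sums.txt")` returns a dict such as `{"a.zip": True, ...}`, which can be printed in the same "Found / Did not find" style as the shell script.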
johnb30 / gdelt_intro.ipynb
Created September 27, 2013 15:05
Introduction to GDELT for the Hacking GDELT event
johnb30 / unique_actors.py
Created April 23, 2013 23:21
Code used to pull the unique actors from the GDELT dataset.
from path import path
import pandas as pd
unique_actors = dict()
lengths = list()
# Loop over every *.reduced.txt file in the current working directory
for in_file in path.getcwd().files('*.reduced.txt'):
    data = open(in_file, 'r')
    print("%s data read in...subsetting" % in_file)
    for line in data:
        line = line.replace('\n', '')
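The preview stops mid-loop. One compact way to finish the idea is to count actor codes with `collections.Counter`, whose keys are then the unique actors. Which columns hold the actor codes in the `*.reduced.txt` files is an assumption here (columns 1 and 2 in the illustrative rows), as is the function name.

```python
from collections import Counter


def count_actors(lines, actor_cols=(1, 2), sep="\t"):
    """Count how often each actor code appears in the given columns.

    actor_cols are hypothetical positions of the Actor1/Actor2 codes.
    Empty actor fields are ignored.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        for col in actor_cols:
            if col < len(fields) and fields[col]:
                counts[fields[col]] += 1
    return counts
```

`len(count_actors(rows))` gives the number of unique actors, and `most_common(n)` lists the most active ones.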
johnb30 / gdelt_subset.py
Created April 5, 2013 02:44
Brief tutorial on subsetting the GDELT dataset.
from path import path
import pandas as pd
allActors = ['AFG', 'ALA', 'ALB', 'DZA', 'ASM', 'AND', 'AGO', 'AIA', 'ATG',
'ARG', 'ARM', 'ABW', 'AUS', 'AUT', 'AZE', 'BHS', 'BHR', 'BGD',
'BRB', 'BLR', 'BEL', 'BLZ', 'BEN', 'BMU', 'BTN', 'BOL', 'BIH',
'BWA', 'BRA', 'VGB', 'BRN', 'BGR', 'BFA', 'BDI', 'KHM', 'CMR',
'CAN', 'CPV', 'CYM', 'CAF', 'TCD', 'CHL', 'CHN', 'COL', 'COM',
'COD', 'COG', 'COK', 'CRI', 'CIV', 'HRV', 'CUB', 'CYP', 'CZE',
'DNK', 'DJI', 'DMA', 'DOM', 'TMP', 'ECU', 'EGY', 'SLV', 'GNQ',
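The core of that subsetting step can be sketched as a filter that keeps events whose two actors both begin with a country code from the list. It relies on the CAMEO convention that many actor codes start with a three-letter country code (for example `USAGOV`); actor codes without a country prefix are simply dropped. The column positions, the function name, and the small country set in the example are assumptions, not the tutorial's own values.

```python
def subset_dyads(lines, countries, actor_cols=(1, 2), sep="\t"):
    """Keep rows where every actor column starts with a code in `countries`.

    actor_cols are hypothetical positions of Actor1Code and Actor2Code.
    """
    countries = set(countries)
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if max(actor_cols) >= len(fields):
            continue  # malformed or truncated row
        # The first three characters of a CAMEO actor code are often the country
        if all(fields[c][:3] in countries for c in actor_cols):
            kept.append(fields)
    return kept
```

Passing the full `allActors` list as `countries` would keep only state-to-state events among the listed countries.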
johnb30 / scrapeTutorial
Created February 9, 2013 00:52
Tutorial on web scraping for PL SC 597I: Event Data
{
"metadata": {
"name": "Scraping Tutorial"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
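The notebook JSON is cut off above. As a small, self-contained illustration of the scraping idea (not the tutorial's code, which may well use other libraries), Python's standard-library `html.parser` can collect the links on a page. The class and function names are hypothetical, and fetching the page is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url=""):
    """Return the links in `html`, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

To run it on a live page, fetch the HTML first, for example with `urllib.request.urlopen(url).read().decode("utf-8")`, and pass the same `url` as `base_url` so relative links resolve.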
johnb30 / parallel_subset.py
Created December 7, 2012 23:06
Parallel implementation of data subsetting
import numpy as np
from joblib import Parallel, delayed
def subset(file):
    dataOut = []
    data = open(file, 'r')
    data.readline()
    for line in data:
        splitLine = line.split('\t')
        if splitLine[3] == '57':
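The gist uses joblib's `Parallel`/`delayed`. A sketch of the same pattern with the standard library's `concurrent.futures` (a swapped-in approach) runs `subset` on each file in a separate worker process and concatenates the matches. The column index 3, the value `'57'`, the tab separator, and the header skip mirror the preview; the `parallel_subset` name and the `executor_cls` parameter are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor


def subset(path, column=3, value="57"):
    """Return the tab-split rows of one file whose `column` equals `value`."""
    rows = []
    with open(path) as data:
        next(data, None)  # skip the header line, like data.readline() above
        for line in data:
            # Strip the newline so a match on the last column still works
            fields = line.rstrip("\n").split("\t")
            if len(fields) > column and fields[column] == value:
                rows.append(fields)
    return rows


def parallel_subset(paths, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Apply `subset` to every file in parallel and concatenate the results.

    With ProcessPoolExecutor, call this from under `if __name__ == "__main__":`
    on platforms that start workers by spawning (Windows, macOS).
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves the input order of `paths`
        return [row for rows in pool.map(subset, paths) for row in rows]
```

To filter on a different column or value, wrap `subset` with `functools.partial` before mapping; a partial of a module-level function can still be sent to worker processes.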
johnb30 / bootstrap.py
Created November 29, 2012 01:06
Bootstrapped two-sample t-test in Python
from __future__ import division
import numpy as np
import pandas as pd
import random
def sample(data):
    # Bootstrap resample: len(data) draws with replacement
    return [random.choice(data) for _ in range(len(data))]
def bootstrap_t_test(treatment, control, nboot=1000, direction="less"):
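The function body is cut off in the preview. A minimal standard-library sketch of one common version of the bootstrapped two-sample t-test follows: shift both samples to the pooled mean so that the null of equal means holds, resample each group with replacement, and compare the observed Welch t statistic with the bootstrap distribution. The `nboot` and `direction="less"` parameters mirror the gist's signature; the `"greater"` and `"two-sided"` options, the `seed` parameter, and the `welch_t` helper are additions, and the author's own implementation may differ.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic for mean(x) - mean(y); each sample needs 2+ values."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


def bootstrap_t_test(treatment, control, nboot=1000, direction="less", seed=None):
    """Return (observed t, bootstrap p-value) for H0: equal means."""
    rng = random.Random(seed)
    t_obs = welch_t(treatment, control)
    m_t, m_c = mean(treatment), mean(control)
    pooled = mean(list(treatment) + list(control))
    # Shift each sample to the pooled mean so the null hypothesis holds exactly
    t_null = [v - m_t + pooled for v in treatment]
    c_null = [v - m_c + pooled for v in control]
    boot = []
    for _ in range(nboot):
        # Resample each group with replacement, keeping the original group sizes
        tb = rng.choices(t_null, k=len(t_null))
        cb = rng.choices(c_null, k=len(c_null))
        try:
            boot.append(welch_t(tb, cb))
        except ZeroDivisionError:
            continue  # both resamples constant: t undefined, skip this draw
    if direction == "less":
        extreme = sum(t <= t_obs for t in boot)
    elif direction == "greater":
        extreme = sum(t >= t_obs for t in boot)
    elif direction == "two-sided":
        extreme = sum(abs(t) >= abs(t_obs) for t in boot)
    else:
        raise ValueError("direction must be 'less', 'greater' or 'two-sided'")
    return t_obs, extreme / len(boot)
```

With `direction="less"`, a small p-value indicates that the treatment mean is below the control mean; fixing `seed` makes the resampling reproducible.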