Scott Bradley scott2b

  • Northwestern University
  • Evanston
"""
Class for managing downloaded web pages
"""
import requests
import requests_cache
import uuid
requests_cache.install_cache()
class WebPage(object):
@scott2b
scott2b / themes.py
Last active April 19, 2018 04:16
Extract themes from the GDELT SET_EVENTPATTERNS.xml file
"""
The patterns file is here: https://github.com/ahalterman/GKG-Themes/blob/master/SET_EVENTPATTERNS.xml
It is not valid XML, so this script uses regular expressions instead of an XML parser.
There are non-theme entries in this file not considered here. The globals section at the top of the
file should be taken into account when processing documents with pattern matches.
"""
import re
@scott2b
scott2b / readfile.py
Last active April 19, 2018 02:42
Download and process a zipped csv file without saving to a tmp file
import csv
from io import BytesIO, TextIOWrapper
from urllib import request
from zipfile import ZipFile
url = 'http://data.gdeltproject.org/gdeltv2/20180419011500.gkg.csv.zip'
with ZipFile(BytesIO(request.urlopen(url).read())) as zf:
    f = zf.namelist()[0]
    with zf.open(f, 'r') as csvfile:
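The preview above stops at the point where the CSV member is opened. A minimal sketch of how the rest of the in-memory approach might look is given below. It builds a tiny zip archive locally instead of downloading one, so it runs offline. The helper names `iter_zipped_csv_rows` and `make_sample_zip` and the sample data are assumptions for illustration, not part of the original gist, and the tab delimiter is an assumption about the data file.

```python
import csv
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile


def iter_zipped_csv_rows(zip_bytes, delimiter=",", encoding="utf-8"):
    """Yield rows from the first member of a zip archive held in memory."""
    with ZipFile(BytesIO(zip_bytes)) as zf:
        first_member = zf.namelist()[0]
        # zf.open returns a binary file object; TextIOWrapper decodes it
        # so that csv.reader receives text rather than bytes.
        with zf.open(first_member, "r") as raw:
            text = TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.reader(text, delimiter=delimiter)


def make_sample_zip():
    """Build a small zip archive in memory to stand in for a download."""
    buffer = BytesIO()
    with ZipFile(buffer, "w") as zf:
        zf.writestr("sample.csv", "a\tb\n1\t2\n")
    return buffer.getvalue()


# Hypothetical usage: with a real download, zip_bytes would be
# request.urlopen(url).read() as in the gist above.
rows = list(iter_zipped_csv_rows(make_sample_zip(), delimiter="\t"))
print(rows)
```

Nothing is written to disk: the archive bytes live in a `BytesIO` buffer and the member is decoded as it is read.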
# for each home destination, find all paths that lead back home and print only
# the ones that are a full traversal (i.e. 5 hops)
for home in graph.keys():
    print("Home: %s" % home)
    for start in graph.keys():
        if (start != home):
            print([path for path in find_path(graph, start, home) if len(path) == 5])
            print("---")
    print("=====")
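The loop above relies on a `find_path` function and a `graph` mapping that are not shown in this preview. The following is a minimal sketch of what they might look like, assuming `graph` maps each node to a list of its neighbors and `find_path` yields every simple path as a list of nodes. Both the implementation and the five-node sample graph are assumptions for illustration.

```python
def find_path(graph, start, end, path=None):
    """Yield every simple path (no repeated nodes) from start to end."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for neighbor in graph.get(start, []):
        # Skip nodes already on the path so that no node is visited twice.
        if neighbor not in path:
            yield from find_path(graph, neighbor, end, path)


# Hypothetical fully connected graph of five destinations.
nodes = ["A", "B", "C", "D", "E"]
graph = {n: [m for m in nodes if m != n] for n in nodes}

# Keep only the paths that visit all five nodes.
full = [p for p in find_path(graph, "A", "E") if len(p) == 5]
print(len(full))
```

In this sketch a path is a list of nodes, so `len(path) == 5` selects the paths that touch all five destinations.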
@scott2b
scott2b / protocol.c
Last active February 12, 2018 01:23
/**
* Proof-of-concept for establishing a design basis for an extremely compact
* data transfer protocol for wireless data transmission
*
* The idea is to have a protocol that is flexible in that arbitrary data types
* can be passed without the wasted space that would be caused by a struct-based
* approach that would need to reserve more space than required for a given message
*
* This example uses a byte for each data point to determine its type. It seems
* wasteful to use a full extra byte for each data type; however, as seen with
@scott2b
scott2b / main.c
Last active February 12, 2018 01:02
Proof of concept for byte-space data type overloading
/**
* An updated version of this gist which gets closer to the project requirements
* is available here: https://gist.github.com/scott2b/67bdb6b0e7da8f154c979520adb98169
*
* Proof-of-concept for establishing a design basis for an extremely compact
* data transfer protocol for wireless data transmission
*
* The idea is to have a protocol that is flexible, such that arbitrary data types
* can be passed without the wasted space that would be caused by a struct-based
* approach that would need to reserve more space than required for a given message
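The idea described in these comments, prefixing each data point with a one-byte type tag instead of reserving a fixed struct layout, can be sketched as follows. The gists themselves are written in C. This illustration uses Python's `struct` module, and the tag values, type names, and little-endian layouts are assumptions, not the gist's actual encoding.

```python
import struct

# Hypothetical tag values and payload layouts (little-endian).
TYPES = {
    0x01: ("uint8", "<B"),
    0x02: ("int16", "<h"),
    0x03: ("float32", "<f"),
}
TAG_BY_NAME = {name: (tag, fmt) for tag, (name, fmt) in TYPES.items()}


def encode(values):
    """Pack (type_name, value) pairs as tag byte + payload, back to back."""
    out = bytearray()
    for name, value in values:
        tag, fmt = TAG_BY_NAME[name]
        out.append(tag)
        out += struct.pack(fmt, value)
    return bytes(out)


def decode(message):
    """Walk the buffer, using each tag byte to determine the payload size."""
    values, offset = [], 0
    while offset < len(message):
        name, fmt = TYPES[message[offset]]
        offset += 1
        (value,) = struct.unpack_from(fmt, message, offset)
        offset += struct.calcsize(fmt)
        values.append((name, value))
    return values


message = encode([("uint8", 200), ("int16", -1234), ("float32", 1.5)])
print(len(message), decode(message))
```

Each message occupies only the bytes its values need plus one tag byte per value, so a short message is not padded out to the size of the largest possible one.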
@scott2b
scott2b / coordinates.py
Last active October 10, 2017 15:44
get geo coordinates for US cities with population 100,000+
#!/usr/bin/env python
import json
import requests
from bs4 import BeautifulSoup
import re
LATLNG = re.compile(r'^.*?(-?\d+\.\d+); (-?\d+\.\d+).*$', re.S)
CITY = re.compile(r'^(.*?)(?:\[\d+\])?$', re.S)
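To show what the two patterns above capture, here is a small sketch that applies them to hand-written strings. The sample strings are assumptions about the kind of text being scraped. The page the gist actually parses is not shown in this preview.

```python
import re

LATLNG = re.compile(r'^.*?(-?\d+\.\d+); (-?\d+\.\d+).*$', re.S)
CITY = re.compile(r'^(.*?)(?:\[\d+\])?$', re.S)

# Hypothetical cell contents: a coordinate string with a decimal
# "lat; lng" pair, and a city name with a trailing footnote marker.
sample_coords = "41 52 N 87 37 W / 41.8819; -87.6278"
sample_city = "Chicago[3]"

# LATLNG captures the first decimal pair separated by "; ".
lat, lng = (float(g) for g in LATLNG.match(sample_coords).groups())

# CITY captures the name and drops an optional "[n]" footnote suffix.
city = CITY.match(sample_city).group(1)
print(city, lat, lng)
```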
@scott2b
scott2b / Elasticsearch client wrapper
Last active August 29, 2015 14:04
Working toward a more usable Elasticsearch client
"""
I find the Elasticsearch Python client a bit quirky to work with. A lot of this
has to do with the odd way that Elasticsearch documents and results are
organized (e.g. ['hits']['hits'] WTF?). The weird organization is exemplified
well in the need for the client to expose both `get` and `get_source` methods.
Also, there is a pretty bad lack of consistency: ['hits']['hits'] vs. ['docs'],
_version (with _) vs. found (without _), etc.
This is my crude attempt to make the client a bit more usable. Currently
implements `search`, `get`, and `mget`. `get_source` maps to `get` because there
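As an illustration of the nesting the docstring complains about, the sketch below flattens a hand-written search response into a plain list of documents. It does not call the Elasticsearch client at all. The helper names and the output keys `id` and `score` are assumptions and may differ from what the gist's wrapper does.

```python
def flatten_hit(hit):
    """Merge a hit's _source fields with selected metadata into one dict."""
    doc = dict(hit.get("_source", {}))
    doc["id"] = hit.get("_id")
    doc["score"] = hit.get("_score")
    return doc


def flatten_search(response):
    """Return the documents buried under response['hits']['hits']."""
    return [flatten_hit(hit) for hit in response["hits"]["hits"]]


# Hand-written response in the general shape of a search result.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"title": "Example"}},
        ],
    }
}
print(flatten_search(response))
```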
@scott2b
scott2b / base.html
Created April 30, 2014 16:07
base.html for Bookshelf project using Twitter Bootstrap
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="">
<meta name="author" content="">
<link rel="shortcut icon" href="../../assets/ico/favicon.ico">
<title>Starter Template for Bootstrap</title>
@scott2b
scott2b / twitter_searcher.py
Last active January 26, 2019 18:26
TwitterSearcher. Class to manage aggressive Twitter API searching with the birdy AppClient.
import logging
import time
import urlparse
from birdy.twitter import AppClient
from birdy.twitter import TwitterRateLimitError, TwitterClientError
from delorean import parse, epoch
"""
Utilization:
searcher = TwitterSearcher(
TWITTER_CONSUMER_KEY,