Christopher Groskopf onyxfish

onyxfish / fabfile.py
Created February 9, 2010 23:05
Chicago Tribune News Applications Fabric deployment script
from fabric.api import *
"""
Base configuration
"""
env.project_name = '$(project)'
env.database_password = '$(db_password)'
env.site_media_prefix = "site_media"
env.admin_media_prefix = "admin_media"
env.newsapps_media_prefix = "na_media"
onyxfish / example1.py
Created March 5, 2010 16:51
Basic example of using NLTK for named entity extraction.
import nltk
with open('sample.txt', 'r') as f:
    sample = f.read()
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# NLTK 3 renamed batch_ne_chunk to ne_chunk_sents
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
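The preview stops at the chunking step. As a rough sketch of what could follow, the helper below walks one chunked sentence and collects the text of each named entity. It assumes the NLTK 3 tree interface (`label()`) and the `NE` label that `binary=True` produces; the name `extract_entity_names` is hypothetical and not confirmed by the preview above.

```python
def extract_entity_names(tree):
    """Collect named-entity strings from one chunked sentence tree.

    Assumes subtrees expose label() (as NLTK 3 Tree objects do) and that
    leaves are (word, tag) tuples, which is what pos_tag produces.
    """
    names = []
    for child in tree:
        # Leaves are plain (word, tag) tuples and have no label() method.
        if hasattr(child, "label"):
            if child.label() == "NE":
                names.append(" ".join(word for word, tag in child))
            else:
                # Descend into any other kind of subtree.
                names.extend(extract_entity_names(child))
    return names
```

Calling it on every tree in `chunked_sentences` and merging the results would give the entity names for the whole sample.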
onyxfish / fabfile.py
Created March 12, 2010 15:27
Fabric script to deploy statically built tables from ProPublica's table-setter application to S3.
from fabric.api import *
"""
Base configuration
"""
env.project_name = 'tables'
"""
Environments
"""
import inspect
# dir()
# >>> ['RuntimeException', 'String', 'TropoApp', 'TropoCall', 'TropoChoice', 'TropoEvent', '__name__', '_handleCallBack', '_parseTime', 'a', 'action', 'answer', 'appInstance', 'ask', 'call', 'callFactory', 'conference', 'conferenceFactory', 'context', 'createConference', 'currentApp', 'currentCall', 'destroyConference', 'engine', 'hangup', 'incomingCall', 'log', 'prompt', 'record', 'redirect', 'reject', 'say', 'startCallRecording', 'stopCallRecording', 'token', 'transcribe', 'transcription', 'transfer', 'wait']
if currentCall:
    log("READ HERE: Incoming")
    # log(currentCall) # object instance
    # log(action) # undefined
onyxfish / Safecity Setup
Created May 10, 2010 19:15
Setup instructions for safecity
# Assumes you have Postgres installed
mkvirtualenv --no-site-packages safecity
easy_install pip
pip install -r requirements.txt
# The default dataset is the South Austin neighborhood of Chicago
# For complete dataset edit bootstrap.sh to remove the flag to ./manage load_centerline
# For the downtown/loop dataset edit bootstrap.sh to change the flag to ./manage load_centerline -d
onyxfish / get_noaa_stations.py
Created July 1, 2010 19:22
Fetch NOAA weather stations for IL from web.
#!/usr/bin/env python
import csv
import re
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup
NOAA_ROOT_URL = 'http://www.weather.gov/'
onyxfish / get_weatherbug_stations.py
Created July 1, 2010 19:23
Fetch WeatherBug weather stations for Chicago from web.
#!/usr/bin/env python
import csv
import re
from urllib2 import urlopen
from BeautifulSoup import BeautifulStoneSoup
WEATHERBUG_API_KEY = 'A6464697672'
WEATHERBUG_SEARCH_URL = 'http://api.wxbug.net/getStationsXML.aspx?ACode=' + WEATHERBUG_API_KEY + '&zipCode=60601&unittype=0'
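The search URL above is assembled by string concatenation. A minimal alternative sketch, assuming Python 3 (the script itself targets Python 2) and a hypothetical helper name `build_station_search_url`, lets the standard library encode the query string so that other ZIP codes or keys can be substituted safely:

```python
from urllib.parse import urlencode

WEATHERBUG_BASE_URL = "http://api.wxbug.net/getStationsXML.aspx"


def build_station_search_url(api_key, zip_code, unit_type=0):
    """Build the station search URL with properly encoded parameters."""
    # A list of pairs keeps the parameter order identical to the original URL.
    params = [("ACode", api_key), ("zipCode", zip_code), ("unittype", unit_type)]
    return WEATHERBUG_BASE_URL + "?" + urlencode(params)
```

Passing the key and `'60601'` reproduces the same URL shape as `WEATHERBUG_SEARCH_URL`.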
onyxfish / Advice from #djangocon 2010 Django-in-Journalism open-session
Created September 9, 2010 17:31
Advice from #djangocon 2010 Django-in-Journalism open-session
Suggestions from the 11 hackers at the table:
* Use connection pooling (pgpool).
* Don't expect reporters to get excited until you can show them something. (Find a way to appeal to reporters' interests.)
* Only update what's changed. (e.g. on election results: show changes, not raw numbers)
* Use the AP's "dbready" format for election results.
* Use CSV for everything.
* Use pdb with runserver for debugging.
* Beware circular imports when using Haystack.
* Make the case for building news apps with government data. (Niran will provide numbers showing that people look at it.)
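
One possible reading of "only update what's changed" is to compare each new results snapshot with the previous one and push only the races that differ. The sketch below assumes results are held as a dict mapping a race name to a vote total; that shape and the name `changed_results` are illustrative assumptions, not something the session notes specify.

```python
def changed_results(previous, current):
    """Return only the races whose totals differ from the previous snapshot.

    Races that are new in `current` count as changed.
    """
    return {
        race: votes
        for race, votes in current.items()
        if previous.get(race) != votes
    }


if __name__ == "__main__":
    before = {"mayor": 100, "clerk": 50}
    after = {"mayor": 120, "clerk": 50, "treasurer": 10}
    # Only the mayor and treasurer entries would need to be re-rendered.
    print(changed_results(before, after))
```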
onyxfish / how_to_fetch_google_spreadsheets_from_appengine.py
Created October 10, 2010 15:22
How to fetch Google Spreadsheets from AppEngine
def fetch_csv(self, options):
    """
    Retrieves a single Google spreadsheet as CSV using ClientLogin
    authentication.

    TODO: handle retries and timeouts on auth calls
    TODO: handle retries and timeouts on content fetching
    """
    client = gdata.docs.client.DocsClient()
    client.ClientLogin(config.USER_EMAIL, config.USER_PASSWORD, config.APP_DOMAIN)
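The TODOs in the docstring mention retries. A minimal sketch of a generic retry wrapper follows, written for Python 3 with the standard library only; `with_retries` and its parameters are hypothetical, and it does not address timeouts, which depend on the HTTP client in use.

```python
import time


def with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or `attempts` tries have failed.

    Waits delay * 2**n seconds between tries (simple exponential backoff)
    and re-raises the last error once the attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except exceptions as error:
            last_error = error
            # Do not sleep after the final failed attempt.
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))
    raise last_error
```

The login call could then be wrapped along the lines of `with_retries(lambda: client.ClientLogin(...))`, ideally with `exceptions` narrowed to the errors the client actually raises.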
onyxfish / elections.chicagotribune.com.vcl
Created November 3, 2010 18:55
Elections Center Varnish Configuration File
backend app1 {
    .host = "1.1.1.1";
    .port = "80";
}
backend app2 {
    .host = "2.2.2.2";
    .port = "80";
}