Christopher Groskopf onyxfish
@onyxfish
onyxfish / README.md
Last active January 2, 2023 14:37
Google Spreadsheets script to generate slugs from a range of cells

This script for Google Spreadsheets generates slugs from a range of cells, for example to build unique URLs from your data.

Use it like this!

| # | A   | B        | C               |
|---|-----|----------|-----------------|
| 1 | a   | b        | slug            |
| 2 | foo | baz bing | =slugify(A2:B4) |
| 3 | bar | BAZ      |                 |
| 4 | FOO | baz-bing |                 |
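
To illustrate the idea of turning a range of cells into slugs, here is a minimal Python sketch. It is not the gist's Apps Script code: the function names `slugify_cell_text` and `slugify_rows` are hypothetical, and the rules it applies (join each row's cells with a space, lowercase, collapse non-alphanumeric runs into hyphens, append a counter to duplicates) are assumptions about what a slug generator for unique URLs might reasonably do.

```python
import re


def slugify_cell_text(text):
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r'[^a-z0-9]+', '-', text.lower())
    return slug.strip('-')


def slugify_rows(rows):
    """Return one slug per row (assumed rule: join the row's cells with spaces).

    Duplicate slugs get a numeric suffix (-2, -3, ...) so the results can
    serve as unique URL fragments. This suffix scheme is an assumption.
    """
    seen = {}
    slugs = []
    for row in rows:
        base = slugify_cell_text(' '.join(str(cell) for cell in row))
        count = seen.get(base, 0) + 1
        seen[base] = count
        slugs.append(base if count == 1 else '%s-%d' % (base, count))
    return slugs


# The same values as cells A2:B4 in the table above
rows = [['foo', 'baz bing'], ['bar', 'BAZ'], ['FOO', 'baz-bing']]
slugs = slugify_rows(rows)
print(slugs)
```

With this rule, rows 2 and 4 normalize to the same text, so the second one receives a `-2` suffix. A production version would also need to guard against a suffixed slug colliding with a slug that already ends in a number.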
onyxfish / example1.py
Created March 5, 2010 16:51
Basic example of using NLTK for named entity extraction.
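import nltk
# Read the raw text to analyze
with open('sample.txt', 'r') as f:
    sample = f.read()
# Split into sentences, tokenize each one, then tag parts of speech
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# NLTK 3 renamed batch_ne_chunk to ne_chunk_sents; binary=True marks entities as NE without a type
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)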
onyxfish / csvkit_tutorial.sh
Created April 17, 2011 23:23
Full script for the csvkit tutorial (proof of process repeatability)
#!/bin/bash
###################
# Getting started #
###################
# Set up a workspace
mkdir va_benefits
cd va_benefits
onyxfish / fabfile.py
Created February 9, 2010 23:05
Chicago Tribune News Applications fabric deployment script
from fabric.api import *
"""
Base configuration
"""
env.project_name = '$(project)'
env.database_password = '$(db_password)'
env.site_media_prefix = "site_media"
env.admin_media_prefix = "admin_media"
env.newsapps_media_prefix = "na_media"
onyxfish / wordpress.vcl
Created June 28, 2011 21:34
ChicagoNow Varnish configuration (development version)
backend app1 {
    .host = "127.0.0.1";
    .port = "8000";
}
acl purge {
    "127.0.0.1";
    "::1";
}
onyxfish / README.md
Created March 30, 2017 14:06
Import the entire Bureau of Labor Statistics (BLS) Quarterly Census of Employment and Wages (QCEW) dataset into a PostgreSQL database

QCEW Data Loader

These scripts import the entire Bureau of Labor Statistics Quarterly Census of Employment and Wages (from 1990 through the latest available data) into one giant PostgreSQL database.

The database created by this process will use about 100GB of disk space. Make sure you have enough space available before you start!

Configuration

Database name, table name, and more can be configured via config.sh.

onyxfish / newsapps-varnish-admin.php
Created June 28, 2011 21:32
Newsapps Varnish plugin for Wordpress (handles targeted invalidation)
<?php
if($_POST['varnish_hidden'] == 'Y') {
    //Process form data
    $varnish_server1 = (!isset($_POST['varnish_server1'])? '': $_POST['varnish_server1']);
    update_site_option('varnish_server1', $varnish_server1);
    $varnish_server2 = (!isset($_POST['varnish_server2'])? '': $_POST['varnish_server2']);
    update_site_option('varnish_server2', $varnish_server2);
    $varnish_server3 = (!isset($_POST['varnish_server3'])? '': $_POST['varnish_server3']);
onyxfish / map.py
Created March 25, 2011 19:33
Django management command to construct a KML map from the Boundary Service
from colorsys import hsv_to_rgb
import json
import logging
log = logging.getLogger('electioncenter.lib.kml')
import struct
from django.contrib.gis.geos import GEOSGeometry
def hsv_to_kml_hex(h, s, v):
    """
onyxfish / ordered_json.py
Created May 29, 2014 17:25
Usage of object_pairs_hook to load JSON with guaranteed key order
#!/usr/bin/env python
from collections import OrderedDict
import json
write_data = OrderedDict([
    ('a', '1'),
    ('b', '2'),
    ('c', '3')
])
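
The preview above stops before the load step, so here is a minimal sketch of how `object_pairs_hook` can be used to preserve key order on a round trip. The variable names `encoded` and `read_data` are hypothetical, and the rest of the original script may differ from this.

```python
from collections import OrderedDict
import json

# Build the same ordered mapping as in the snippet above
write_data = OrderedDict([
    ('a', '1'),
    ('b', '2'),
    ('c', '3')
])

# json.dumps emits keys in the mapping's iteration order
encoded = json.dumps(write_data)

# object_pairs_hook receives each JSON object as a list of (key, value)
# pairs in document order, so OrderedDict keeps that order when loading
read_data = json.loads(encoded, object_pairs_hook=OrderedDict)

print(list(read_data.keys()))
```

Since Python 3.7 a plain `dict` also preserves insertion order, so the hook matters mainly on older interpreters or when the code relies on `OrderedDict` behavior such as order-sensitive equality.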
onyxfish / linestring.py
Last active October 7, 2016 14:51
An agate aggregation to generate geojson linestrings from sequential row data
import agate
import geojson
class LineString(agate.Aggregation):
    def __init__(self, lat_column, lng_column):
        self._lat_column_name = lat_column
        self._lng_column_name = lng_column
    def get_aggregate_data_type(self, table):
        return agate.Text()