lukerosiak / CatoApprops.py
Last active January 2, 2016 21:19
Parse appropriations from Cato XML
import os
import json
import re
import csv
from bs4 import BeautifulSoup
#format numbers
def commafy(x):
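The preview truncates at the `commafy` definition. The `#format numbers` comment suggests a thousands-separator helper; the original may have used the imported `re` module, but a minimal modern equivalent could read (a sketch, not the gist's actual body):

```python
def commafy(x):
    """Format a number with thousands separators: 1234567 -> '1,234,567'."""
    return '{:,}'.format(int(x))
```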
lukerosiak / embedsenate2014
Last active December 19, 2015 06:19
Embed Washington Examiner 2014 Senate battleground graphic
PREFERRED EMBED CODE (widget):
<div id="2014senatetossupsmap_hype_container" style="position:relative;overflow:hidden;width:600px;height:500px;">
<script type="text/javascript" charset="utf-8" src="http://s3.amazonaws.com/examiner/2014battleground/2014+SENATE+tossups+map.hyperesources/2014senatetossupsmap_hype_generated_script.js?70501"></script>
</div>
NON-PREFERRED EMBED CODE IF THE WIDGET DOESN'T WORK (iframe):
<iframe src="http://s3.amazonaws.com/examiner/2014battleground/map.html" width="620" height="500" scrolling="no" frameborder="no"></iframe>
lukerosiak / mirror990s.py
Created June 6, 2013 19:52
Download text files representing OCR'd images of IRS Form 990s, corresponding to the URL scheme at bulk.resource.org/irs.gov/eo. Metadata for each file, such as name of nonprofit, IRS EIN, and year, is available in the "manifest" files there. A parser for those is available at github.com/lukerosiak/irs/.
import os
import boto
"""
Mirror the entire nonprofittext S3 bucket, downloading only files that aren't already present locally or whose S3 version is larger than the copy we have.
The only dependency is boto. To install: pip install boto
To run: python download.py
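The preview ends before the download loop. A sketch of what the mirroring might look like with the boto 2 API the gist depends on; `should_download` captures the size rule stated in the docstring, and the anonymous connection is an assumption based on the bucket being public:

```python
import os

def should_download(local_path, remote_size):
    """Fetch a key if the local copy is missing or the S3 copy is larger."""
    if not os.path.exists(local_path):
        return True
    return remote_size > os.path.getsize(local_path)

def mirror(bucket_name='nonprofittext', dest='.'):
    # boto 2.x; the bucket is public, so connect anonymously (assumption).
    from boto.s3.connection import S3Connection
    conn = S3Connection(anon=True)
    bucket = conn.get_bucket(bucket_name)
    for key in bucket.list():
        local_path = os.path.join(dest, key.name)
        if should_download(local_path, key.size):
            d = os.path.dirname(local_path)
            if d and not os.path.isdir(d):
                os.makedirs(d)
            key.get_contents_to_filename(local_path)
```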
lukerosiak / mlb.py
Last active April 2, 2018 00:39
Parse MLB transactions and injuries into a spreadsheet.
#This web site lists recent injuries for MLB players in HTML, but makes you click through each team:
#http://mlb.mlb.com/mlb/fantasy/injuries/
#To do real data analysis, we want all of it at once and don't have time to click everywhere.
#So our exercise is to get every available injury into one easy-to-use spreadsheet.
#By looking at "view source" on the web site, I found that it actually hits another URL, which provides the injuries, trades and other info in a computer-readable format called JSON, which maps
#almost directly onto Python's dictionary type. You can only get one month at a time because there are so many records. See it here:
#http://mlb.mlb.com/lookup/json/named.transaction_all.bam?start_date=20120301&end_date=20120401&sport_code='mlb'
#Our code will hit that URL repeatedly for different dates, convert the response into a Python object, and then write selected fields from that object to a CSV file.
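The month-by-month scrape described above might be sketched like this (modern Python 3; the URL is the one given in the comments, but the JSON field names written to the CSV are guesses for illustration, not taken from the gist):

```python
import csv
import json
import urllib.request
from calendar import monthrange

URL = ("http://mlb.mlb.com/lookup/json/named.transaction_all.bam"
       "?start_date=%s&end_date=%s&sport_code='mlb'")

def month_ranges(year):
    """Yield one (start, end) pair of YYYYMMDD strings per month."""
    for month in range(1, 13):
        last = monthrange(year, month)[1]
        yield '%d%02d01' % (year, month), '%d%02d%02d' % (year, month, last)

def scrape(year, outfile='transactions.csv'):
    with open(outfile, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['date', 'team', 'player', 'type', 'note'])
        for start, end in month_ranges(year):
            data = json.load(urllib.request.urlopen(URL % (start, end)))
            rows = data['transaction_all']['queryResults'].get('row', [])
            if isinstance(rows, dict):  # a lone result may come back unwrapped
                rows = [rows]
            for r in rows:  # field names below are assumptions
                writer.writerow([r.get('trans_date'), r.get('team'),
                                 r.get('player'), r.get('type'), r.get('note')])
```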
lukerosiak / gist:5328017
Created April 6, 2013 22:59
census.ire.org to PostgreSQL
#import all Census 2010 tables into PostgreSQL. Then use BoundaryService to import TIGER shapefiles into PostGIS and join them.
import os
s = """ire_H1.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H10.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11A.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11B.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
ire_H11C.sql a year ago create table DDL for bulkdata export format [JoeGermuska]
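The string above pastes in the IRE directory listing to get the DDL file names. A sketch of a loader that runs each `ire_*.sql` file through `psql` (assuming the `psql` client is on the PATH and can authenticate to the target database):

```python
import os
import subprocess

def is_ire_sql(name):
    """Match the IRE census DDL files listed above (ire_H1.sql, ...)."""
    return name.startswith('ire_') and name.endswith('.sql')

def load_all(directory, database='census'):
    # Run each DDL file against PostgreSQL via the psql command-line client.
    for name in sorted(filter(is_ire_sql, os.listdir(directory))):
        subprocess.check_call(['psql', '-d', database, '-f',
                               os.path.join(directory, name)])
```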
lukerosiak / OLMS
Created February 5, 2013 20:00
Import OLMS into Postgres
import os
import psycopg2
#IMPORT AND UNZIP ALL YEARS OF FILES INTO THIS DIRECTORY
#SET VARIABLES IN THE NEXT 3 LINES
path = '/media/sf_bulk/labor/data/'
years = [str(x) for x in range(2000,2013)]
conn = psycopg2.connect(database="labor", user="", password="")
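The preview stops after the connection is opened. One plausible shape for the per-year import is to bulk-load each file with `COPY` through psycopg2; the `table_for` naming rule below is an assumption for illustration, not taken from the gist:

```python
import os

def table_for(filename):
    """Derive a table name from a file name: 'ar_disbursements_2012.txt'
    -> 'ar_disbursements' (naming pattern assumed, not from the gist)."""
    base = os.path.splitext(os.path.basename(filename))[0]
    return '_'.join(p for p in base.split('_') if not p.isdigit())

def copy_file(conn, filename):
    """Bulk-load one delimited file into Postgres via COPY (psycopg2)."""
    with open(filename) as f:
        cur = conn.cursor()
        cur.copy_expert("COPY %s FROM STDIN WITH CSV HEADER"
                        % table_for(filename), f)
    conn.commit()
```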
lukerosiak / diff-different-lines-only.txt
Created October 7, 2011 14:38
Diff between old disbursement detail output and new
OFFICE OF THE MINORITY WHIP,2009Q4,PERSONNEL COMPENSATION,,"D
| OFFICE OF THE MINORITY WHIP,2009Q4,PERSONNEL COMPENSATION,,"D
CAO OPERATIONS MANAGEMENT,2009Q4,TRAVEL,12-03,DARRYL A ATCHIS
| CAO OPERATIONS MANAGEMENT,2009Q4,TRAVEL,12-03,DOUGLAS MASSENG
COMMUNICATIONS,2009Q4,SUPPLIES AND MATERIALS,12-17,,,,FRAMING
| COMMUNICATIONS,2009Q4,SUPPLIES AND MATERIALS,12-17,FRAMING,,,
COMMUNICATIONS,2009Q4,SUPPLIES AND MATERIALS,12-17,,,,FRAMING
lukerosiak / strip.py
Created October 7, 2011 05:48
Get rid of fluff on fields in a CSV
"""
Ensure the new and old fields use the same CSV quoting conventions and format decimals the same way (15.00 vs 15 and 16.10 vs 16.1), so we can run a diff without being distracted by those differences.
"""
import csv
fin = csv.reader(open('../../archives/3_csv_original/2011Q3-summary-sunlight.csv','r'))
fout = csv.writer(open('../../archives/3_csv_original/2011Q3-summary-sunlight-stripped.csv','w'))
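The per-field normalization the docstring describes could be sketched as follows (a guess at the stripping rule, keyed to the 15.00-vs-15 examples above):

```python
import re

def strip_field(field):
    """Normalize one CSV field so old and new exports diff cleanly:
    trim whitespace and drop trailing zeros from decimals
    (15.00 -> 15, 16.10 -> 16.1)."""
    field = field.strip()
    if re.match(r'^-?\d+\.\d+$', field):
        field = field.rstrip('0').rstrip('.')
    return field
```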
lukerosiak / flattenfactfinder.py
Created September 20, 2011 23:10
Put columns for multiple geographies of ACS 2010 comparison profiles into more usable file, including combining with annotated file.
import csv
fout = csv.writer( open('cpflat.csv','w') )
def process(i):
fin = csv.reader( open('ACS_10_1YR_CP0%s.csv' % i,'r') )
fin_ann = csv.reader( open('ACS_10_1YR_CP0%s_ann.csv' % i,'r') )
fin.next()
headers = fin.next()[3:]
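The preview ends mid-function. One plausible continuation, written as a pure function over row iterables so it mirrors the header/annotation handling above (this is a guess at the gist's intent, assuming the `_ann` file carries human-readable labels in the same column order):

```python
def flatten(rows, ann_rows):
    """Merge a comparison-profile table with its annotation rows and
    return (geography, column_code, column_label, value) records."""
    rows, ann_rows = iter(rows), iter(ann_rows)
    next(rows)                      # skip the first header row
    headers = next(rows)[3:]        # drop the three geography id columns
    next(ann_rows)
    labels = next(ann_rows)[3:]     # human-readable column labels
    out = []
    for row in rows:
        geo = row[2]                # geography display name
        for header, label, value in zip(headers, labels, row[3:]):
            out.append([geo, header, label, value])
    return out
```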