Jesse Lecy lecy

Instructions

These instructions will help you analyze the IRS 990 public dataset. Start by reading the documentation at Amazon. The corpus comes with an index file of about 108 MB, index.json.gz, that contains metadata describing the entire corpus.

To download the index.json.gz metadata file, run: curl -O https://s3.amazonaws.com/irs-form-990/index.json.gz (the -O flag saves the file under its remote name; without it, curl writes the compressed bytes to the terminal). Once the download finishes, extract it with: gunzip index.json.gz, which replaces the archive with index.json. To take a peek at the extracted contents, run: head index.json.
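If you would rather do the extract-and-peek steps from a script, here is a minimal Python sketch using only the standard library. The helper names gunzip_keep and peek are made up for illustration, and the demo runs on a tiny stand-in file written to a temporary directory, not on the real index, so nothing is downloaded.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_keep(src: Path, dest: Path) -> Path:
    """Decompress a .gz file into dest, leaving the compressed original in place."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    return dest


def peek(path: Path, n_chars: int = 300) -> str:
    """Return the first n_chars characters of a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read(n_chars)


# Demo on a tiny stand-in file (NOT the real index), so no download is needed.
workdir = Path(tempfile.mkdtemp())
stand_in = workdir / "index.json.gz"
with gzip.open(stand_in, "wt", encoding="utf-8") as f:
    f.write('[{"EIN": "721221647", "TaxPeriod": "201412"}]')

extracted = gunzip_keep(stand_in, workdir / "index.json")
preview = peek(extracted, 20)
print(preview)
```

Reading a fixed number of characters rather than a number of lines is a deliberate choice: if the extracted file turns out to have few or no line breaks, a line-based peek could pull most of the file into memory.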

Looking at the index.json file, you'll notice that it contains JSON serialized as plain text. It holds an array of JSON objects that look like the following:

{"EIN": "721221647", "SubmittedOn": "2016-02-05", "TaxPeriod": "201412", "DLN": "93493309001115", "LastUpdated": "2016-03-21T17:2
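As a rough illustration of working with records shaped like the one above, the following Python sketch parses a JSON array and filters it by tax year. It makes several assumptions: the sample contains only the fields visible in the excerpt (the truncated LastUpdated value is left out), TaxPeriod is treated as a YYYYMM string, and filings_for_year is a hypothetical helper name, not something from the dataset's documentation.

```python
import json

# One record using only the fields visible in the excerpt above
# (the truncated LastUpdated value is deliberately left out).
sample_index = """
[
  {"EIN": "721221647", "SubmittedOn": "2016-02-05",
   "TaxPeriod": "201412", "DLN": "93493309001115"}
]
"""


def filings_for_year(records, year):
    """Keep records whose TaxPeriod starts with the given 4-digit year.

    Assumes TaxPeriod is a YYYYMM string, which is a guess based on "201412".
    """
    return [r for r in records if str(r.get("TaxPeriod", "")).startswith(str(year))]


records = json.loads(sample_index)
matches = filings_for_year(records, 2014)
for r in matches:
    print(r["EIN"], r["SubmittedOn"], r["DLN"])
```

Note that EIN and DLN arrive as strings in the excerpt; keeping them as strings avoids losing any leading zeros.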
technickle / ValidateOpen311GeoReportBulk.r
Last active December 15, 2022 19:42
R validator script for Open311 GeoReport Bulk specification compatibility
# this R script evaluates a data file for compatibility with the Open311 GeoReport Bulk specification.
# see here for the most recent version of the specification:
# http://wiki.open311.org/GeoReport/bulk
#
# it implements nearly all of the checks identified in this document
# https://docs.google.com/document/d/1GLRniiT3xvmG-i6PPeZPZDK_FhBDGCpuVh5fCexEiys/preview
# however, it is very bare bones and the results need to be interpreted.
#
# written by Andrew Nicklin (@technickle) with contributions from the Open311 community.
#