@ghing
Last active August 29, 2015 13:57
Openelections loader implementation notes

Loader process

Starting with the most recent election, look at the data files.

Document the fields:

  • names
  • types
  • formatting conventions
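
One way to get a first look at field names and formatting conventions is to collect a few distinct sample values per column. This is a minimal sketch, assuming a tab-delimited file with a header row. The function name `summarize_fields` and the sample columns (`date`, `party`, `votes`) are hypothetical and are not part of the openelex codebase.

```python
import csv
import io


def summarize_fields(lines, delimiter="\t", sample_size=3):
    """Map each column name to a few distinct sample values.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    samples = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, seen in samples.items():
            value = row.get(name)
            # Keep only the first few distinct values for each column.
            if value is not None and value not in seen and len(seen) < sample_size:
                seen.append(value)
    return samples


# Hypothetical sample data standing in for a real results file.
sample = io.StringIO(
    "date\tparty\tvotes\n"
    "8/14/2012\tREP\t1200\n"
    "8/14/2012\tDEM\t950\n"
)
for name, values in summarize_fields(sample).items():
    print(name, values)
```

Looking at the sample values side by side makes it easier to note types (numeric, date, free text) and conventions such as date formats or party codes.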

Document any data quirks. I put these in the docstring of the loader class, because I've found I need to refer back to them when implementing the translations.

Quirks might be things like:

  • Duplicate entries
  • Complicated splits (e.g. congressional district by county)
  • Where are name suffixes stored?
  • How are write-in candidates identified?
  • How are non-party candidates identified?
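
Some of these quirks can be checked mechanically. As an illustration, duplicate entries could be spotted with a sketch like the following, which assumes a tab-delimited file with a header row and treats two rows as duplicates only when every field matches exactly. The name `find_duplicate_rows` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def find_duplicate_rows(lines, delimiter="\t"):
    """Return a dict mapping each repeated row (as a tuple) to its count."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # Skip the header row.
    counts = Counter(tuple(row) for row in reader)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical sample data with one repeated row.
sample = io.StringIO(
    "office\tcandidate\tvotes\n"
    "Governor\tA\t10\n"
    "Governor\tB\t5\n"
    "Governor\tA\t10\n"
)
for row, n in find_duplicate_rows(sample).items():
    print(n, row)
```

Exact matching is the strictest definition of a duplicate. Rows that differ only in whitespace or capitalization would need normalizing first.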

Recipes for inspecting data

Get unique offices in a data file

csvcut -t -c 5 us/fl/cache/20120814__fl__primary.tsv | tail -n +2 | sort | uniq

The -t option to csvcut indicates that the file is tab-delimited.

csvcut always outputs the column name first, so we use tail -n +2 to drop that first line from the output.
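
If csvkit is not installed, roughly the same result can be obtained with the Python standard library. This is a sketch rather than an exact equivalent of the pipeline above: it assumes a header row, uses a 1-based column number to match csvcut's -c option, and sorts by code point rather than by the shell's locale. The function name `unique_column_values` and the sample data are hypothetical.

```python
import csv
import io


def unique_column_values(lines, column, delimiter="\t"):
    """Return the sorted unique values of a 1-based column, skipping the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # Skip the header row, like `tail -n +2`.
    # Rows too short to contain the column are ignored.
    return sorted({row[column - 1] for row in reader if len(row) >= column})


# Hypothetical sample data with the office name in the fifth column.
sample = io.StringIO(
    "a\tb\tc\td\toffice\n"
    "1\t2\t3\t4\tGovernor\n"
    "1\t2\t3\t4\tAttorney General\n"
    "1\t2\t3\t4\tGovernor\n"
)
for value in unique_column_values(sample, 5):
    print(value)
```

To run it against a real file, pass an open file object instead of the sample, for example one opened with open("us/fl/cache/20120814__fl__primary.tsv", newline="").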

Get unique parties and offices in all loaded RawResults for a state

>>> import pprint
>>> from openelex.models import RawResult
>>> pprint.pprint(RawResult.objects.filter(state='FL').distinct('party'))
...
>>> pprint.pprint(RawResult.objects.filter(state='FL').distinct('office'))
