Starting with the most recent election, look at the data files.
Document each field's:
- name
- type
- formatting conventions
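A quick way to get a first draft of those field notes is to let a script list the column names and roughly guess the type of each value. The following is a minimal sketch using only the standard library; `guess_type` and `summarize_fields` are hypothetical helper names, not part of the project, and the type guess is deliberately crude (it only tells integers, floats, empty cells, and everything else apart).

```python
import csv
from collections import Counter


def guess_type(value):
    """Roughly classify one raw cell as 'empty', 'int', 'float', or 'str'."""
    text = value.strip()
    if not text:
        return "empty"
    # Vote counts are often written with thousands separators, e.g. "1,234".
    candidate = text.replace(",", "")
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(candidate)
            return label
        except ValueError:
            continue
    return "str"


def summarize_fields(path, delimiter="\t", sample_size=1000):
    """Map each column name to a Counter of guessed types for the first rows."""
    summary = {}
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for i, row in enumerate(reader):
            if i >= sample_size:
                break
            for name, value in row.items():
                if name is None:
                    # Row has more cells than the header; skip the extras.
                    continue
                summary.setdefault(name, Counter())[guess_type(value or "")] += 1
    return summary
```

Printing the result for a cached file shows at a glance which columns mix types or contain blanks, which is usually where the formatting conventions need a note.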
Document any data quirks. I put these notes in the docstring of the loader class, because I've found I need to refer to them when implementing the translations.
Quirks might be things like:
- Duplicate entries
- Complicated splits (e.g. congressional district by county)
- Where are name suffixes stored?
- How are write-in candidates identified?
- How are non-party candidates identified?
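As an illustration of keeping such notes in the loader's docstring, here is a minimal sketch. The class name `FLLoaderSketch` is hypothetical and the entries are placeholders to be filled in after inspecting the real files; nothing here describes the actual Florida data or the project's real loader class.

```python
class FLLoaderSketch:
    """Hypothetical loader skeleton showing where data notes can live.

    Fields (placeholders, fill in from the data files):
        <field name>: <type>, <formatting convention>

    Quirks (placeholders, fill in from the data files):
        Duplicate entries: <describe when rows repeat and how to handle them>
        Complicated splits: <e.g. congressional district by county>
        Name suffixes: <which field holds them>
        Write-in candidates: <how they are marked>
        Non-party candidates: <how they are marked>
    """

    def load(self):
        # Loading logic is out of scope for this sketch.
        raise NotImplementedError
```

Because the notes sit in the docstring, `help(FLLoaderSketch)` or a glance at the top of the class brings them up while you are working on the translations.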
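To see which values a column actually contains, list its distinct values. For example, for column 5 of a tab-delimited file:
csvcut -t -c 5 us/fl/cache/20120814__fl__primary.tsv | tail -n +2 | sort | uniq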
The -t option tells csvcut that the file is tab-delimited, and -c 5 selects the fifth column.
csvcut always outputs the column name first, so we use tail -n +2 to remove the column name from the output before sorting.
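If csvkit is not installed, roughly the same check can be done with the Python standard library. This is a sketch under assumptions: `unique_column_values` is a hypothetical helper, the column number is 1-based to match csvcut's -c option, and Python's sort order may differ from that of the sort command depending on locale.

```python
import csv


def unique_column_values(path, column, delimiter="\t"):
    """Return the sorted distinct values of a 1-based column, skipping the header."""
    values = set()
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # drop the header row, like tail -n +2
        for row in reader:
            if len(row) >= column:  # ignore short or blank rows
                values.add(row[column - 1])
    return sorted(values)
```

For the file above, the call would be `unique_column_values("us/fl/cache/20120814__fl__primary.tsv", 5)`.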
Once the raw results have been loaded, the same kind of check can be run from the Python shell:
>>> import pprint
>>> from openelex.models import RawResult
>>> pprint.pprint(RawResult.objects.filter(state='FL').distinct('party'))
...
>>> pprint.pprint(RawResult.objects.filter(state='FL').distinct('office'))