ghing/loader_process_notes.md

## loader_process_notes.md

      
# Loader process

Starting with the most recent election, look at the data files and document the fields:

- names
- types
- formatting conventions
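One way to get a first look at field names and formatting is to print a few sample values per column. The sketch below is only an illustration: the function name `summarize_fields` and the inlined demo data are hypothetical, and it assumes a delimited file with a header row, like the tab-delimited files used in these notes.

```python
import csv
import io


def summarize_fields(lines, delimiter="\t", sample_size=3):
    """Map each column name to a few distinct, non-empty sample values."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    fieldnames = reader.fieldnames or []  # None when the input is empty
    samples = {name: [] for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            # Short rows yield None for missing columns, so fall back to "".
            value = (row.get(name) or "").strip()
            seen = samples[name]
            if value and value not in seen and len(seen) < sample_size:
                seen.append(value)
    return samples


# Hypothetical three-column file, inlined so the sketch runs on its own.
demo = io.StringIO(
    "office\tparty\tvotes\n"
    "Governor\tDEM\t10\n"
    "Governor\tREP\t12\n"
)
for name, values in summarize_fields(demo).items():
    print(name, values)
```

For a real file, pass an open file handle instead, e.g. `open(path, newline="")`. The sample values are a starting point for writing down types and formatting conventions by hand; the sketch does not try to infer types.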

Document any data quirks. I put these notes in the docstring of the loader class, because I've found I need to refer to them when implementing the translations.
Quirks might be things like:

- Duplicate entries
- Complicated splits (e.g. congressional district by county)
- Where are name suffixes stored?
- How are write-in candidates identified?
- How are non-party candidates identified?
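Some of these quirks can be checked mechanically. As a rough sketch for the first one, the following counts rows that appear more than once in a delimited file. The name `find_duplicate_rows` and the demo data are hypothetical, and "duplicate" here means an exact match on every column, which may be stricter than what a given state's data needs.

```python
import csv
import io
from collections import Counter


def find_duplicate_rows(lines, delimiter="\t"):
    """Return {row tuple: count} for rows that occur more than once."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row, if there is one
    counts = Counter(tuple(row) for row in reader)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical data with one repeated row, inlined so the sketch runs as-is.
demo = io.StringIO(
    "office\tparty\tvotes\n"
    "Governor\tDEM\t10\n"
    "Governor\tREP\t12\n"
    "Governor\tDEM\t10\n"
)
for row, n in find_duplicate_rows(demo).items():
    print(n, row)
```

Comparing whole rows keeps the sketch simple. If duplicates differ only in a vote count or a trailing space, comparing a subset of columns would be a more useful variation.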

## Recipes for inspecting data

### Get unique offices in a data file

csvcut  -t -c 5 us/fl/cache/20120814__fl__primary.tsv  | tail -n +2 | sort | uniq

The -t option to csvcut indicates that the file is tab-delimited.
csvcut always outputs the column name first, so we use the tail command to remove the column name from the output.
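If csvkit is not installed, roughly the same result can be had with Python's standard `csv` module. This is a sketch rather than a drop-in replacement: `unique_column_values` is a hypothetical helper, and it takes a 0-based column index, so `csvcut -c 5` corresponds to `column_index=4`.

```python
import csv
import io


def unique_column_values(lines, column_index, delimiter="\t"):
    """Return the sorted distinct values of one column, skipping the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # drop the column name, like `tail -n +2`
    # Rows too short to have the requested column are ignored.
    return sorted({row[column_index] for row in reader if len(row) > column_index})


# Hypothetical data, inlined so the sketch runs without the cached file.
demo = io.StringIO(
    "office\tparty\n"
    "Governor\tDEM\n"
    "Attorney General\tREP\n"
    "Governor\tREP\n"
)
for value in unique_column_values(demo, 0):
    print(value)
```

Against the file from the recipe above, the call would look like `unique_column_values(open("us/fl/cache/20120814__fl__primary.tsv", newline=""), 4)`. Python's `sorted` orders by code point, so the order can differ from `sort` under some locales.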
### Get unique parties and offices in all loaded RawResults for a state

>>> import pprint
>>> from openelex.models import RawResult
>>> pprint.pprint(RawResult.objects.filter(state='FL').distinct('party'))
...
>>> pprint.pprint(RawResult.objects.filter(state='FL').distinct('office'))