Skip to content

Instantly share code, notes, and snippets.

@apendleton
apendleton / letter_breakdown.py
Created May 8, 2012 19:57
Optimal alphabetical breakdown of registration lines
"""
This script ingests a CSV of last names and builds JSON output on the
percentage of last names that begin with each letter of the alphabet.
I initially ran it on a CSV-ified version of the US Census's list of all last
names with more than 100 occurrences, and used the frequency field within that
file to weigh the output, but absent that field, it assigns equal weight to
each name, allowing us to also process our TCamp 2012 attendence list.
Using the script on the Census data available in: