Skip to content

Instantly share code, notes, and snippets.

@ryanpitts
Last active August 29, 2018 21:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ryanpitts/0dbd6bf254ce84e778b654c9174e845e to your computer and use it in GitHub Desktop.
Save ryanpitts/0dbd6bf254ce84e778b654c9174e845e to your computer and use it in GitHub Desktop.
quick csv grouping script in python re: https://twitter.com/rachelwalexande/status/1034236011830697986
import csv
from collections import OrderedDict
CLEANED_DICT = OrderedDict()
METADATA_FIELDS = ['districtID','district','schoolID','school']
VALUE_FIELDS = []
with open('schools.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
if (row['schoolID'] not in CLEANED_DICT.keys()):
# we don't have a row for this school yet,
# so create it with the basic metadata
CLEANED_DICT[row['schoolID']] = {
k:row[k] for k in METADATA_FIELDS
}
# create a column designator based on the student group in this csv row
# e.g. 'all_students' or 'economically_disadvantaged'
value = row['students'].lower().replace(' ','_').replace('/','_')
value_types = ['total','pct']
for value_type in value_types:
# make a csv-friendly header for this column, e.g. 'total_all_students'
column_header = value_type + '_' + value
# add it to our running list of fields for the output csv
if column_header not in VALUE_FIELDS: VALUE_FIELDS.append(column_header)
# add value for that group of students to this school's row
CLEANED_DICT[row['schoolID']].update({column_header: row[value_type]})
# create the output csv
with open('schools_cleaned.csv', 'w') as csvfile:
fieldnames = METADATA_FIELDS + VALUE_FIELDS
writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()
for k,v in CLEANED_DICT.items():
writer.writerow(v)
districtID district schoolID school students total pct
2142 Salem-Keiser 728 Auburn Elementary School All Students 648 29.2
2142 Salem-Keiser 728 Auburn Elementary School Economically Disadvantaged 648 29.2
2142 Salem-Keiser 728 Auburn Elementary School English Learners 409 25.9
2142 Salem-Keiser 728 Auburn Elementary School Hispanic/Latino 416 27.6
2142 Salem-Keiser 5066 Battle Creek Elementary School All Students 568 20.4
2142 Salem-Keiser 5066 Battle Creek Elementary School Economically Disadvantaged 267 27
2142 Salem-Keiser 5066 Battle Creek Elementary School English Learners 30 16.7
2142 Salem-Keiser 5066 Battle Creek Elementary School Hispanic/Latino 59 23.7
districtID district schoolID school total_all_students pct_all_students total_economically_disadvantaged pct_economically_disadvantaged total_english_learners pct_english_learners total_hispanic_latino pct_hispanic_latino
2142 Salem-Keiser 728 Auburn Elementary School 648 29.2 648 29.2 409 25.9 416 27.6
2142 Salem-Keiser 5066 Battle Creek Elementary School 568 20.4 267 27 30 16.7 59 23.7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment