@konklone
Forked from lukerosiak/strip.py
Created January 5, 2012 15:53
Get rid of fluff on fields in a CSV
#!/usr/bin/env python
"""
Strip whitespace and periods from the fields in the old file, and ensure the new one uses the same CSV quoting conventions, so we can run a diff without being distracted by those differences.
"""
import csv
directories = ["luke", "sunlight"]
base = "2011Q3-summary"
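for directory in directories:
    # Read the original CSV and write a cleaned copy next to it;
    # the with-block closes (and flushes) both files when done.
    with open('%s/%s.csv' % (directory, base), 'r') as infile, \
         open('%s/%s-stripped.csv' % (directory, base), 'w') as outfile:
        fin = csv.reader(infile)
        fout = csv.writer(outfile)
        for line in fin:
            # Strip whitespace, then periods, then any whitespace the periods exposed.
            newline = [field.strip().strip('.').strip() for field in line]
            fout.writerow(newline)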
"""
Run this:
diff --suppress-common-lines -y -W 1500 old-detail-stripped.csv new-detail-stripped.csv > diff.txt
You should see differences only in lines where the RECIP (orig) started with DO, like DOUG. The old script erroneously replaced those with the name above it!
Some 'government contributions' lines will also show up in the diff. They were spaced incorrectly before and now display correctly.
"""
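
If the `diff` command is not available, the comparison described above can be roughly approximated in Python. This is only a sketch, not part of the original script: `strip_field` and `differing_rows` are hypothetical helper names, the sample data is made up, and it compares rows by position instead of aligning them the way `diff` does.

```python
import csv
import io


def strip_field(field):
    # Same cleanup as the script above: whitespace, periods, whitespace again.
    return field.strip().strip('.').strip()


def differing_rows(old_file, new_file):
    """Yield (row_number, old_row, new_row) for rows that differ after stripping.

    Rows are paired by position, so an inserted or deleted row shifts every
    later comparison; zip() also stops at the end of the shorter file.
    """
    old_rows = csv.reader(old_file)
    new_rows = csv.reader(new_file)
    for number, (old, new) in enumerate(zip(old_rows, new_rows), start=1):
        old = [strip_field(f) for f in old]
        new = [strip_field(f) for f in new]
        if old != new:
            yield number, old, new


# Made-up sample data standing in for the old and new CSV files.
old_csv = io.StringIO('DOUG SMITH. ,100\n JANE DOE,200 \n')
new_csv = io.StringIO('DOUG SMITH,100\nJANE DOE,250\n')

for number, old_row, new_row in differing_rows(old_csv, new_csv):
    print(number, old_row, new_row)
# prints: 2 ['JANE DOE', '200'] ['JANE DOE', '250']
```

Only the second row is reported, because the first differs solely in the whitespace and trailing period that the stripping step removes.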