@lukerosiak
Created October 7, 2011 05:48
Get rid of fluff on fields in a CSV
"""
Ensure the new and old files use the same CSV quoting conventions and format
decimals the same way (15.00 vs 15, and 16.10 vs 16.1), so we can run a diff
without being distracted by those differences.
"""
import csv

# Header labels that show up in the numeric columns and must stay as text.
HEADER_LABELS = ["YTD", "AMOUNT"]


def strip_csv(src_path, dst_path):
    """Copy src_path to dst_path, normalizing every field along the way."""
    # newline='' is what the csv module expects for files it reads and writes.
    with open(src_path, 'r', newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        fin = csv.reader(src)
        fout = csv.writer(dst)
        for line in fin:
            newline = []
            for i, field in enumerate(line):
                # Trim whitespace and leading/trailing periods, drop thousands separators.
                field = field.strip().strip('.').replace(',', '').strip()
                # Columns from the fifth onward are numbers: parsing them means
                # 15.00 and 15 are both written back out as 15.0.
                if i > 3 and field not in HEADER_LABELS:
                    field = float(field)
                newline.append(field)
            fout.writerow(newline)


if __name__ == '__main__':
    strip_csv('../../archives/3_csv_original/2011Q3-summary-sunlight.csv',
              '../../archives/3_csv_original/2011Q3-summary-sunlight-stripped.csv')
    strip_csv('2011Q3-house-disburse-summary.csv',
              '2011Q3-house-disburse-summary-stripped.csv')
"""
run this:
diff --suppress-common-lines -y -W 1500 old-detail-stripped.csv new-detail-stripped.csv > diff.txt
"""
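A quick way to see why parsing the numeric columns with float() removes the decimal differences: csv.writer writes a float back out in its shortest round-trip form, so differently spelled amounts collapse to one spelling. This is a small illustrative check, not part of the original script, and the sample values are made up.

```python
import csv
import io

# Made-up spellings of the same amounts, as they might appear in the two sources.
raw = ["15.00", "15", "16.10", "16.1", "1,234.50"]

buf = io.StringIO()
writer = csv.writer(buf)
# Same cleanup the script applies to a numeric field, followed by float().
writer.writerow([float(v.strip().strip('.').replace(',', '').strip()) for v in raw])
normalized = buf.getvalue()
print(normalized.strip())  # 15.0,15.0,16.1,16.1,1234.5
```

Both spellings of 15 and both spellings of 16.1 come out identical, which is what keeps them from showing up as spurious lines in the diff.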
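Where GNU diff isn't handy, roughly the same comparison can be sketched with Python's standard difflib module. This is a swapped-in alternative, not the author's workflow: diff_stripped is a hypothetical helper, and it writes a unified diff with n=0 (unchanged lines left out, similar in spirit to --suppress-common-lines) rather than diff's side-by-side -y layout.

```python
import difflib


def diff_stripped(old_path, new_path, out_path):
    """Write a unified diff of two normalized CSV files to out_path.

    Hypothetical helper: a stdlib stand-in for the GNU diff command.
    Returns the number of removed plus added lines.
    """
    # newline='' keeps the \r\n line endings that csv.writer produces untouched.
    with open(old_path, newline='') as f:
        old_lines = f.readlines()
    with open(new_path, newline='') as f:
        new_lines = f.readlines()
    # n=0: no context lines, so only changed rows appear in the output.
    diff = list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path, n=0))
    with open(out_path, 'w', newline='') as out:
        out.writelines(diff)
    # Skip the '---'/'+++' file headers (emitted only when something differs);
    # '@@' hunk headers start with '@', so they are not counted.
    return sum(1 for line in diff[2:] if line.startswith(('-', '+')))
```

For example, diff_stripped('old-detail-stripped.csv', 'new-detail-stripped.csv', 'diff.txt') would write the changed rows to diff.txt and return how many lines were removed or added; an empty diff.txt and a return value of 0 mean the normalized files match.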