Excel has 2 options for outputting csv
files:
- 'CSV UTF-8' yields
UTF-8 Unicode English text, with very long lines, with CR line terminators
- 'Comma Separated Values' yields
Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators
The first option gives you UTF8 encoding but ends lines with this weird ^M
character, which is a DOS thing. The second gives you nice unix line breaks but has ASCII encoding which is no good.
So what you need to do is export as 'CSV UTF-8' and then do the following (to make that ^M
you need to hit ctrl-V ctrl-M):
tr ^M '\n' < original_output.csv > clean_output.csv
edit - apparently exporting as 'CSV UTF-8' does not always export in UTF-8 (facepalm).
Instead, select 'UTF-16 Unicode Text' and then do the following:
sed 's/\t/,/g' original_output.csv > clean_output.csv
I've also seen it happen where I originally saved a sheet as a UTF-16 text file and then later attempted to save it as a UTF-8 csv and had it fuck up. Only after using xxd
to look at the hex representation of the file did it dawn on me that this looks wonky.
I debugged the following bullshit error for like 45 minutes today:
1 with open('slo_caps.csv', 'rb') as f:
2 reader = csv.reader(f)
----> 3 for row in reader:
4 print row
5
Error: line contains NULL byte