ryantuck/excel_csv_export.md

## excel_csv_export.md

      
    Raw
  

              excel_csv_export.md
            
          
    excel / csv hell

Excel has 2 options for outputting csv files:

'CSV UTF-8' yields UTF-8 Unicode English text, with very long lines, with CR line terminators
'Comma Separated Values' yields Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators

The first option gives you UTF8 encoding but ends lines with this weird ^M character, which is a DOS thing. The second gives you nice unix line breaks but has ASCII encoding which is no good.
So what you need to do is export as 'CSV UTF-8' and then do the following (to make that ^M you need to hit ctrl-V ctrl-M):
tr ^M '\n' < original_output.csv > clean_output.csv

edit - apparently exporting as 'CSV UTF-8' does not always export in UTF-8 (facepalm).
Instead, select 'UTF-16 Unicode Text' and then do the following:
sed 's/\t/,/g' original_output.csv > clean_output.csv

UTF-16 hell

I've also seen it happen where I originally saved a sheet as a UTF-16 text file and then later attempted to save it as a UTF-8 csv and had it fuck up. Only after using xxd to look at the hex representation of the file did it dawn on me that this looks wonky.
I debugged the following bullshit error for like 45 minutes today:
      1 with open('slo_caps.csv', 'rb') as f:
      2     reader = csv.reader(f)
----> 3     for row in reader:
      4         print row
      5

Error: line contains NULL byte