Skip to content

Instantly share code, notes, and snippets.

@ryantuck
Last active March 17, 2017 19:10
Show Gist options
  • Save ryantuck/6d5b4c99c4e346dee75c39d1f391b51e to your computer and use it in GitHub Desktop.
Save ryantuck/6d5b4c99c4e346dee75c39d1f391b51e to your computer and use it in GitHub Desktop.
How to overcome Excel being shitty with csv outputs

excel / csv hell

Excel has 2 options for outputting csv files:

  • 'CSV UTF-8' yields UTF-8 Unicode English text, with very long lines, with CR line terminators
  • 'Comma Separated Values' yields Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators

The first option gives you UTF8 encoding but ends lines with this weird ^M character, which is a DOS thing. The second gives you nice unix line breaks but has ASCII encoding which is no good.

So what you need to do is export as 'CSV UTF-8' and then do the following (to make that ^M you need to hit ctrl-V ctrl-M):

tr ^M '\n' < original_output.csv > clean_output.csv

edit - apparently exporting as 'CSV UTF-8' does not always export in UTF-8 (facepalm).

Instead, select 'UTF-16 Unicode Text' and then do the following:

sed 's/\t/,/g' original_output.csv > clean_output.csv

UTF-16 hell

I've also seen it happen where I originally saved a sheet as a UTF-16 text file and then later attempted to save it as a UTF-8 csv and had it fuck up. Only after using xxd to look at the hex representation of the file did it dawn on me that this looks wonky.

I debugged the following bullshit error for like 45 minutes today:

      1 with open('slo_caps.csv', 'rb') as f:
      2     reader = csv.reader(f)
----> 3     for row in reader:
      4         print row
      5

Error: line contains NULL byte
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment