@dexterous
Last active July 19, 2019 19:36
Sed script to fix a CSV file with unescaped newlines.
#!/bin/sed -nrf
s_,,,,_,"","","",_g # first, replace blank fields with quoted blanks for consistency: three in a row
s_,,,_,"","",_g # then two in a row
s_,,_,"",_g # then a single blank field
s_,$_,""_ # then handle a blank trailing field the same way
/^([^"]|",|"")/ { # if the line starts with a non-" character, with ", (a closing quote) or with "" (an escaped quote), it continues an incomplete line
x # first swap the previous line [see (*) below] into pattern space and this continuation line into hold space
G # append the held continuation line to the pattern space, separated by \n
s,\n,\\n,m # replace that newline with a literal \n
/[^"]"$/ p # print the line if it ends with a closing " (complete record)
h # hold the whole corrected line in case the next line is also a continuation, i.e. the record is broken over more than 2 lines (*)
d # start processing the next line
} # end of multi-line handling block
/[^"]"$/ p # print the line if it ends with a closing " (complete record)
h # put pattern into hold space (*)
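To try the script without the gist files, something like the following should work. It is only a sketch: it assumes GNU sed (for `-r`), the file names `fix-multi.sed` and `sample.csv` are made up for the example, and the sample rows are an assumption about what the raw foo.csv looks like.

```shell
#!/bin/sh
# Sketch: run the script body against a small multi-line CSV sample.
# Assumes GNU sed; fix-multi.sed and sample.csv are hypothetical names.
workdir=$(mktemp -d)

# Same commands as the script above, with the comments dropped.
# The quoted 'EOF' keeps the shell from touching $ and backslashes.
cat > "$workdir/fix-multi.sed" <<'EOF'
s_,,,,_,"","","",_g
s_,,,_,"","",_g
s_,,_,"",_g
s_,$_,""_
/^([^"]|",|"")/ {
x
G
s,\n,\\n,m
/[^"]"$/ p
h
d
}
/[^"]"$/ p
h
EOF

# Assumed raw input: records 2 and 3 are broken across lines.
cat > "$workdir/sample.csv" <<'EOF'
"1","asdf","jkl;"
"2","hello
world","there"
"3","foo
bar
boo","baz"
EOF

# -n: print only on p, -r: extended regex, -f: read commands from file
fixed=$(sed -nrf "$workdir/fix-multi.sed" "$workdir/sample.csv")
printf '%s\n' "$fixed"

rm -rf "$workdir"
```

Each record should come out on a single line, with the embedded line breaks turned into a literal `\n`.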
1 | asdf | jkl;
2 | hello world | there
3 | foo bar boo | baz
4 | whatever | wherever
5 | text with "quoted quotes" in it | too
6 | some more | data here
$ ./fix-multi foo.csv
"1","asdf","jkl;"
"2","hello\nworld","there"
"3","foo\nbar\nboo","baz"
"4","whatever\n","wherever"
"5","text with\n""quoted quotes"" in it","too"
"6","some more","data here"

dexterous commented Jan 4, 2017

View foo.csv in raw mode to see the line breaks; apparently GitHub's CSV renderer does a pretty damn good job of handling multiline records! 😛
