@kspurgin
Created September 30, 2020 21:14
csv_column_splitting_headache
I use a little awk one-liner, derived from https://www.datafix.com.au/cookbook/structure1.html,
to verify the structure of client-supplied TSVs (and of CSVs, which I first convert to TSV).
One client's table of object data, provided as TSV, used CRLF row endings AND included TAB,
CRLF, CR, and LF characters inside individual fields to format multiline notes.
The result of my check on this ONE FILE was as follows:
292 rows are broken into 82 columns
606 rows are broken into 1 columns
486 rows are broken into 0 columns
152 rows are broken into 25 columns
130 rows are broken into 58 columns
123 rows are broken into 22 columns
108 rows are broken into 19 columns
96 rows are broken into 64 columns
79 rows are broken into 28 columns
76 rows are broken into 55 columns
62 rows are broken into 59 columns
57 rows are broken into 24 columns
40 rows are broken into 3 columns
39 rows are broken into 26 columns
34 rows are broken into 61 columns
32 rows are broken into 4 columns
32 rows are broken into 2 columns
21 rows are broken into 34 columns
19 rows are broken into 6 columns
19 rows are broken into 57 columns
17 rows are broken into 53 columns
17 rows are broken into 32 columns
17 rows are broken into 30 columns
17 rows are broken into 18 columns
15 rows are broken into 39 columns
15 rows are broken into 17 columns
11 rows are broken into 44 columns
10 rows are broken into 66 columns
8 rows are broken into 36 columns
8 rows are broken into 27 columns
7 rows are broken into 9 columns
7 rows are broken into 5 columns
6 rows are broken into 7 columns
6 rows are broken into 15 columns
5 rows are broken into 37 columns
5 rows are broken into 20 columns
4 rows are broken into 56 columns
3 rows are broken into 12 columns
2 rows are broken into 60 columns
2 rows are broken into 10 columns
1 rows are broken into 8 columns
1 rows are broken into 63 columns
1 rows are broken into 43 columns
1 rows are broken into 23 columns
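
To make a tally like the one above easier to interpret, here is a minimal Python sketch of the same kind of check. It is not the awk one-liner itself, which isn't reproduced here. It assumes the check splits records on LF only (awk's default) and splits fields on TAB, and it counts an empty line as 0 columns, as awk's NF does. The function names and the sample data are hypothetical, made up for illustration.

```python
from collections import Counter


def count_fields(line, delimiter="\t"):
    """Mimic awk's NF: an empty line has 0 fields, otherwise delimiters + 1."""
    if line == "":
        return 0
    return line.count(delimiter) + 1


def tally_field_counts(text, delimiter="\t"):
    """Split on LF only and tally how many lines have each field count."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing LF ends the last line; it does not start a new one
    return Counter(count_fields(line, delimiter) for line in lines)


def report(tally):
    """Format the tally, most frequent field count first (tie order is an arbitrary choice)."""
    ordered = sorted(tally.items(), key=lambda kv: (-kv[1], -kv[0]))
    return [f"{rows} rows are broken into {cols} columns" for cols, rows in ordered]


# Hypothetical sample: a header plus two records, 3 columns each, CRLF row endings.
# The note in record 1 contains an embedded CRLF, a blank line (bare LF), and a TAB.
sample = (
    "id\ttitle\tnote\r\n"
    "1\tVase\tFirst line\r\n\nsecond\tline\r\n"
    "2\tBowl\tplain note\r\n"
)

for line in report(tally_field_counts(sample)):
    print(line)
```

On this sample the sketch prints "3 rows are broken into 3 columns", "1 rows are broken into 2 columns", and "1 rows are broken into 0 columns": one logical record has been cut into three physical lines with different field counts. Under these assumptions, a blank line inside a note counts as 0 columns, a note fragment with no TAB counts as 1 column, and each embedded TAB adds a column to whatever fragment it lands in. When reading a real file for a check like this, opening it with `open(path, newline="")` keeps Python from translating the CR and CRLF characters before they are counted.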