Skip to content

Instantly share code, notes, and snippets.

@ralfbecher
Created September 11, 2013 21:42
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ralfbecher/6530180 to your computer and use it in GitHub Desktop.
Save ralfbecher/6530180 to your computer and use it in GitHub Desktop.
Load data from text or CSV file with broken format where a field contains CR/LF.
/* Example file 'broken.txt' with a broken line, field containing CR/LF:
Field1|Fiel2|Field3
Abcd|efg|hijk
Lm|no
pq|vwz
123|456|7890
*/
Set vDelimiter = '|';
Set vNoOfColumns = 3;
temp:
LOAD
recno() as rec_no,
if(SubStringCount(@1:n, '$(vDelimiter)')<$(vNoOfColumns)-1 AND SubStringCount(previous(@1:n),
'$(vDelimiter)')<$(vNoOfColumns)-1, peek(group_no), RangeSum(peek(group_no),1)) as group_no,
@1:n as raw_line
FROM broken.txt
(fix, codepage is 1252, header is 1 line);
raw_data:
LOAD concat(raw_line, '', rec_no) as raw_line
Resident temp
group by group_no;
result:
LOAD subfield(raw_line,'$(vDelimiter)',1) as Field1,
subfield(raw_line,'$(vDelimiter)',2) as Field2,
subfield(raw_line,'$(vDelimiter)',3) as Field3
Resident raw_data;
Drop Tables temp, raw_data;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment