@BirgittaHauser
Created January 30, 2020 05:18
Read *.csv File directly with SQL
-- Read *.csv File from IFS
With x as (-- Split the IFS file into rows (at CRLF: x'0D25' = CR + LF after conversion to the job's EBCDIC CCSID)
     Select Ordinal_Position as RowKey, Element as RowInfo
       from Table(SysTools.Split(Get_Clob_From_File('/home/Hauser/Employee.csv'), x'0D25')) a
      Where Trim(Element) > ''),
     y as (-- Split the rows into columns (and remove leading/trailing double quotes)
     Select x.*, Ordinal_Position as ColKey,
            Trim(Both '"' from Element) as ColInfo
       from x cross join Table(SysTools.Split(RowInfo, ',')) a)
-- Return the result as a table
Select RowKey,
       Min(Case When ColKey = 1 Then ColInfo End) as EmployeeNo,
       Min(Case When ColKey = 2 Then ColInfo End) as Name,
       Min(Case When ColKey = 3 Then ColInfo End) as FirstName,
       Min(Case When ColKey = 4 Then ColInfo End) as Address,
       Min(Case When ColKey = 5 Then ColInfo End) as Country,
       Min(Case When ColKey = 6 Then ColInfo End) as ZipCode,
       Min(Case When ColKey = 7 Then ColInfo End) as City
  From y
 Where RowKey > 1 -- Remove the header row
 Group By RowKey;
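One variation worth noting (a sketch, not part of the gist itself): if the stream file uses Unix-style LF-only line endings, the row split would use x'25' (LF after conversion to the job's EBCDIC CCSID) instead of x'0D25':

```sql
-- Hypothetical variant of the first CTE for LF-only (Unix) line endings.
-- x'25' is the EBCDIC line feed; the rest of the query is unchanged.
With x as (
  Select Ordinal_Position as RowKey, Element as RowInfo
    from Table(SysTools.Split(Get_Clob_From_File('/home/Hauser/Employee.csv'), x'25')) a
   Where Trim(Element) > '')
Select * From x;
```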
@BirgittaHauser
Author

BirgittaHauser commented Feb 2, 2020 via email

@AlexKrashevsky

AlexKrashevsky commented Feb 2, 2020

Thanks, Birgitta. I think Modifies SQL Data would also be required here: the ACS wizard's default is Reads SQL Data, and without Modifies SQL Data the system returns SQL0577 for this routine.
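As a sketch of that point (the function name, parameter, and return columns are assumptions, not from the gist), a UDTF wrapping the first CTE would need the Modifies SQL Data attribute because Get_Clob_From_File modifies SQL data:

```sql
-- Hypothetical sketch: a UDTF wrapper around the row-splitting CTE.
-- Declared Modifies SQL Data; the wizard default Reads SQL Data
-- would raise SQL0577 at run time.
Create or Replace Function Hauser.Split_IFS_File (IFS_File VarChar(256))
  Returns Table (RowKey BigInt, RowInfo VarChar(32700))
  Language SQL
  Modifies SQL Data
  Not Deterministic
  Return
    Select Ordinal_Position, Element
      From Table(SysTools.Split(Get_Clob_From_File(IFS_File), x'0D25')) a
     Where Trim(Element) > '';
```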

Also, just a matter of taste: I would name the columns after their Excel counterparts (i.e., A, B, C, ... AA, AB, AC, ...) to map them more easily back to their origins.
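A sketch of that mapping (the Values-based sample ColKeys are just for illustration; integer division applies because ColKey is an integer):

```sql
-- Hypothetical sketch: derive Excel-style column letters (A..Z, AA..ZZ)
-- from the numeric ColKey; the alphabet literal avoids any dependency
-- on CHR()/CCSID behavior.
Select ColKey,
       Case When ColKey <= 26
            Then Substr(Alpha, ColKey, 1)
            Else Substr(Alpha, (ColKey - 1) / 26, 1) concat
                 Substr(Alpha, Mod(ColKey - 1, 26) + 1, 1)
       End as ExcelCol
  From (Values 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') as v (Alpha)
       Cross Join (Values 1, 2, 26, 27, 28, 52) as k (ColKey);
```

For example, ColKey 27 yields 'AA' and ColKey 52 yields 'AZ' under this scheme.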

@AlexKrashevsky

One needs to be cognizant of a size limitation split() currently imposes.

Looking at the parameter definition, INPUT_LIST is defined as CLOB(1048576). I have a (relatively) large csv file (40 columns, just under 30K rows, and a size of just under 7M). The first CTE of the new UDTF parsecsv() will cut processing short when run against that file.
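The declared parameter sizes can be confirmed from the QSYS2.SYSPARMS catalog; a query along these lines (exact column names per the IBM i catalog views, so treat them as an assumption) shows the CLOB(1048576) definition of INPUT_LIST:

```sql
-- Inspect the declared parameter definitions of SYSTOOLS.SPLIT.
Select Parameter_Name, Data_Type, Character_Maximum_Length
  From QSys2.SysParms
 Where Specific_Schema = 'SYSTOOLS'
   And Specific_Name Like 'SPLIT%'
 Order By Ordinal_Position;
```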

The size of the input parm for split() could probably be bumped up to work around the size limitation, but I would be concerned about the performance impact. The performance of the new UDTF parsecsv() is already not great.

It seems to me that, while very cool, this routine can only be deployed against relatively small files. Birgitta, would you agree, or am I missing anything?
