Skip to content

Instantly share code, notes, and snippets.

@BirgittaHauser
Created January 30, 2020 05:18
Show Gist options
  • Save BirgittaHauser/e5872aba85764b8738eac3adfb0e0279 to your computer and use it in GitHub Desktop.
Save BirgittaHauser/e5872aba85764b8738eac3adfb0e0279 to your computer and use it in GitHub Desktop.
Read *.csv File directly with SQL
-- Read *csv File from IFS
With x as (-- Split IFS File into Rows (at CRLF)
Select Ordinal_Position as RowKey, Element as RowInfo
from Table(SysTools.Split(Get_Clob_From_File('/home/Hauser/Employee.csv'), x'0D25')) a
Where Trim(Element) > ''),
y as (-- Split IFS File Rows into Columns (and remove leading/trailing double quotes ")
Select x.*, Ordinal_Position ColKey,
Trim(B '"' from Element) as ColInfo
from x cross join Table(SysTools.Split(RowInfo, ',')) a)
-- Return the Result as Table
Select RowKey,
Min(Case When ColKey = 1 Then ColInfo End) EmployeeNo,
Min(Case When ColKey = 2 Then ColInfo End) Name,
Min(Case When ColKey = 3 Then ColInfo End) FirstName,
Min(Case When ColKey = 4 Then ColInfo End) Address,
Min(Case When ColKey = 5 Then ColInfo End) Country,
Min(Case When ColKey = 6 Then ColInfo End) ZipCode,
Min(Case When ColKey = 7 Then ColInfo End) City
From y
Where RowKey > 1 -- Remove header
Group By RowKey;
@BirgittaHauser
Copy link
Author

The samples *.csv file consists of 7 columns (EmployeeNo - Numeric, Name, FirstName, Address, Country, ZipCode, City), separated by a comma. Character values are embedded in double qoutes.
It also includes a column header row, which is removed within the final SELECT statement.

@NickLitten
Copy link

That's magic! Now I'm going to find an excuse to use it 😁

@RedHairWoW
Copy link

Clever!

@AlexKrashevsky
Copy link

Found a system where qsystools.split() is available and took a shoot at it using the published code sample for template and a not too big csv (100 rows, 60+ columns). Unfortunately, the tool runs extremely slow. The RPG function that I mentioned back on LinkedIn performs a bit better :)

@dancarlosgabriel
Copy link

and concised! thanks Birgitta for sharing.

@BirgittaHauser
Copy link
Author

BirgittaHauser commented Feb 1, 2020 via email

@AlexKrashevsky
Copy link

NickLitten just raise your hand, the excuses will line up to find you :)
Every shop has tons of spreadsheets, and we programmers incessantly run the venerable cpyfrmimpf to satisfy the business needs.
I would point out the power of SQL that the tool makes applicable to the analysis of raw user data plus the ability to implement a configurable field mapping and casting as the main reasons to consider this tool.

@AlexKrashevsky
Copy link

AlexKrashevsky commented Feb 2, 2020

These statements run in ACS's Run SQL will generate a table function offering an easy to use generic interface to run Birgitta's `code.

CREATE FUNCTION PARSECSV (
@path CHAR(256) , @startwith INTEGER )
RETURNS TABLE (
ROWKEY INTEGER ,
A CHAR(50) ,
B CHAR(50) ,
C CHAR(50) ,
D CHAR(50) ,
E CHAR(50) ,
F CHAR(50) ,
G CHAR(50) ,
H CHAR(50) ,
I CHAR(50) ,
J CHAR(50) ,
K CHAR(50) ,
L CHAR(50) ,
M CHAR(50) ,
N CHAR(50) ,
O CHAR(50) ,
P CHAR(50) ,
Q CHAR(50) ,
R CHAR(50) ,
S CHAR(50) ,
T CHAR(50) ,
U CHAR(50) ,
V CHAR(50) ,
W CHAR(50) ,
X CHAR(50) ,
Y CHAR(50) ,
Z CHAR(50) ,
AA CHAR(50) ,
AB CHAR(50) ,
AC CHAR(50) ,
AD CHAR(50) ,
AE CHAR(50) ,
AF CHAR(50) ,
AG CHAR(50) ,
AH CHAR(50) ,
AI CHAR(50) ,
AJ CHAR(50) ,
AK CHAR(50) ,
AL CHAR(50) ,
AM CHAR(50) ,
AN CHAR(50) ,
AO CHAR(50) ,
AP CHAR(50) ,
AQ CHAR(50) ,
AR CHAR(50) ,
"AS" CHAR(50) ,
"AT" CHAR(50) ,
AU CHAR(50) ,
AV CHAR(50) ,
AW CHAR(50) ,
AX CHAR(50) ,
AY CHAR(50) ,
AZ CHAR(50) ,
BA CHAR(50) ,
BB CHAR(50) ,
BC CHAR(50) ,
BD CHAR(50) ,
BE CHAR(50) ,
BF CHAR(50) ,
BG CHAR(50) ,
BH CHAR(50) ,
BI CHAR(50) ,
BJ CHAR(50) ,
BK CHAR(50) ,
BL CHAR(50) ,
BM CHAR(50) ,
BN CHAR(50) ,
BO CHAR(50) )
LANGUAGE SQL
SPECIFIC PARSECSV
NOT DETERMINISTIC
MODIFIES SQL DATA
CALLED ON NULL INPUT
CONCURRENT ACCESS RESOLUTION DEFAULT
SET OPTION ALWBLK = *ALLREAD ,
ALWCPYDTA = *OPTIMIZE ,
COMMIT = *CS ,
DECRESULT = (31, 31, 00) ,
DYNUSRPRF = *USER ,
SRTSEQ = *HEX
RETURN
WITH X AS ( -- Split IFS File into Rows (at CRLF)
SELECT ORDINAL_POSITION AS ROWKEY , ELEMENT AS ROWINFO
FROM TABLE ( SYSTOOLS . SPLIT ( GET_CLOB_FROM_FILE ( trim(@path) ) , X'0D25' ) ) A
WHERE TRIM ( ELEMENT ) > '' )
,
-- Split IFS File Rows into Columns (and remove leading/trailing double quotes ")
Y AS (
SELECT X . * , ORDINAL_POSITION COLKEY , TRIM ( B '"' FROM ELEMENT ) AS COLINFO
FROM X CROSS
JOIN TABLE ( SYSTOOLS . SPLIT ( ROWINFO , ',' ) ) A
-- ;
)
SELECT ROWKEY , MIN ( CASE WHEN COLKEY = 1 THEN COLINFO END ) A , MIN ( CASE WHEN COLKEY = 2
THEN COLINFO END ) B , MIN ( CASE WHEN COLKEY = 3 THEN COLINFO END ) C , MIN ( CASE
WHEN COLKEY = 4 THEN COLINFO END ) D , MIN ( CASE WHEN COLKEY = 5 THEN COLINFO END )
E , MIN ( CASE WHEN COLKEY = 6 THEN COLINFO END ) F , MIN ( CASE WHEN COLKEY = 7 THEN
COLINFO END ) G , MIN ( CASE WHEN COLKEY = 8 THEN COLINFO END ) H , MIN ( CASE WHEN
COLKEY = 9 THEN COLINFO END ) I , MIN ( CASE WHEN COLKEY = 10 THEN COLINFO END ) J ,
MIN ( CASE WHEN COLKEY = 11 THEN COLINFO END ) K , MIN ( CASE WHEN COLKEY = 12 THEN
COLINFO END ) L , MIN ( CASE WHEN COLKEY = 13 THEN COLINFO END ) M , MIN ( CASE WHEN
COLKEY = 14 THEN COLINFO END ) N , MIN ( CASE WHEN COLKEY = 15 THEN COLINFO END ) O ,
MIN ( CASE WHEN COLKEY = 16 THEN COLINFO END ) P , MIN ( CASE WHEN COLKEY = 17 THEN
COLINFO END ) Q , MIN ( CASE WHEN COLKEY = 18 THEN COLINFO END ) R , MIN ( CASE WHEN
COLKEY = 19 THEN COLINFO END ) S , MIN ( CASE WHEN COLKEY = 20 THEN COLINFO END ) T ,
MIN ( CASE WHEN COLKEY = 21 THEN COLINFO END ) U , MIN ( CASE WHEN COLKEY = 22 THEN
COLINFO END ) V , MIN ( CASE WHEN COLKEY = 23 THEN COLINFO END ) W , MIN ( CASE WHEN
COLKEY = 24 THEN COLINFO END ) X , MIN ( CASE WHEN COLKEY = 25 THEN COLINFO END )
Y , MIN ( CASE WHEN COLKEY = 26 THEN COLINFO END ) Z , MIN ( CASE WHEN COLKEY
= 27 THEN COLINFO END ) AA , MIN ( CASE WHEN COLKEY = 28 THEN COLINFO END )
AB , MIN ( CASE WHEN COLKEY = 29 THEN COLINFO END ) AC , MIN ( CASE WHEN COLKEY = 30
THEN COLINFO END ) AD , MIN ( CASE WHEN COLKEY = 31 THEN COLINFO END ) AE , MIN ( CASE
WHEN COLKEY = 32 THEN COLINFO END ) AF , MIN ( CASE WHEN COLKEY = 33 THEN COLINFO END )
AG , MIN ( CASE WHEN COLKEY = 34 THEN COLINFO END ) AH , MIN ( CASE WHEN COLKEY = 35
THEN COLINFO END ) AI , MIN ( CASE WHEN COLKEY = 36 THEN COLINFO END ) AJ , MIN ( CASE
WHEN COLKEY = 37 THEN COLINFO END ) AK , MIN ( CASE WHEN COLKEY = 38 THEN COLINFO END )
AL , MIN ( CASE WHEN COLKEY = 39 THEN COLINFO END ) AM , MIN ( CASE WHEN COLKEY = 40
THEN COLINFO END ) AN , MIN ( CASE WHEN COLKEY = 41 THEN COLINFO END ) AO , MIN ( CASE
WHEN COLKEY = 42 THEN COLINFO END ) AP , MIN ( CASE WHEN COLKEY = 43 THEN COLINFO
END ) AQ , MIN ( CASE WHEN COLKEY = 44 THEN COLINFO END ) AR , MIN ( CASE WHEN
COLKEY = 45 THEN COLINFO END ) AS "AS", MIN ( CASE WHEN COLKEY = 46 THEN COLINFO
END ) AT , MIN ( CASE WHEN COLKEY = 47 THEN COLINFO END ) AU , MIN ( CASE WHEN COLKEY
= 48 THEN COLINFO END ) AV , MIN ( CASE WHEN COLKEY = 49 THEN COLINFO END ) AW ,
MIN ( CASE WHEN COLKEY = 50 THEN COLINFO END ) AX , MIN ( CASE WHEN COLKEY = 51 THEN
COLINFO END ) AY , MIN ( CASE WHEN COLKEY = 52 THEN COLINFO END ) AZ ,
MIN ( CASE WHEN COLKEY = 53 THEN COLINFO END ) BA , MIN ( CASE WHEN COLKEY = 54 THEN
COLINFO END ) BB , MIN ( CASE WHEN COLKEY = 55 THEN COLINFO END ) BC , MIN ( CASE
WHEN COLKEY = 56 THEN COLINFO END ) BD , MIN ( CASE WHEN COLKEY = 57 THEN COLINFO
END ) BE , MIN ( CASE WHEN COLKEY = 58 THEN COLINFO END ) BF , MIN ( CASE WHEN
COLKEY = 59 THEN COLINFO END ) BG , MIN ( CASE WHEN COLKEY = 60 THEN COLINFO END )
BH , MIN ( CASE WHEN COLKEY = 61 THEN COLINFO END ) BI , MIN ( CASE WHEN
COLKEY = 62 THEN COLINFO END ) BJ , MIN ( CASE WHEN COLKEY = 63 THEN COLINFO END )
BK , MIN ( CASE WHEN COLKEY = 64 THEN COLINFO END ) BL , MIN ( CASE WHEN COLKEY
= 65 THEN COLINFO END ) BM , MIN ( CASE WHEN COLKEY = 66 THEN COLINFO END ) BN ,
MIN ( CASE WHEN COLKEY = 67 THEN COLINFO END ) BO
FROM Y
WHERE ROWKEY > (@startwith - 1) -- Remove header if necessary
GROUP BY ROWKEY
ORDER BY ROWKEY
;

LABEL ON SPECIFIC FUNCTION PARSECSV
IS 'CSV Parser' ;

--Grant permissions as necessary
--GRANT ALTER , EXECUTE
--ON SPECIFIC FUNCTION PARSECSV
--TO WITH GRANT OPTION ;

For example, the function could be run as

select * from table(parsecsv('/somefolder/somefile.csv',1)) t;

Or, if you don't want to have the header row returned, substitute the second parameter with a 2.

@BirgittaHauser
Copy link
Author

BirgittaHauser commented Feb 2, 2020 via email

@AlexKrashevsky
Copy link

AlexKrashevsky commented Feb 2, 2020

Thanks Birgitta. I think Modifies SQL Data would also be required here. Based on the ACS wizard, the default behavior would be Reads SQL Data. Without the Modifies, the system will return SQL0577 for this routine.

Also, just a matter of taste, I would name the columns after their Excel counterparts (i.e., A,B,C,...AA,AB,AC,...) to more easily map them out to the origins.

@AlexKrashevsky
Copy link

One needs to be cognizant about certain size limitation split() currently imposes.

Looking at the parameter definition, INPUT_LIST is defined as CLOB(1048576). I have a (relatively) large csv file (40 columns, just under 30K rows, and the size stands at just under 7M). The first CTE from the new UDTF parsecsv() will cut the processing short when run against that file.

The size of the input parm for split() could probably be bumped up to help the size limitation issue, but I would be concerned about the performance impact. The performance of the new UDTF parsecsv() is not too great already.

It seems to me that, while very cool, this routine could only be deployed to run against relatively small files. Birgitta, would you agree, or am I missing anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment