Skip to content

Instantly share code, notes, and snippets.

@airawat
Last active November 25, 2020 07:40
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save airawat/7310050 to your computer and use it in GitHub Desktop.
Save airawat/7310050 to your computer and use it in GitHub Desktop.
Cusom Hive Eval UDF NVL2
This gist covers a simple Hive eval UDF in Java, that mimics NVL2 functionality in Oracle.
NVL2 is used to handle nulls and conditionally substitute values.
Included:
1. Input data
2. Expected results
3. UDF code in java
4. Hive query to demo the UDF
5. Output
Note: The dataset is very small - as the purpose of this gist is instructional. :)
About the NVL2 functionality in this demo:
------------------------------------------
Return type: String
Parameters: Three comma separated strings, we will refer to as:
expr1, expr2, expr3
Purpose: If expr1 is null, NVL2 returns expr3, otherwise, expr3
**************************
Input data
**************************
1. Execute locally on the node you are running Hive client from
-----------------------------------------------------------------
Create input file/data to use for the demo.
Since this gist is merely for instructional purpose, the dataset is small.
cd ~
mkdir hiveProject
cd hiveProject
vi Departments_UDFTest
Paste this..ensuring the fields are delimited by tabs and record with new line.
d001 Marketing
d002 Finance
d003 Human Resources
d004 Production
d005 Development
d006 Quality Management
d007 Sales
d008
d009 Customer Service
2. Hadoop commands
--------------------
hadoop fs -mkdir hiveProject
hadoop fs -put hiveProject/Departments_UDFTest hiveProject
*********************************************
Setting up the Hive table
*********************************************
In hive shell....
a) Create table:
CREATE EXTERNAL TABLE IF NOT EXISTS departments_UDFTest
(
deptNo String,
deptName String
)
Row format delimited
fields terminated by '\t'
LOCATION '/user/akhanolk/hiveProject';
b) Quick test:
Select * from departments_UDFTest;
d001 Marketing
d002 Finance
d003 Human Resources
d004 Production
d005 Development
d006 Quality Management
d007 Sales
d008 NULL
d009 Customer Service
//------------------------------------------------------------------
// Filename: NVL2.java
//------------------------------------------------------------------
package khanolkar.HiveUDFs;
import java.io.IOException;
import org.apache.hadoop.hive.ql.exec.UDF;
public class NVL2 extends UDF {
String expr1, expr2, expr3;
public NVL2() {
}
public String evaluate(String pExpr1, String pExpr2, String pExpr3)
throws IOException {
try {
expr1 = (String) pExpr1;
expr2 = (String) pExpr2;
expr3 = (String) pExpr3;
return (expr1 != null ? expr2 : expr3);
} catch (Exception e) {
// Cause task failure
throw new IOException("Error with Hive UDF, NVL2!", e);
}
}
}
****************************
Expected results
****************************
Query:
select deptNo,NVL2(deptName,deptName,'Procrastrination') from departments_UDFTest;
The null in the department name for department d008, should be returned as "Procrastrination".
For the rest of the records, the query should return the data in Hive, as is.
*******************************
Testing the UDF
*******************************
hive> add jar hiveProject/jars/NVL2.jar;
hive> CREATE TEMPORARY FUNCTION NVL2
AS 'khanolkar.HiveUDFs.NVL2';
hive> select deptNo,NVL2(deptName,deptName) from departments_UDFTest;
FAILED: SemanticException [Error 10014]: Line 1:14 Wrong arguments 'deptName': No matching method for class khanolkar.HiveUDFs.NVL2 with (string, string). Possible choices: _FUNC_(string, string, string)
hive> select deptNo,NVL2(deptName,deptName,'Procrastrination') from departments_UDFTest;
OK
d001 Marketing
d002 Finance
d003 Human Resources
d004 Production
d005 Development
d006 Quality Management
d007 Sales
d008 Procrastrination
d009 Customer Service
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment