@airawat
Last active December 27, 2015 03:39
Custom Pig UDF NVL2
This gist covers a simple Pig eval UDF in Java that mimics Oracle's NVL2 function: it returns the second argument when the first is non-null, and the third argument otherwise.
Included:
1. Input data
2. UDF code in Java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output
package khanolkar.pigUDFs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
// Custom UDF
// Name:       NVL2
// Parameters: Tuple with three Strings
// Purpose:    Handles nulls and replaces non-null values.
//             If the first parameter is null, returns the third parameter;
//             otherwise returns the second parameter.
//             E.g. NVL2(null,"Busy bee","Sloth") = "Sloth"
//             E.g. NVL2("Anagha","Busy bee","Sloth") = "Busy bee"
// Returns:    Null if the tuple is null or empty
//             Null if the tuple does not hold exactly three parameters
//             Otherwise, the result of applying the NVL2 logic
public class NVL2 extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            if (input.size() == 3) {
                String expr1 = (String) input.get(0);
                String expr2 = (String) input.get(1);
                String expr3 = (String) input.get(2);
                return (expr1 != null ? expr2 : expr3);
            } else {
                return null;
            }
        } catch (Exception e) {
            // Cause task failure
            throw new IOException("Error with UDF, NVL2!", e);
        }
    }
}
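The NVL2 rule itself can be checked without a Pig installation. The following is a minimal plain-Java sketch of the same null test; the class name `Nvl2Sketch` and the method `nvl2` are hypothetical names chosen for illustration and are not part of NVL2.jar.

```java
// Plain-Java sketch of the NVL2 rule; no Pig classes are needed.
// Nvl2Sketch and nvl2 are hypothetical names, not part of the gist's jar.
final class Nvl2Sketch {

    // Returns ifNotNull when expr is non-null, otherwise ifNull.
    static String nvl2(String expr, String ifNotNull, String ifNull) {
        return expr != null ? ifNotNull : ifNull;
    }

    public static void main(String[] args) {
        // The two examples from the UDF's header comment.
        System.out.println(nvl2(null, "Busy bee", "Sloth"));     // prints Sloth
        System.out.println(nvl2("Anagha", "Busy bee", "Sloth")); // prints Busy bee
    }
}
```

Note that only a null first argument selects the third argument; an empty string counts as non-null in this sketch, just as it does in the UDF's `expr1 != null` test.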
#--------------------------------------------------------------------------------------
# Pig Script
# NVL2UDFDemo.pig
#--------------------------------------------------------------------------------------
register NVL2.jar;
define NVL2 khanolkar.pigUDFs.NVL2;
rawDS = load 'departments' using PigStorage() as (deptNo:chararray, deptName:chararray);
transformedDS = foreach rawDS generate $0, NVL2($1,$1,'Procrastination');
dump transformedDS;
#---------------------------
# Input data
#---------------------------
d001 Marketing
d002 Finance
d003 Human Resources
d004 Production
d005 Development
d006 Quality Management
d007 Sales
d008
d009 Customer Service
.................
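The d008 row has no department name, and that is the row the demo script rewrites. As a rough illustration of why, the plain-Java sketch below mimics the script's foreach on a single line. It assumes the fields are tab-delimited (the PigStorage() default) and that a missing or empty second field reaches the UDF as null; `DepartmentsSketch` and `transform` are hypothetical names, and the sketch is not how Pig itself parses input.

```java
// Sketch only: imitates NVL2($1,$1,'Procrastination') on one input line.
// DepartmentsSketch and transform are hypothetical names for illustration.
final class DepartmentsSketch {

    static String transform(String line) {
        // Limit -1 keeps a trailing empty field, so "d008\t" yields two fields.
        String[] fields = line.split("\t", -1);
        String deptNo = fields[0];
        // Assumption: a missing or empty department name stands in for Pig's null.
        String deptName = (fields.length > 1 && !fields[1].isEmpty()) ? fields[1] : null;
        // Same rule as the UDF: keep the name when present, else use the default.
        String result = (deptName != null) ? deptName : "Procrastination";
        return "(" + deptNo + "," + result + ")";
    }

    public static void main(String[] args) {
        System.out.println(transform("d007\tSales")); // prints (d007,Sales)
        System.out.println(transform("d008\t"));      // prints (d008,Procrastination)
        System.out.println(transform("d008"));        // prints (d008,Procrastination)
    }
}
```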
#---------------------------
# Directory structure
#---------------------------
pigProject
    evalFunc
        NVL2
            departments
            NVL2.jar
            NVL2UDFDemo.pig
#----------------------------------------------------------
# Load script and data to HDFS
#----------------------------------------------------------
$ hadoop fs -mkdir pigProject
$ hadoop fs -mkdir pigProject/evalFunc
$ hadoop fs -put pigProject/evalFunc/* pigProject/evalFunc
#---------------------------
# Command to test
#---------------------------
# On the cluster
$ pig pigProject/evalFunc/NVL2/NVL2UDFDemo.pig
# Locally
$ pig -x local pigProject/evalFunc/NVL2/NVL2UDFDemo.pig
#---------------------------
# Output data
#---------------------------
(d001,Marketing)
(d002,Finance)
(d003,Human Resources)
(d004,Production)
(d005,Development)
(d006,Quality Management)
(d007,Sales)
(d008,Procrastination)
(d009,Customer Service)