Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?

Apache Hive is a project which provides SQL dsl which is HiveQL on top of map-reduce in hadoop ecosystem. Mapper(s) and reducer(s) are produced by hive according to given SQL. It is an alternative to Apache Pig.

There are too many built-in functions in Hive. But sometimes we need to have our custom functions. This custom functions are called as UDF which is user defined functions.

UDFs can be written in any language which can be built as jar. For example, if it is in clojure, it needs to be built as jar at the end.

After we generate our jar file contains UDF code, we need to send it to hive auxiliary library folder. This folder is defined as a folder which contains extra libraries for hive. Hive validates and load them and also informs Hadoop-MapReduce(Yarn) about the libraries to make them loaded. Because, our UDF code is actually invoked in map-reduce job, not by hive.

Lets give UDF example in JAVA first and use it next;

package dal.ahmetdal.hive.udf.lettercount;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class LetterCounter extends UDF {

    public Integer evaluate(final Text input) {
        if (input == null) return null;
        return input.toString().length();
    }

}

This is simply returns given text size. Full code is in hive module which is in the repo [blog-examples](https://github.com/javrasya/blog-examples" target="_blank). I build it with maven.

After build it, move the jar files under hive auxiliary folder. Before we do that, we need to configure hive about auxiliary folder location.

# where is hive located
$ cd $HIVE_HOME
$ mkdir auxlib

Then, edit $HIVE_HOME/conf/hive-site-xml and add a property between configuration tags;

<property>
  <name>hive.aux.jars.path</name>
  <value>file:/$HIVE_HOME/auxlib/</value>
</property>

Note: Do not forget to change $HIVE_HOME to your hive location in the scripts.

Now,we have auxiliary folder and we can move our built jar into that folder.

$ cd $LETTER_COUNTER_HOME
$ mvn clean install
$ mv target/letter-counter.jar $HIVE_HOME/auxlib/

Our UDFs is not registered in hive shell. There are two types of registration of functions. One is temporary other is permanent. Temporary functions can't be used via hiveserver2 . If the functions used via jdbc through hiverserver2, it must be registered as permanently. One last thing should be mentioned about permanent function. The database instance where you created the function in, is important. Other instances can't access the function.

$ $HIVE_HOME/bin/hive

To create temporary function;

hive>  CREATE TEMPORARY FUNCTION count_letters AS 'dal.ahmetdal.hive.udf.lettercount.LetterCounter';

To create permanent function;

hive>  CREATE FUNCTION count_letters AS 'dal.ahmetdal.hive.udf.lettercount.LetterCounter';

We are now able to use the UDF in our queries like;

hive> select count_letters('name') from my_database .....
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment