Skip to content

Instantly share code, notes, and snippets.

@libc
Last active August 29, 2015 14:00
Show Gist options
  • Save libc/11373092 to your computer and use it in GitHub Desktop.
Save libc/11373092 to your computer and use it in GitHub Desktop.
sane_from_unixtime

sane_from_unixtime for hive

Keywords: from_unixtime week starting on Monday, from_unixtime sane week numbers

Why?

from_unixtime uses insane week numbers by default. This UDF is a fusion between yearweek and from_unixtime.

Just look at the following example:

select date, from_unixtime(unix_timestamp(date, "yyyy-MM-dd"), "YYYY-ww"), sane_from_unixtime(unix_timestamp(date, "yyyy-MM-dd"), "YYYY-ww") from dates where date > '2013-12-28' order by date;

2013-12-29	2014-01	2013-52
2013-12-30	2014-01	2014-01
2013-12-31	2014-01	2014-01
2014-01-01	2014-01	2014-01
2014-01-02	2014-01	2014-01
2014-12-30	2015-01	2015-01
2015-12-30	2016-01	2015-53
2016-12-30	2016-53	2016-52

How to use?

Run `sbt package' wait about an hour wilst it compiles everything, the production would be in target/scala-2.11/hive-sane-from-unixtime_2.11-0.0.1.jar

You should copy that file to the server and run in hive console

add jar /home/hadoop/hive-sane-from-unixtime_2.11-0.0.1.jar;
create temporary function sane_from_unixtime as 'com.wakoopa.hadoop.hive.ql.udf.UDFSaneFromUnixTime';

How this came to be?

It's a mix of the following:

  1. http://snowplowanalytics.com/blog/2013/02/08/writing-hive-udfs-and-serdes/
  2. https://github.com/apache/hive/blob/d87d6cdefdfedd29a1920c1e6c16fa242f8fcb42/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFromUnixTime.java
  3. https://github.com/apache/hive/blob/d87d6cdefdfedd29a1920c1e6c16fa242f8fcb42/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFWeekOfYear.java
name := "hive-sane-from-unixtime"
version := "0.0.1"
organization := "com.wakoopa"
scalaVersion := "2.11.0"
scalacOptions ++= Seq("-unchecked", "-deprecation")
resolvers += "CDH4" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.0.3" % "provided"
libraryDependencies += "org.apache.hive" % "hive-exec" % "0.11.0" % "provided"
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.wakoopa.hadoop.hive.ql.udf;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
/**
* UDFFromUnixTime.
*
*/
@Description(name = "sane_from_unixtime",
value = "_FUNC_(unix_time, format) - returns unix_time in the specified format",
extended = "Example:\n"
+ " > SELECT _FUNC_(0, 'yyyy-MM-dd HH:mm:ss') FROM src LIMIT 1;\n"
+ " '1970-01-01 00:00:00'")
public class UDFSaneFromUnixTime extends UDF {
private SimpleDateFormat formatter;
private final Calendar calendar = Calendar.getInstance();
private Text result = new Text();
private Text lastFormat = new Text();
public UDFSaneFromUnixTime() {
calendar.setFirstDayOfWeek(Calendar.MONDAY);
calendar.setMinimalDaysInFirstWeek(4);
}
private Text defaultFormat = new Text("yyyy-MM-dd HH:mm:ss");
public Text evaluate(IntWritable unixtime) {
return evaluate(unixtime, defaultFormat);
}
/**
* Convert UnixTime to a string format.
*
* @param unixtime
* The number of seconds from 1970-01-01 00:00:00
* @param format
* See
* http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat
* .html
* @return a String in the format specified.
*/
public Text evaluate(LongWritable unixtime, Text format) {
if (unixtime == null || format == null) {
return null;
}
return eval(unixtime.get(), format);
}
/**
* Convert UnixTime to a string format.
*
* @param unixtime
* The number of seconds from 1970-01-01 00:00:00
* @return a String in default format specified.
*/
public Text evaluate(LongWritable unixtime) {
if (unixtime == null) {
return null;
}
return eval(unixtime.get(), defaultFormat);
}
/**
* Convert UnixTime to a string format.
*
* @param unixtime
* The number of seconds from 1970-01-01 00:00:00
* @param format
* See
* http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat
* .html
* @return a String in the format specified.
*/
public Text evaluate(IntWritable unixtime, Text format) {
if (unixtime == null || format == null) {
return null;
}
return eval(unixtime.get(), format);
}
/**
* Internal evaluation function given the seconds from 1970-01-01 00:00:00 and
* the output text format.
*
* @param unixtime
* seconds of type long from 1970-01-01 00:00:00
* @param format
* display format. See
* http://java.sun.com/j2se/1.4.2/docs/api/java/text
* /SimpleDateFormat.html
* @return elapsed time in the given format.
*/
private Text eval(long unixtime, Text format) {
if (!format.equals(lastFormat)) {
formatter = new SimpleDateFormat(format.toString());
formatter.setCalendar(calendar);
lastFormat.set(format);
}
// convert seconds to milliseconds
Date date = new Date(unixtime * 1000L);
result.set(formatter.format(date));
return result;
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment