Ahmet DAL javrasya

## count_letters.sql
select count_letters('name')

## LetterCounter.java
package dal.ahmetdal.hive.udf.lettercount;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class LetterCounter extends UDF {

    public Integer evaluate(final Text input) {
        if (input == null) return null;
        return input.toString().length();
    }
}

## create-custom-hive-udf.md

      
Last active December 27, 2019 15:24

Apache Hive is a project that provides HiveQL, a SQL-like DSL, on top of MapReduce in the Hadoop ecosystem. Hive generates the mapper(s) and reducer(s) from the given SQL. It is an alternative to Apache Pig.
Hive has many built-in functions, but sometimes we need custom ones. These custom functions are called UDFs (user-defined functions).
UDFs can be written in any language that can be built as a jar. For example, a UDF written in Clojure still has to be packaged as a jar in the end.
After we generate the jar file that contains the UDF code, we need to put it in the Hive auxiliary library folder, which holds extra libraries for Hive. Hive validates and loads them, and also informs Hadoop MapReduce (YARN) about the libraries so that they get loaded there too, because our UDF code is actually invoked in the map-reduce job, not by

  
## why-do-we-need-a-dev-branch.md

      
Last active December 27, 2019 15:07

I was using master as a dev branch and I thought that a separate dev branch was not necessary. I changed my mind.
I didn't need a dev branch because my projects didn't need urgent bugfix or hotfix releases. But not having run into trouble without a dev branch doesn't mean everything will always be okay.
### Case
Consider this case: you are on version 0.2.0 and have started working on version 0.3.0 in master, which you use as the development branch. Some urgent bugs show up in the testing or production environment, and they must be fixed as soon as possible. What happens now? Do you fix them in master too? Or do you fix them in another branch named something like bugfix-0.2.1 or hotfix-0.2.1 and then merge it into master? Either way, master already contains new 0.3.0 features that shouldn't be released yet, so we can't ship the fix while using master as our development branch. These kinds of releases are called bugfix or hotfix versions.
### Solution

  
## feel-free-to-use-another-module-name.md

      
Created December 27, 2019 14:56

One of my Python problems was naming the modules in my projects. Whenever I gave a module the same name as a built-in or higher-level module, I got into trouble: when I tried to import the built-in or higher-level library in my code, only my own module was visible and I got an import error.
Here is a clear explanation. Assume that I have a project structure like this:
* root
	* my_app
    	* logging
        	* __init__.py
 * my_logging_code.py
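
The shadowing described above can be reproduced in isolation. The following is a minimal sketch, not the original project: it assumes the directory that contains the local `logging` package ends up ahead of the standard library on the import search path, which is one common way this problem appears. The names `resolve_origin`, `tmp` and `app_dir` are hypothetical, and the layout is built in a temporary directory so nothing on disk is touched.

```python
import importlib.machinery
import os
import sys
import tempfile


def resolve_origin(module_name, search_path):
    """Return the file a path-based import of module_name would load, or None."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring the example: my_app/logging/__init__.py
    app_dir = os.path.realpath(os.path.join(tmp, "my_app"))
    os.makedirs(os.path.join(app_dir, "logging"))
    with open(os.path.join(app_dir, "logging", "__init__.py"), "w") as f:
        f.write("# local package with the same name as the standard library one\n")

    # Assumption: my_app comes first on the search path, so the local package wins.
    shadowed = resolve_origin("logging", [app_dir] + sys.path)
    # Without my_app on the search path, the lookup falls through to the real library.
    standard = resolve_origin("logging", sys.path)

    shadowed_is_local = os.path.realpath(shadowed).startswith(app_dir)
    standard_is_local = os.path.realpath(standard).startswith(app_dir)

print("resolved inside my_app when my_app is first:", shadowed_is_local)
print("resolved inside my_app otherwise:", standard_is_local)
```

Under that assumption, the same name resolves to two different files depending only on the order of the search path, which is why choosing a module name that does not collide avoids the import error.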


## select-last-not-null-in-window.sql
SELECT
  `date`,
  COALESCE(
    number,
    LAST_VALUE(number, TRUE) OVER(
      ORDER BY `date`
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
  ) as last_not_null_number,
  estimate,

## result-example.sql
OK
2016-08-01       3       10      A
2016-08-02       3       10      A
2016-08-03       5       10      A
2016-08-04       5       10      A
2016-08-05       5       10      A
2016-08-06       2       10      A
Time taken: 19.177 seconds, Fetched: 6 row(s)

## dataset-example.sql
hive> select * from my_table;
OK
2016-08-01       3       10      A
2016-08-02       NULL    10      NULL
2016-08-03       5       10      A
2016-08-04       NULL    10      NULL
2016-08-05       NULL    10      NULL
2016-08-06       2       10      A
Time taken: 0.06 seconds, Fetched: 6 row(s)

## my_table.sql
CREATE TABLE IF NOT EXISTS my_table
(
  `date` string,
  number int,
  estimate int,
  client string
) row format delimited fields terminated by ',';

## Sliding-Last-Not-Null-Value.md

      
Last active December 27, 2019 14:46

Hive supports windowing functions that operate over columns relative to the current row. You can see the documentation on windowing functions in Hive at LanguageManual WindowingAndAnalytics. Those functions themselves are not going to be the topic of this article.
Assume that you have a dataset that contains null values in some columns and you want to fill those nulls with the most recent non-null value of the same column. Let me give an example dataset to explain it.
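
Before turning to the table, the fill rule itself can be stated in a few lines. This is a minimal Python sketch of the intended behavior, not Hive code: the function name `fill_with_last_not_null` is hypothetical, `None` stands in for NULL, and the sample list mirrors the `number` column of the example dataset in date order.

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def fill_with_last_not_null(values: Iterable[Optional[T]]) -> List[Optional[T]]:
    """Replace each None with the most recent non-None value seen so far."""
    filled: List[Optional[T]] = []
    last_seen: Optional[T] = None
    for value in values:
        if value is not None:
            last_seen = value  # remember the latest non-null value
        # Leading nulls stay None because nothing has been seen yet.
        filled.append(last_seen)
    return filled


# The `number` column of the example dataset, ordered by date.
numbers = [3, None, 5, None, None, 2]
filled_numbers = fill_with_last_not_null(numbers)
print(filled_numbers)  # [3, 3, 5, 5, 5, 2]
```

The loop only looks at rows up to and including the current one, which corresponds to a window frame of `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`.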
Here is my table:
CREATE TABLE IF NOT EXISTS my_table
(
`date` string,
number int,
estimate int,
client string
) row format delimited fields terminated by ',';