View feel-free-to-use-another-module-name.md

One of my recurring Python problems was naming the modules in my projects. Whenever I gave a module the same name as a built-in or higher-level module, I got into trouble: when I tried to import the built-in or higher-level library in my code, only my own module was visible, and I got an import error.

Here is a clear explanation. Assume that I have a project structure like this:

* root
	* my_app
    	* logging
        	* __init__.py
 * my_logging_code.py
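
To make the shadowing concrete, here is a minimal sketch. It assumes the directory that holds the local `logging` package ends up ahead of the standard library on `sys.path` (for example, when a script is run from inside `my_app`). The temporary directory and the `resolve` helper are only illustrations, not part of the project above.

```python
import importlib.machinery
import os
import sys
import tempfile


def resolve(name, search_path):
    """Return the file that a top-level import of `name` would load,
    given an ordered list of directories to search (hypothetical helper)."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return spec.origin if spec else None


with tempfile.TemporaryDirectory() as my_app:
    # Recreate my_app/logging/__init__.py from the tree above.
    pkg_dir = os.path.join(my_app, "logging")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()

    # Without my_app on the path, the standard library module is found.
    stdlib_origin = resolve("logging", sys.path)

    # With my_app first on the path, the local package wins the lookup.
    shadowed_origin = resolve("logging", [my_app] + sys.path)

print("standard library:", stdlib_origin)
print("shadowed by     :", shadowed_origin)
```

The first match on the search path wins, so the local package hides the standard library one. Picking a different name for the package avoids the clash entirely.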
View select-last-not-null-in-window.sql
SELECT
`date`,
COALESCE(
number,
LAST_VALUE(number, TRUE) OVER(
ORDER BY `date`
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
) as last_not_null_number,
estimate,
View result-example.sql
OK
2016-08-01 3 10 A
2016-08-02 3 10 A
2016-08-03 5 10 A
2016-08-04 5 10 A
2016-08-05 5 10 A
2016-08-06 2 10 A
Time taken: 19.177 seconds, Fetched: 6 row(s)
View dataset-example.sql
hive> select * from my_table;
OK
2016-08-01 3 10 A
2016-08-02 NULL 10 NULL
2016-08-03 5 10 A
2016-08-04 NULL 10 NULL
2016-08-05 NULL 10 NULL
2016-08-06 2 10 A
Time taken: 0.06 seconds, Fetched: 6 row(s)
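
As a rough illustration of the fill being described, here is a small plain-Python sketch applied to the `number` and `client` columns of the rows above. It is not the Hive query itself. The function name `fill_last_not_null` is made up for this example, and `None` stands in for `NULL`.

```python
def fill_last_not_null(values):
    """Replace each None with the most recent non-None value before it.

    Leading None values stay None because nothing precedes them.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Columns copied from the six rows of my_table shown above, ordered by date.
numbers = [3, None, 5, None, None, 2]
clients = ["A", None, "A", None, None, "A"]

filled_numbers = fill_last_not_null(numbers)
filled_clients = fill_last_not_null(clients)

print(filled_numbers)  # [3, 3, 5, 5, 5, 2]
print(filled_clients)  # ['A', 'A', 'A', 'A', 'A', 'A']
```

Walking the rows in date order and remembering the last non-null value is the same idea as a window that runs from the first row up to the current row.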
View my_table.sql
CREATE TABLE IF NOT EXISTS my_table
(
`date` string,
number int,
estimate int,
client string
) row format delimited fields terminated by ',';
View Sliding-Last-Not-Null-Value.md

Hive supports windowing functions that operate over columns relative to the current row. You can find the documentation about windowing functions in Hive at LanguageManual WindowingAndAnalytics. Those functions themselves are not going to be the topic of this article.

Assume that you have a dataset that contains null values in some columns, and you want to fill those nulls with the most recent non-null value of the same column. Let me give an example dataset to explain it.

Here is my table:

CREATE TABLE IF NOT EXISTS my_table
(
`date` string,
View student_count_report.sql
use mydatabase;
CREATE TABLE IF NOT EXISTS student_count_report (
school_name string,
student_coun BIGINT
) ROW FORMAT delimited fields terminated BY ;
View student.sql
use mydatabase;
CREATE TABLE IF NOT EXISTS student (
student_id BIGINT,
name string,
lastname string,
birth_date string,
school_id BIGINT
) ROW FORMAT delimited fields terminated BY ;
View school.sql
use mydatabase;
CREATE TABLE IF NOT EXISTS school
(
school_id BIGINT,
school_name string
) ROW FORMAT delimited fields terminated BY ',';
View Hive-Unit-Testing.md

Testing is really important in coding. It ensures that the code works. It is really a way to gain time, even though it is often thought of as time lost. This is actually another topic that should be discussed in another article.

Unit testing is one of the toughest parts of coding. If you are dealing with data, it gets even harder. Many people give up on that challenging phase. But if you are a good developer (by the way, that means you are addicted to producing good-quality code that cannot be broken easily after you first say "I am done"), you must love that phase, because it is the phase that makes you sure about what you did. I said this is another topic, but I couldn't keep myself from talking about it :-) .

As I said, when a test needs data, it is hard to prepare that kind of test data, which is called provided data or initial data. (There are some tools that make it easier. I will also talk about them in another article.) Actually, complexity in the ca