Skip to content

Instantly share code, notes, and snippets.

@filimonov
Last active February 8, 2022 12:45
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save filimonov/a4f6754497f02fcef78e9f23a4d170ee to your computer and use it in GitHub Desktop.
Save filimonov/a4f6754497f02fcef78e9f23a4d170ee to your computer and use it in GitHub Desktop.

SimpleAggregateFunction function can store only values of the base type. If you use DateTime - it can store internally only DateTime. Nothing like a special marker 'lack of value' etc. So if fo example you feed it with minIf, and -If found nothing - the SimpleAggregateFunction will store the default value for the DateTime type. Which is zero / or 1970-01-01 00:00:00 utc.

DROP TABLE IF EXISTS test_state_issue;

CREATE TABLE test_state_issue engine=AggregatingMergeTree ORDER BY id AS SELECT 1 as id, minSimpleStateIf(toDateTime('2001-01-01 01:01:01') + INTERVAL number YEAR, number > 10) as x from numbers(5);

SHOW CREATE TABLE test_state_issue;

/*
CREATE TABLE default.test_state_issue
(
    `id` UInt8,
    `x` SimpleAggregateFunction(min, DateTime) --  that can't store anything except of DateTime
)
ENGINE = AggregatingMergeTree
ORDER BY id
SETTINGS index_granularity = 8192
*/

SELECT *
FROM test_state_issue

/*
┌─id─┬───────────────────x─┐
│  1 │ 1970-01-01 01:00:00 │
└────┴─────────────────────┘
*/

And now even of other inserts will bring the actual data

insert into test_state_issue SELECT 1 as id, minSimpleStateIf(toDateTime('2001-01-01 01:01:01') + INTERVAL number YEAR, number > 10) as x from numbers(5);

SELECT * FROM test_state_issue

┌─id─┬───────────────────x─┐
│  11970-01-01 01:00:00 │
└────┴─────────────────────┘
┌─id─┬───────────────────x─┐
│  12012-01-01 01:01:01 │
└────┴─────────────────────┘

The minimum logic will pick the 'bad' row (becase it doesn't know that 1970 is not a real value, but it obviously smaller than 2012)

SELECT * FROM test_state_issue FINAL

┌─id─┬───────────────────x─┐
│  11970-01-01 01:00:00<- 1970 is less than 2012 :\ 
└────┴─────────────────────┘

If you need to avoid that you need a way to store 'empty' value as something different from DateTime

DROP TABLE IF EXISTS test_state_issue;

CREATE TABLE default.test_state_issue
(
    `id` UInt8,
    `x` SimpleAggregateFunction(min, Nullable(DateTime))
)
ENGINE = AggregatingMergeTree
ORDER BY id
SETTINGS index_granularity = 8192

Unfortunatelly in that case you need to create table explicitly due to ClickHouse/ClickHouse#34407

insert into test_state_issue SELECT 1 as id, minSimpleStateIf(toNullable(toDateTime('2001-01-01 01:01:01') + INTERVAL number YEAR), number > 10) as x from numbers(5);
insert into test_state_issue SELECT 1 as id, minSimpleStateIf(toNullable(toDateTime('2001-01-01 01:01:01') + INTERVAL number YEAR), number > 10) as x from numbers(15);

SELECT * FROM test_state_issue

/*
┌─id─┬───────────────────x─┐
│  1 │ 2012-01-01 01:01:01 │
└────┴─────────────────────┘
┌─id─┬────x─┐
│  1 │ ᴺᵁᴸᴸ │
└────┴──────┘
*/ 

SELECT * FROM test_state_issue FINAL
/*
┌─id─┬───────────────────x─┐
│  1 │ 2012-01-01 01:01:01 │ <-- now it's ok 
└────┴─────────────────────┘
*/

The other alternative is using full aggreate function states in that case lack of value differs from the default value

DROP TABLE IF EXISTS test_state_issue;

CREATE TABLE test_state_issue engine=AggregatingMergeTree ORDER BY id AS SELECT 1 as id, minStateIf(toDateTime('2001-01-01 01:01:01') + INTERVAL number YEAR, number > 10) as x from numbers(5);

SHOW CREATE TABLE test_state_issue;

insert into test_state_issue SELECT 1 as id, minStateIf(toDateTime('2001-01-01 01:01:01') + INTERVAL number YEAR, number > 10) as x from numbers(5);

SELECT
    id,
    finalizeAggregation(x)
FROM default.test_state_issue

Query id: a323d914-d283-42c8-ad13-3699599e27cc

┌─id─┬─finalizeAggregation(x)─┐
│  12012-01-01 01:01:01 │
└────┴────────────────────────┘
┌─id─┬─finalizeAggregation(x)─┐
│  11970-01-01 01:00:00 │
└────┴────────────────────────┘

SELECT
    id,
    finalizeAggregation(x)
FROM default.test_state_issue
FINAL

Query id: 4328450c-c721-4291-bea4-9a03efdfa3b6

┌─id─┬─finalizeAggregation(x)─┐
│  12012-01-01 01:01:01<-- it's ok
└────┴────────────────────────┘
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment