Skip to content

Instantly share code, notes, and snippets.

Created January 22, 2023 14:55
What would you like to do?
Using Materialized Views in ClickHouse
    `date` Date,
    `project` LowCardinality(String),
    `hits` UInt32
ENGINE = SummingMergeTree
ORDER BY (date, project, hits);

should be ORDER BY (date, project) otherwise SummingMergeTree will SUM nothing!!!!

In general (oversimplified explanation) ORDERBY of MatView storage table should match GROUPBY of mat.view which is GROUP BY date, project in this case.

Getting materialized view size on disk

better use this query. It less confusing (about active and inactive parts) and simpler

    formatReadableSize(total_bytes) AS total_bytes
FROM system.tables
WHERE table = 'wikistat_top_projects'

We use FINAL modifier to make sure the summing engine returns summarized hits instead of individual, unmerged rows:

FROM wikistat_top_projects
WHERE (project = 'test') AND (date = date(now()))

Counterproductive example with FINAL in this case. FINAL will read excessively all columns of primary key. Users should always use SUM/GROUPBY if they query SummingMergeTree. Better to never expose bad practice in the examples.

Speed up aggregations using materialized views

Such matView (min/max/avg over (sum)) are impossible in Clickhouse. Because Aggregation works against inserted data and results will be incorrect if you ingest data using insert into source table. It "works" in your example because you populated data using INSERT SELECT query, it will not work in MatView

CODEC has no sense for ENGINE = Null, you just confuse Clickhouse users. Remove CODEC from DDL.

CREATE TABLE wikistat_src
    `time` DateTime CODEC(Delta(4), ZSTD(1)),
    `project` LowCardinality(String),
    `subproject` LowCardinality(String),
    `path` String,
    `hits` UInt64

or use this DDL CREATE TABLE wikistat_src as wikistat ENGINE = Null

This is because a materialized view only triggers when its source table receives inserts. It's just a trigger on the source table and knows nothing about the join table. In our case, wikistat is the source table for the materialized view, and wikistat_titles is a table we join to:

The SOURCE table is the FIRST table in MAT_VIEW's select FROM section, it's not about JOIN, the same will be with RIGHT JOIN or with IN SELECT or ALL SELECT ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment