Skip to content

Instantly share code, notes, and snippets.

@den-crane
Created January 22, 2023 14:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save den-crane/120682bdaf708ae40ec2b80ce5f2275d to your computer and use it in GitHub Desktop.
Save den-crane/120682bdaf708ae40ec2b80ce5f2275d to your computer and use it in GitHub Desktop.
Using Materialized Views in ClickHouse
(
    `date` Date,
    `project` LowCardinality(String),
    `hits` UInt32
)
ENGINE = SummingMergeTree
ORDER BY (date, project, hits);

should be ORDER BY (date, project) otherwise SummingMergeTree will SUM nothing!!!!

In general (oversimplified explanation) ORDERBY of MatView storage table should match GROUPBY of mat.view which is GROUP BY date, project in this case.


Getting materialized view size on disk

better use this query. It less confusing (about active and inactive parts) and simpler

SELECT
    rows,
    formatReadableSize(total_bytes) AS total_bytes
FROM system.tables
WHERE table = 'wikistat_top_projects'

We use FINAL modifier to make sure the summing engine returns summarized hits instead of individual, unmerged rows:

SELECT hits
FROM wikistat_top_projects
FINAL
WHERE (project = 'test') AND (date = date(now()))

Counterproductive example with FINAL in this case. FINAL will read excessively all columns of primary key. Users should always use SUM/GROUPBY if they query SummingMergeTree. Better to never expose bad practice in the examples.


Speed up aggregations using materialized views

Such matView (min/max/avg over (sum)) are impossible in Clickhouse. Because Aggregation works against inserted data and results will be incorrect if you ingest data using insert into source table. It "works" in your example because you populated data using INSERT SELECT query, it will not work in MatView


CODEC has no sense for ENGINE = Null, you just confuse Clickhouse users. Remove CODEC from DDL.

CREATE TABLE wikistat_src
(
    `time` DateTime CODEC(Delta(4), ZSTD(1)),
    `project` LowCardinality(String),
    `subproject` LowCardinality(String),
    `path` String,
    `hits` UInt64
)
ENGINE = Null

or use this DDL CREATE TABLE wikistat_src as wikistat ENGINE = Null


This is because a materialized view only triggers when its source table receives inserts. It's just a trigger on the source table and knows nothing about the join table. In our case, wikistat is the source table for the materialized view, and wikistat_titles is a table we join to:

The SOURCE table is the FIRST table in MAT_VIEW's select FROM section, it's not about JOIN, the same will be with RIGHT JOIN or with IN SELECT or ALL SELECT ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment