(
`date` Date,
`project` LowCardinality(String),
`hits` UInt32
)
ENGINE = SummingMergeTree
ORDER BY (date, project, hits);
should be ORDER BY (date, project)
otherwise SummingMergeTree will SUM nothing!!!!
In general (oversimplified explanation) ORDERBY of MatView storage table should match GROUPBY of mat.view which is GROUP BY date, project
in this case.
Getting materialized view size on disk
better use this query. It less confusing (about active and inactive parts) and simpler
SELECT
rows,
formatReadableSize(total_bytes) AS total_bytes
FROM system.tables
WHERE table = 'wikistat_top_projects'
We use FINAL modifier to make sure the summing engine returns summarized hits instead of individual, unmerged rows:
SELECT hits
FROM wikistat_top_projects
FINAL
WHERE (project = 'test') AND (date = date(now()))
Counterproductive example with FINAL in this case. FINAL will read excessively all columns of primary key. Users should always use SUM/GROUPBY if they query SummingMergeTree. Better to never expose bad practice in the examples.
Speed up aggregations using materialized views
Such matView (min/max/avg over (sum)) are impossible in Clickhouse. Because Aggregation works against inserted data and results will be incorrect if you ingest data using insert into source table. It "works" in your example because you populated data using INSERT SELECT query, it will not work in MatView
CODEC has no sense for ENGINE = Null, you just confuse Clickhouse users. Remove CODEC from DDL.
CREATE TABLE wikistat_src
(
`time` DateTime CODEC(Delta(4), ZSTD(1)),
`project` LowCardinality(String),
`subproject` LowCardinality(String),
`path` String,
`hits` UInt64
)
ENGINE = Null
or use this DDL CREATE TABLE wikistat_src as wikistat ENGINE = Null
This is because a materialized view only triggers when its source table receives inserts. It's just a trigger on the source table and knows nothing about the join table. In our case, wikistat is the source table for the materialized view, and wikistat_titles is a table we join to:
The SOURCE table is the FIRST table in MAT_VIEW's select FROM section, it's not about JOIN, the same will be with RIGHT JOIN or with IN SELECT or ALL SELECT ...