Skip to content

Instantly share code, notes, and snippets.


When you a clickhouse cluster and you run queries on all the replicas it's not easy to get all the queries ran. I use system.query_log all the time to check timings, errors and so on.

So what I do is create a global query_log:

:) create view query_log_all on cluster my_cluster as select * from remote(',', 'system.query_log')

So I can inspect queries in all the replicas with a single query:

View link

Clickhouse has a pretty good endpoint /replicas_status which gives information about the, guess what, replication status. When you are working on a cluster in which you use replication to increase the amount of QPS you usually have a load balancer before, something like this:

                                 |              |
                       +-------->+  clickhouse  |
                       |         |              |
                       |         +--------------+

Sometimes you have to move data from one table to a different one. You usually use

insert into target select * from source

This works but have several problems:

  1. materialized columns are not properly copied
  2. it's slow

Looks like filters are pushed down when filtering an "UNION ALL". Also an example on how to use EXPLAIN in clickhouse and a different view of seeing what is going on with the traces, this lines show how much data clickhouse is reading:

Selected 1 parts by date, 1 parts by key, 2 marks by primary key, 2 marks to read from 1 ranges
Reading approx. 16384 rows with 1 streams

The example

javisantana /
Last active Nov 6, 2020
how to generate a map data structure dynamically in clickhouse

Clickhouse has a powerful feature, JOIN engines, that allows to prepare a table to be joined with better performance that a regular table (MergeTree, Log...). It also allows to use joinGet to get table values using a key.

Somtimes you don't have a JOIN table but you'd like to use something with the joinGet performance. Unfortunately you can't use joinGet with something created on the fly (well, you could create a temporally join table but you need several SQL queries).

So there is a way to do that, using transform:

with (
  select (groupArray(key), groupArray(value)) from my_table
) as key_value
javisantana / summing_merge_tree.sql
Created Jan 17, 2020
SummingMergetree multiple keys in nested maps
View summing_merge_tree.sql
MacBook-Pro-de-javi.local :) create table multiple_keys (tmp Int32, testMap Nested (a Int32, bKey Int32, value Int32)) Engine=SummingMergeTree() order by (tmp);
CREATE TABLE multiple_keys
`tmp` Int32,
`testMap` Nested(
a Int32,
bKey Int32,
value Int32)
javisantana / compile clickhouse on
Last active Mar 11, 2020
compile clickhouse on mojave
View compile clickhouse on

clickhouse on osx

These are the steps I followed to compile clickhouse on OSX mojave (10.4.3). These might not be the best way to make it compile on OSX, my knowledge about C++ and build tooling is really limited but they do the work.

clickhouse sha-1: 2ad4df1d6a

compile command (apply the patch below before running cmake)

javisantana / res.sql
Last active Jun 20, 2019
filter ourside the join vs inside the join in clickhouse
View res.sql
MacBook-Pro-de-javi.local :) select cityHash64(groupArray(cityHash64(*))) from A a asof inner join (select * from B where ts<toDateTime('1970-01-01 02:00:00')) b on and a.ts=b.ts where a.ts<toDateTime('1970-01-01 02:00:00');
SELECT cityHash64(groupArray(cityHash64(*)))
WHERE ts < toDateTime('1970-01-01 02:00:00')
) AS b ON ( = AND (a.ts = b.ts)
View charla