ClickHouse | Druid or Pinot |
---|---|
The organization has expertise in C++ | The organization has expertise in Java |
Small cluster | Large cluster |
A few tables | Many tables |
Single data set | Multiple unrelated data sets (multitenancy) |
Tables and data sets reside in the cluster permanently | Tables and data sets periodically emerge in and retire from the cluster |
Table sizes (and the query load against them) are stable over time | Tables grow and shrink significantly over time |
Queries are homogeneous (in type, size, distribution over the time of day, etc.) | Queries are heterogeneous |
There is a dimension in the data by which it can be partitioned, and almost no queries touch data across partitions (i.e. shared-nothing partitioning) | There is no such dimension; queries often touch data across the whole cluster. Edit 2019: Pinot now supports partitioning by dimension. |
Cloud is not used; the cluster is deployed on dedicated physical servers | The cluster is deployed in the cloud |
No existing Hadoop or Spark clusters | Hadoop or Spark clusters already exist and could be used |
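The "shared-nothing partitioning" row above can be illustrated with a minimal sketch. Everything here is hypothetical (a `tenant_id` partitioning key, simple hash routing, in-memory shards) and is not tied to any of the three systems; it only shows why a query that filters on the partitioning dimension never needs to touch more than one shard:

```python
# Shared-nothing partitioning by a single dimension (hypothetical "tenant_id"):
# each shard owns a disjoint subset of keys, so a query filtered on the
# partitioning key is routed to exactly one shard.

SHARDS = 4

def shard_for(tenant_id: int) -> int:
    """Route a row or a query to a shard by hashing the partitioning key."""
    return hash(tenant_id) % SHARDS

# Ingest: every row lands on exactly one shard.
shards = {s: [] for s in range(SHARDS)}
rows = [{"tenant_id": t, "value": t * 2} for t in range(100)]
for row in rows:
    shards[shard_for(row["tenant_id"])].append(row)

# A query filtered on the partitioning key scans a single shard;
# no cross-shard coordination is needed (shared-nothing).
def query_sum(tenant_id: int) -> int:
    shard = shards[shard_for(tenant_id)]
    return sum(r["value"] for r in shard if r["tenant_id"] == tenant_id)

print(query_sum(7))  # → 14
```

A query without a filter on the partitioning key (the right-hand column of that row) would instead have to fan out to all shards and merge the results, which is exactly the case where this partitioning scheme stops helping.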
@kishoreg:

> There is no such dimension, queries often touch data across the whole cluster

Pinot has support for partitioning and sorting on a single dimension key.

@leventov:

@kishoreg thanks, updated. However, partitioning by key makes partition-based sampling problematic (the sample may be very biased), and efficient sampling may be even more important than the benefits that key-based partitioning provides.
This table is used in this post: https://medium.com/@leventov/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7