ClickHouse | Druid or Pinot |
---|---|
The organization has expertise in C++ | The organization has expertise in Java |
Small cluster | Large cluster |
A few tables | Many tables |
Single data set | Multiple unrelated data sets (multitenancy) |
Tables and data sets reside in the cluster permanently | Tables and data sets periodically emerge in and retire from the cluster |
Table sizes (and the query load on them) are stable over time | Tables grow and shrink significantly over time |
Homogeneity of queries (their type, size, distribution by time of day, etc.) | Heterogeneity of queries |
There is a dimension in the data by which it can be partitioned, and almost no queries touch data across partitions (i.e. shared-nothing partitioning) | There is no such dimension; queries often touch data across the whole cluster. Edit 2019: Pinot now supports partitioning and sorting on a single dimension key. |
Cloud is not used; the cluster is deployed on specific physical servers | The cluster is deployed in the cloud |
No existing clusters of Hadoop or Spark | Clusters of either Hadoop or Spark already exist and could be used |
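To make the shared-nothing partitioning row above concrete, here is a minimal sketch (all names such as `shard_for` and the `tenant` dimension are made up for illustration, not taken from any of these systems): rows are routed to shards by a stable function of one dimension, so a query filtered on that dimension touches exactly one shard instead of fanning out across the whole cluster.

```python
# Hypothetical shared-nothing partitioning by one dimension.
NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # Stable (non-randomized) hash so writers and query routers agree.
    return sum(key.encode()) % NUM_SHARDS

# Build shards and route each row by its partition dimension ("tenant").
shards = {i: [] for i in range(NUM_SHARDS)}
for row in [{"tenant": t, "metric": m} for t in ("a", "b", "c") for m in range(3)]:
    shards[shard_for(row["tenant"])].append(row)

# A query filtered on tenant "a" only needs to read a single shard.
target = shard_for("a")
result = [r for r in shards[target] if r["tenant"] == "a"]
print(target, len(result))
```

When no such dimension exists, the router cannot prune shards and every query becomes a cluster-wide scatter-gather, which is the right-hand column of the row above.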
@kishoreg thanks, updated. However, partitioning by key makes partition-based sampling problematic (because it may be very biased). And efficient sampling may be even more important than the benefits that key-based partitioning provides.
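The bias mentioned in the comment can be shown with a tiny, fully made-up sketch: if partitions are keyed by a dimension and the metric is correlated with that dimension, then sampling a whole partition (cheap, sequential reads) systematically over-represents one key, while row-level uniform sampling is unbiased but must touch every partition.

```python
import random

# Synthetic data (illustrative only): the metric is correlated with the key.
rows = [{"key": k, "value": k * 10 + i} for k in range(4) for i in range(5)]

# Key-based partitioning: each partition holds exactly one key's rows.
partitions = {k: [r for r in rows if r["key"] == k] for k in range(4)}

# Partition-based sample: read one whole partition. It only ever sees key 0,
# so its mean is far from the true mean.
partition_sample = partitions[0]
biased_mean = sum(r["value"] for r in partition_sample) / len(partition_sample)

# Row-level uniform sample: unbiased, but touches all partitions.
random.seed(42)
uniform_sample = random.sample(rows, 5)
uniform_mean = sum(r["value"] for r in uniform_sample) / len(uniform_sample)

true_mean = sum(r["value"] for r in rows) / len(rows)
print(biased_mean, uniform_mean, true_mean)
```

Here the true mean is 17.0 but the partition-based estimate is 2.0, which is the trade-off the comment points at: key-based partitioning buys shard pruning at the cost of cheap, partition-aligned sampling.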