ClickHouse | Druid or Pinot |
---|---|
The organization has expertise in C++ | The organization has expertise in Java |
Small cluster | Large cluster |
A few tables | Many tables |
Single data set | Multiple unrelated data sets (multitenancy) |
Tables and data sets reside in the cluster permanently | Tables and data sets periodically emerge in and retire from the cluster |
Table sizes (and the query load on them) are stable over time | Tables grow and shrink significantly over time |
Homogeneity of queries (their type, size, distribution by time of day, etc.) | Heterogeneity of queries |
There is a dimension in the data by which it can be partitioned, and almost no queries touch data across partitions (i.e. shared-nothing partitioning) | There is no such dimension; queries often touch data across the whole cluster. Edit 2019: Pinot now supports partitioning and sorting on a single dimension key. |
Cloud is not used; the cluster is deployed on specific physical servers | The cluster is deployed in the cloud |
No existing clusters of Hadoop or Spark | Clusters of either Hadoop or Spark already exist and could be used |
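To make the shared-nothing partitioning row above concrete, here is a minimal sketch (all names such as `shard_for` and the `tenant` dimension are made up for illustration, not taken from any of these systems): rows are routed to shards by a stable function of one dimension, so a query filtered on that dimension touches exactly one shard instead of fanning out across the whole cluster.

```python
# Hypothetical shared-nothing partitioning by one dimension.
NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # Stable (non-randomized) hash so writers and query routers agree.
    return sum(key.encode()) % NUM_SHARDS

# Build shards and route each row by its partition dimension ("tenant").
shards = {i: [] for i in range(NUM_SHARDS)}
for row in [{"tenant": t, "metric": m} for t in ("a", "b", "c") for m in range(3)]:
    shards[shard_for(row["tenant"])].append(row)

# A query filtered on tenant "a" only needs to read a single shard.
target = shard_for("a")
result = [r for r in shards[target] if r["tenant"] == "a"]
print(target, len(result))
```

When no such dimension exists, the router cannot prune shards and every query becomes a cluster-wide scatter-gather, which is the right-hand column of the row above.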
@kishoreg thanks, updated. However, partitioning by key makes partition-based sampling problematic (because it may be very biased). And efficient sampling may be even more important than the benefits that key-based partitioning provides.
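The bias mentioned in the comment can be shown with a tiny, fully made-up sketch: if partitions are keyed by a dimension and the metric is correlated with that dimension, then sampling a whole partition (cheap, sequential reads) systematically over-represents one key, while row-level uniform sampling is unbiased but must touch every partition.

```python
import random

# Synthetic data (illustrative only): the metric is correlated with the key.
rows = [{"key": k, "value": k * 10 + i} for k in range(4) for i in range(5)]

# Key-based partitioning: each partition holds exactly one key's rows.
partitions = {k: [r for r in rows if r["key"] == k] for k in range(4)}

# Partition-based sample: read one whole partition. It only ever sees key 0,
# so its mean is far from the true mean.
partition_sample = partitions[0]
biased_mean = sum(r["value"] for r in partition_sample) / len(partition_sample)

# Row-level uniform sample: unbiased, but touches all partitions.
random.seed(42)
uniform_sample = random.sample(rows, 5)
uniform_mean = sum(r["value"] for r in uniform_sample) / len(uniform_sample)

true_mean = sum(r["value"] for r in rows) / len(rows)
print(biased_mean, uniform_mean, true_mean)
```

Here the true mean is 17.0 but the partition-based estimate is 2.0, which is the trade-off the comment points at: key-based partitioning buys shard pruning at the cost of cheap, partition-aligned sampling.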