@HariSekhon
Last active July 9, 2024 21:31
vertica.md from HariSekhon/Knowledge-Base repo: https://github.com/HariSekhon/Knowledge-Base

HP Vertica

Proprietary MPP SQL OLAP database with a free community edition.

  • OLAP - online analytical processing (batch analytics, not real time)
  • columnar database
    • high compression because columns of same data type compress better
    • faster queries that scan a few columns across many rows, because unneeded columns are skipped on read, at the expense of single-record retrieval
  • MPP - massively parallel processing architecture allows horizontal scaling across independent servers
    • segmentation into sub-clusters - on any integer column / expression - shards to nodes
      • local segmentation within a node to allow it to migrate that segment range of rows to another node
    • partitioning
  • data replication and server recovery
    • hybrid storage
      • Write-optimized store (WOS) - in-memory, unsorted, uncompressed - the Tuple Mover moves its data out to the ROS below
      • Read-optimized store (ROS) - 1 data file on disk for each column
      • projections storing data in different ways for optimization
  • SQL with extensive analytics support
    • including windowing, pattern matching, time series
    • built-in ML algorithms, e.g. linear regression, logistic regression, k-means clustering, Naive Bayes classification, random forest decision trees, XGBoost, and support vector machine regression and classification
  • Resource Pool Parameters for query optimization:
    • memory size
    • planned concurrency
    • max concurrency
    • execution parallelism
    • uses the above to allocate a query budget
    • create different pools with different query budgets for different query profiles
  • integrates with Apache Spark, HDFS, and Kafka for streaming data ingestion
  • Docker support:
  • Kubernetes support: (version 10.1.1 onwards)
  • also integrates with Grafana, Helm, Go, and Distributed R
  • Python library: https://github.com/vertica/vertica-python
  • Golang library: https://github.com/vertica/vertica-sql-go
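
The segmentation bullet above (rows sharded to nodes by an integer column or expression) can be pictured with a toy model. This is only a sketch: it uses CRC32 as a stand-in hash rather than Vertica's own hash function, and the node names, `node_for` and `segment` are hypothetical names made up for illustration.

```python
import zlib
from collections import defaultdict


def node_for(key: int, nodes: list[str]) -> str:
    """Map a segmentation key to a node (toy stand-in, not Vertica's real hash)."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def segment(rows: list[dict], key_column: str, nodes: list[str]) -> dict:
    """Group rows by the node that their segmentation key hashes to."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key_column], nodes)].append(row)
    return dict(placement)


# Hypothetical three-node cluster and a small table segmented on customer_id
nodes = ["node01", "node02", "node03"]
rows = [{"customer_id": i, "amount": i * 10} for i in range(12)]
placement = segment(rows, "customer_id", nodes)

for node in nodes:
    print(node, [r["customer_id"] for r in placement.get(node, [])])
```

The same key always lands on the same node, which is what lets each node work on its own slice of the data in parallel.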

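The resource pool bullets above describe pools with different query budgets. A rough way to reason about this, assuming the simplified rule that a pool's per-query budget is its memory size divided by its planned concurrency (the real calculation has more cases, so check the Vertica documentation), is sketched below. The pool names and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    name: str
    memory_mb: int            # memory size reserved for the pool
    planned_concurrency: int  # expected number of concurrent queries
    max_concurrency: int      # hard cap on concurrent queries (not used in the budget here)

    def query_budget_mb(self) -> float:
        """Simplified assumption: budget = memory size / planned concurrency."""
        return self.memory_mb / self.planned_concurrency


# Hypothetical pools for two different query profiles
pools = [
    ResourcePool("dashboards", memory_mb=8192, planned_concurrency=16, max_concurrency=32),
    ResourcePool("batch_etl", memory_mb=32768, planned_concurrency=4, max_concurrency=4),
]

for pool in pools:
    print(f"{pool.name}: {pool.query_budget_mb():.0f} MB per query")
```

Under this model, many small short queries get a pool with high planned concurrency and a small budget, while a few heavy queries get a pool with low planned concurrency and a large budget.
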
Queries

SELECT
  transaction_id, path_line
FROM
  v_monitor.query_plan_profiles;
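
A minimal sketch of running the query above from Python with the vertica-python library linked earlier. The connection details are placeholders, and `fetch_plan_profiles` is a hypothetical helper name rather than part of the library.

```python
QUERY = """
SELECT
  transaction_id, path_line
FROM
  v_monitor.query_plan_profiles
"""


def fetch_plan_profiles(conn_info: dict) -> list:
    """Run QUERY against Vertica and return all rows."""
    # Imported here so the module still loads where vertica-python is not installed
    import vertica_python

    with vertica_python.connect(**conn_info) as connection:
        cursor = connection.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()


if __name__ == "__main__":
    # Placeholder connection settings, replace with your own
    conn_info = {
        "host": "localhost",
        "port": 5433,
        "user": "dbadmin",
        "password": "",
        "database": "vertica",
    }
    for transaction_id, path_line in fetch_plan_profiles(conn_info):
        print(transaction_id, path_line)
```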