Skip to content

Instantly share code, notes, and snippets.

@peijiehu
Created July 16, 2017 23:39
Show Gist options
  • Save peijiehu/190307a6d4bee5a5bae1624d8ffcc74a to your computer and use it in GitHub Desktop.
Save peijiehu/190307a6d4bee5a5bae1624d8ffcc74a to your computer and use it in GitHub Desktop.
columnar db - definition and use cases

How do columnar databases work? The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from CJ Date's supplier and parts database:

SNO STATUS CITY SNAME


S1 20 London Smith S2 10 Paris Jones S3 30 Paris Blake S4 20 London Clark S5 30 Athens Adams would be stored on disk or in memory something like:

S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams This is in contrast to a traditional rowstore which would store the data more like this:

S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column store will excel at doing aggregations like totals and averages, but inserting a single row can be expensive, while the inverse holds true for row-stores. This should be apparent from the above diagram.

How do they differ from relational databases? A relation database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column- oriented DMBSs may be relational or not, just as row-oriented DBMS's may adhere more or less to relational principles.

Columnar databases boost performance by reducing the amount of data that needs to be read from disk, both by efficiently compressing the similar columnar data and by reading only the data necessary to answer the query.

In practice, columnar databases are well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve highly complex queries over all data (possibly petabytes). However, some work must be done to write data into a columnar database. Transactions (INSERTs) must be separated into columns and compressed as they are stored, making it less suited for OLTP workloads. Row-oriented databases are well-suited for OLTP-like workloads which are more heavily loaded with interactive transactions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment