@abachman
Created February 7, 2017 19:05
Cassandra data modeling notes

Based on http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/

Primary Keys

When the Primary Key is a single column, that column is also the Partition Key.

To decide where in the cluster a piece of data is stored, Cassandra hashes the partition key. The value of that hash dictates which node the data will reside on and which replicas will be responsible for it.

An example might look like this:

CREATE TABLE IF NOT EXISTS iotest.data_by_timeuuid (
  id timeuuid,
  uid int,
  fid int,
  val text,
  PRIMARY KEY (id)
);
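A minimal sketch of how this table would be queried, assuming the definition above. The uuid literal is a made-up placeholder. Because id is the only key column, it is the only column a plain WHERE clause can restrict; filtering on fid or uid is normally rejected unless you add a secondary index or ALLOW FILTERING.

```cql
-- Look up one row by its partition key.
-- The timeuuid value is a placeholder, not real data.
SELECT id, uid, fid, val
FROM iotest.data_by_timeuuid
WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- token() returns the hash Cassandra computes from the partition key,
-- which is what decides where each row lives in the cluster.
SELECT token(id), id
FROM iotest.data_by_timeuuid
LIMIT 5;
```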

Compound Keys

A multi-column primary key is called a Compound Key.

An interesting characteristic of Compound Keys is that only the first column is considered the Partition Key. The rest of the columns in the Primary Key clause are Clustering Keys.

Clustering Keys

Each additional column that is added to the Primary Key clause is called a Clustering Key. A clustering key is responsible for sorting data within the partition. By default, the clustering key columns are sorted in ascending order.

In this version, id is still the Partition Key and fid is a Clustering Key:

CREATE TABLE IF NOT EXISTS iotest.data_by_timeuuid (
  id timeuuid,
  uid int,
  fid int,
  val text,
  PRIMARY KEY (id, fid)
);
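A rough sketch of what the clustering key buys you, assuming the (id, fid) table above. The uuid literal and the fid bounds are placeholder values. Once the partition key is pinned with an equality, the clustering key can be used for range filters and for ordering.

```cql
-- All rows in one partition, returned in ascending fid order (the default).
SELECT fid, val
FROM iotest.data_by_timeuuid
WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- Range filter on the clustering key within that partition.
SELECT fid, val
FROM iotest.data_by_timeuuid
WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f
  AND fid >= 10 AND fid < 20;

-- Same partition, read in reverse clustering order.
SELECT fid, val
FROM iotest.data_by_timeuuid
WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f
ORDER BY fid DESC;
```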

Composite Key

A Composite Key is a Primary Key whose Partition Key spans multiple columns.

This is what we're using on IO right now: (uid, fid) is the Partition Key and id is a Clustering Key. Together, uid and fid make up a Composite Partition Key.

CREATE TABLE IF NOT EXISTS iotest.data_by_timeuuid (
  id timeuuid,
  uid int,
  fid int,
  val text,
  PRIMARY KEY ((uid, fid), id)
) WITH CLUSTERING ORDER BY (id DESC);
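A sketch of the kind of queries this layout supports, assuming the table above. The uid, fid, and date values are placeholders. Every column of the composite partition key has to be supplied with an equality, so leaving out uid or fid is normally rejected. Since id is a timeuuid clustered in descending order, rows come back newest first.

```cql
-- Ten most recent values for one user's feed.
-- Both partition key columns (uid and fid) must be given.
SELECT id, val
FROM iotest.data_by_timeuuid
WHERE uid = 1 AND fid = 42
LIMIT 10;

-- Values recorded after a point in time, using the built-in
-- maxTimeuuid() function to build a timeuuid bound from a timestamp.
SELECT id, val
FROM iotest.data_by_timeuuid
WHERE uid = 1 AND fid = 42
  AND id > maxTimeuuid('2017-02-01 00:00+0000');
```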

We could probably get away with dropping uid from the Partition Key and just going with ((fid), id), since the querying process will already have a valid feed id (fid). User id is superfluous.
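For comparison, the fid-only variant might look something like the sketch below. The table name data_by_feed is hypothetical, chosen only to keep it distinct from the table above. This also assumes fid is unique across users; if two users could share a fid, their rows would land in the same partition.

```cql
-- Hypothetical variant: partition by feed id only.
-- uid stays as a regular column and no longer affects placement.
CREATE TABLE IF NOT EXISTS iotest.data_by_feed (
  id timeuuid,
  uid int,
  fid int,
  val text,
  PRIMARY KEY ((fid), id)
) WITH CLUSTERING ORDER BY (id DESC);

-- Queries then need only the feed id (42 is a placeholder).
SELECT id, val
FROM iotest.data_by_feed
WHERE fid = 42
LIMIT 10;
```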

Summary

  • Partition Keys (a single-column Primary Key is one) are for locating your data to a partition in the cluster.
  • Composite Keys are multi-column Partition Keys and are for including more columns in the calculation of the partition.
  • Compound Keys add Clustering Keys after the Partition Key, for including other columns in the filter without affecting the partition.
  • Clustering Keys are for sorting your data within the partition.