@subintp
Last active March 19, 2020 13:28
Cassandra data modeling principles
Design Principles
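1. Sacrifice space for time: denormalize and duplicate data across tables so that every query can be answered quickly.
2. Single partition per query: design each table so that a query reads from exactly one partition.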
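The two principles can be illustrated with a minimal Python sketch, assuming plain dictionaries as stand-ins for Cassandra tables (the table and function names are hypothetical). The same user is written to two query tables, which costs extra space, but each lookup then touches a single partition key instead of scanning every user.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical stand-ins for two Cassandra query tables holding the same user data.
# Each dictionary key plays the role of a partition key.
users_by_email: dict[str, dict] = {}
users_by_location: dict[str, list[dict]] = defaultdict(list)


def add_user(user: dict) -> None:
    """Write the same user into every query table (denormalization: space for time)."""
    users_by_email[user["email"]] = user
    users_by_location[user["location"]].append(user)


def find_user_by_email(email: str) -> Optional[dict]:
    # One key lookup, i.e. one partition read.
    return users_by_email.get(email)


def find_users_by_location(location: str) -> list[dict]:
    # One key lookup, i.e. one partition read; no scan over all users.
    return list(users_by_location.get(location, []))
```

The write path does more work so that each read path stays a single lookup, which is the trade the first principle describes.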
Cassandra Data Modelling Process
1. Conceptual Modelling
Identify entities, their attributes, and the relationships between them.
e.g. User one -> many Campaigns (relationship)
User => name: string, email: string, location: string (attributes)
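A conceptual model needs no Cassandra-specific details. As a sketch, the User entity and its one-to-many relationship to Campaigns could be written as Python dataclasses; the Campaign fields (`campaign_id`, `title`) are assumptions, since the notes only name the entity.

```python
from dataclasses import dataclass, field


@dataclass
class Campaign:
    # Hypothetical attributes: the notes do not list Campaign's attributes.
    campaign_id: str
    title: str


@dataclass
class User:
    # Attributes from the conceptual model.
    name: str
    email: str
    location: str
    # One user -> many campaigns (one-to-many relationship).
    campaigns: list[Campaign] = field(default_factory=list)
```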
2. Define access patterns
List all the queries through which the data will be accessed.
e.g.:
Q1. Find a user with specific email
Q2. Find most recently added campaigns
3. Logical modelling
The logical model is a data model that supports all of the access patterns (queries).
To create the logical model:
1. Create new mapping (query) tables where needed to support the queries.
2. Identify the primary key for each table: add partition key columns based on the attributes the query filters on, and clustering columns to guarantee uniqueness and support the desired sort order.
e.g. To find users by location, we need to create a new table, users_by_location.
Its attributes are listed below.
location: K
user_id: C↑
K = Partition key
C↑ = Clustering key (↑ = ascending order)
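The sketch below models a query table in memory to show what the K and C roles do; the `QueryTable` class is hypothetical and only approximates Cassandra's behaviour. Because the query filters on location, location is used as the partition key, and user_id is the ascending clustering column that keeps each row unique and orders rows inside a partition. Inserting a row with an existing primary key overwrites it, as a plain CQL INSERT does.

```python
import bisect
from collections import defaultdict


class QueryTable:
    """Minimal in-memory model of a Cassandra query table (illustrative only).

    partition_key: columns the query filters on with equality; together they pick one partition.
    clustering: columns that sort rows inside a partition and make the primary key unique.
    """

    def __init__(self, name, partition_key, clustering):
        self.name = name
        self.partition_key = tuple(partition_key)
        self.clustering = tuple(clustering)
        # partition key tuple -> list of (clustering key tuple, row), kept sorted ascending
        self._partitions = defaultdict(list)

    def insert(self, row):
        pk = tuple(row[c] for c in self.partition_key)
        ck = tuple(row[c] for c in self.clustering)
        rows = self._partitions[pk]
        i = bisect.bisect_left([k for k, _ in rows], ck)
        if i < len(rows) and rows[i][0] == ck:
            rows[i] = (ck, row)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (ck, row))

    def select(self, **partition_values):
        # Every partition key column must be given, so the query reads exactly one partition.
        missing = set(self.partition_key) - set(partition_values)
        if missing:
            raise ValueError(f"missing partition key columns: {sorted(missing)}")
        pk = tuple(partition_values[c] for c in self.partition_key)
        return [row for _, row in self._partitions.get(pk, [])]


users_by_location = QueryTable(
    "users_by_location", partition_key=["location"], clustering=["user_id"]
)
```

Calling `users_by_location.select(location="Kochi")` returns only that partition's rows, sorted by user_id; calling `select()` without a location raises an error instead of silently scanning every partition.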
4. Physical Model
The physical model is created by assigning a CQL data type to each attribute of the logical model.
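As a sketch of this step, the helper below renders a CQL CREATE TABLE statement from a logical table definition. The function name and the type mapping in `CQL_TYPES` are assumptions (for example, mapping identifiers to `uuid`); `text`, `uuid`, `timestamp` and `int` are standard CQL types.

```python
# Hypothetical mapping from conceptual attribute types to CQL data types.
CQL_TYPES = {"string": "text", "id": "uuid", "timestamp": "timestamp", "int": "int"}


def create_table_cql(name, columns, partition_key, clustering, order=None):
    """Render a CQL CREATE TABLE statement for one physical table.

    columns: list of (column_name, conceptual_type) pairs.
    order: optional {clustering_column: "ASC" | "DESC"}; missing entries default to ASC.
    """
    order = order or {}
    col_defs = [f"{col} {CQL_TYPES[t]}" for col, t in columns]
    # The partition key is wrapped in its own parentheses; clustering columns follow it.
    pk = "(" + ", ".join(partition_key) + ")"
    primary_key = ", ".join([pk, *clustering])
    cql = f"CREATE TABLE {name} (\n  " + ",\n  ".join(col_defs)
    cql += f",\n  PRIMARY KEY ({primary_key})\n)"
    if clustering:
        ordering = ", ".join(f"{c} {order.get(c, 'ASC')}" for c in clustering)
        cql += f" WITH CLUSTERING ORDER BY ({ordering})"
    return cql + ";"
```

For users_by_location, with partition key `["location"]` and clustering `["user_id"]`, the statement contains `PRIMARY KEY ((location), user_id)` followed by `WITH CLUSTERING ORDER BY (user_id ASC)`.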
Anti-patterns
1. Full cluster scans.
2. Full table scans.
3. IN queries on the partition key, which fan out to many partitions.
4. Queries that need a read before a write.
5. Excessive use of secondary indexes.
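A rough heuristic, not something Cassandra itself exposes: the hypothetical function below classifies an equality or IN `WHERE` clause against a table's primary key, flagging the scan and IN anti-patterns. It ignores secondary indexes, clustering-column restriction rules and read-before-write, and the `where` dictionary format is an assumption of this sketch.

```python
def classify_query(partition_key, clustering, where):
    """Classify a WHERE clause against a table's primary key (rough heuristic).

    where: {column: value} for equality, or {column: [v1, v2, ...]} for IN.
    """
    if not all(c in where for c in partition_key):
        # No complete partition key: every partition on every node must be read
        # (in CQL this needs ALLOW FILTERING or a secondary index).
        return "full table scan"
    if any(isinstance(where[c], (list, tuple, set)) for c in partition_key):
        return "multi-partition (IN on partition key)"
    if set(where) - set(partition_key) - set(clustering):
        # Non-key columns are filtered after the partition is read.
        return "single partition, filtering on non-key columns"
    return "single partition"
```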