
@nikneroz
Created September 9, 2019 13:57
Cassandra DB

What is Cassandra DB?

Cassandra is written in Java and was open-sourced by Facebook in July 2008. This original version of Cassandra was written primarily by an ex-employee from Amazon and one from Microsoft. It was strongly influenced by Dynamo, Amazon’s pioneering distributed key/value database. Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.

Cassandra has become so popular because of its outstanding technical features. It is durable, seamlessly scalable, and tuneably consistent. It performs blazingly fast writes, can store hundreds of terabytes of data, and is decentralized and symmetrical, so there’s no single point of failure. It is highly available and offers a schema-free data model.

Where can I use Cassandra DB?

  • Large-scale, high-volume websites, such as Web 2.0 social applications
  • High-performance, decentralized, elastic data stores
  • Fault-tolerance
  • Eventually consistent data store

RDBMS features.

  • Transactions (Atomic, Consistent, Isolated, Durable)
  • SQL
  • Solid data model

How can we speed up current RDBMS?

  1. Application layer. We try to improve our indexes. We optimize the queries. But presumably at this scale we weren’t wholly ignorant of index and query optimization, and already had them in pretty good shape. So this becomes a painful process of picking through the data access code to find any opportunities for fine tuning. This might include reducing or reorganizing joins, throwing out resource-intensive features such as XML processing within a stored procedure, and so forth.
  2. Caching layer. For larger systems, this might include distributed caches such as memcached, EHCache, Oracle Coherence, or other related products. Now we have a consistency problem between updates in the cache and updates in the database, which is exacerbated over a cluster.
  3. Denormalization. We turn our attention to the database again and decide that, now that the application is built and we understand the primary query paths, we can duplicate some of the data to make it look more like the queries that access it.
  4. Vertical scaling. We can solve the problem by adding more memory, adding faster processors, and upgrading disks.
  5. Horizontal scaling. We have vertical scaling constraints, so we need to add more nodes to our DB cluster.

Now we have the problem of data replication and consistency during regular usage and in failover scenarios. So we need to update the configuration of the database management system. This might mean optimizing the channels the database uses to write to the underlying filesystem. We turn off logging or journaling, which frequently is not a desirable (or, depending on your situation, legal) option.

Commit data across multiple hosts.

Two-phase commit.

Phase 1 - Each server that needs to commit data writes its data records to the log. If a server is unsuccessful, it responds with a failure message. If successful, the server replies with an OK message.

Phase 2 - This phase begins after all participants respond OK. Then, the coordinator sends a signal to each server with commit instructions. After committing, each writes the commit as part of its log record for reference and sends the coordinator a message that its commit has been successfully implemented. If a server fails, the coordinator sends instructions to all servers to roll back the transaction. After the servers roll back, each sends feedback that this has been completed.

The main problem is that you need to lock resources on all nodes.
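
The two phases above can be sketched in a few lines of Python. The `Participant` and `two_phase_commit` names are illustrative only, not a real transaction manager; a production coordinator would also persist its own decision log and handle timeouts and crashed participants.

```python
# Toy two-phase commit sketch. All names here are hypothetical.
class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "idle"

    def prepare(self):
        # Phase 1: write data records to the log, then vote OK or failure.
        if self.will_succeed:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled-back"


def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if everyone voted OK; otherwise roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled-back"
```

Note how a single failure vote forces every participant to roll back, which is exactly why resources stay locked on all nodes until the slowest one answers.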

Sharding and shared-nothing architecture.

The idea here is that you split the data so that instead of hosting all of it on a single server or replicating all of the data on all of the servers in a cluster, you divide up portions of the data horizontally and host them each separately.

It seems clear that in order to shard, you need to find a good key by which to order your records. For example, you could divide your customer records across 26 machines, one for each letter of the alphabet, with each hosting only the records for customers whose last names start with that particular letter.

The main problem is that it’s hard to split data equally.

Sharding approaches

  • Feature-based shard or functional segmentation: For example, at eBay, the users are in one shard, and the items for sale are in another.
  • Key-based sharding: In this approach, you find a key in your data that will evenly distribute it across shards. It is common in this strategy to find time-based or numeric keys to hash on.
  • Lookup table: One of the nodes in the cluster acts as a “yellow pages” directory and looks up which node has the data you’re trying to access.
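
Key-based sharding can be sketched in a few lines of Python. Hashing avoids the skew of the alphabetical split described earlier (far more surnames start with "S" than with "X"). MD5 is an assumption here; real systems often use faster non-cryptographic hashes.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index by hashing it.

    Hashing spreads keys roughly evenly across shards, unlike
    splitting on the first letter of a name.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Every client that agrees on the hash function and shard count will route the same key to the same shard, with no lookup table needed.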

Sharding could be termed a kind of “shared-nothing” architecture that’s specific to databases. A shared-nothing architecture is one in which there is no centralized (shared) state, but each node in a distributed system is independent, so there is no client contention for shared resources.

The Cassandra database is a shared-nothing architecture, as it has no central controller and no notion of master/slave; all of its nodes are the same.

Partitioning

Cassandra does partition across nodes (because if you can’t split it you can’t scale it). All of the data for a Cassandra cluster is divided up onto “the ring” and each node on the ring is responsible for one or more key ranges. You have control over the Partitioner (e.g. Random, Ordered) and how many nodes on the ring a key/column should be replicated to based on your requirements.
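
The ring idea can be sketched with a toy consistent-hashing ring. This is a simplification: real Cassandra partitioners assign each node one or more token ranges and may use virtual nodes, and the MD5 hash and single token per node here are assumptions for illustration.

```python
import bisect
import hashlib

class Ring:
    """Toy hash ring: each node owns the key range up to its position."""

    def __init__(self, nodes):
        # Place every node on the ring at its hash position.
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first node at or past the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Because each node owns a contiguous range, adding a node only moves the keys in one range rather than reshuffling the whole data set.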

Cassandra is distributed, which means that it is capable of running on multiple machines while appearing to users as a unified whole.
The fact that Cassandra is decentralized means that there is no single point of failure. All of the nodes in a Cassandra cluster function exactly the same.

Elastic scalability refers to a special property of horizontal scalability. It means that your cluster can seamlessly scale up and scale back down.

Cassandra is highly available and fault tolerant. You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to offer improved local performance and prevent downtime if one data center experiences a catastrophe such as fire or flood.

Data consistency

Consistency essentially means that a read always returns the most recently written value.

  • Strict consistency - It requires that any read will always return the most recently written value.
  • Causal consistency - All processes in the system agree on the order of the causally related operations. If two different, unrelated operations suddenly write to the same field, then those writes are inferred not to be causally related. But if one write occurs after another, we might infer that they are causally related. Causal consistency dictates that causal writes must be read in sequence.
  • Weak (eventual) consistency - Eventual consistency means on the surface that all updates will propagate throughout all of the replicas in a distributed system, but that this may take some time. Eventually, all replicas will be consistent.

The replication factor lets you decide how much you want to pay in performance to gain more consistency. You set the replication factor to the number of nodes in the cluster you want the updates to propagate to (remember that an update means any add, update, or delete operation).

The consistency level is a setting that clients must specify on every operation and that allows you to decide how many replicas in the cluster must acknowledge a write operation or respond to a read operation in order to be considered successful. That’s the part where Cassandra has pushed the decision for determining consistency out to the client.
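
The interplay between replication factor and consistency level boils down to an overlap rule: if the number of replicas that must acknowledge a write (W) plus the number that must answer a read (R) exceeds the replication factor (N), then every read set overlaps the latest write set, and reads are strongly consistent. A tiny sketch (the function name is made up for illustration):

```python
def is_strongly_consistent(write_acks: int, read_acks: int,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap: W + R > N."""
    return write_acks + read_acks > replication_factor

# QUORUM writes + QUORUM reads with RF=3: 2 + 2 > 3 -> strongly consistent.
# ONE write + ONE read with RF=3:         1 + 1 > 3 is false -> eventual.
```

This is why QUORUM/QUORUM is the usual recipe for read-your-writes behavior, while ONE/ONE trades that guarantee away for speed.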

Brewer’s CAP Theorem

Consistency - All database clients will read the same value for the same query, even given concurrent updates.
Availability - All database clients will always be able to read and write data.
Partition Tolerance - The database can be split into multiple machines; it can continue functioning in the face of network segmentation breaks.

Brewer’s theorem is that in any given system, you can strongly support only two of the three. This is analogous to the saying you may have heard in software development: “You can have it good, you can have it fast, you can have it cheap: pick two.”

Cassandra main features

  • Row-Oriented - Cassandra is frequently referred to as a “column-oriented” database, which is not incorrect. It’s not relational, and it does represent its data structures in sparse multidimensional hash tables. “Sparse” means that for any given row you can have one or more columns, but each row doesn’t need to have all the same columns as other rows like it (as in a relational model). Each row has a unique key, which makes its data accessible. So although it’s not wrong to say that Cassandra is columnar or column-oriented, it might be more helpful to think of it as an indexed, row-oriented store.
  • Schema-Free - The data tables are sparse, so you can just start adding data to them, using the columns that you want.
  • High Performance - Cassandra was designed specifically from the ground up to take full advantage of multiprocessor/multicore machines, and to run across many dozens of these machines housed in multiple data centers. It scales consistently and seamlessly to hundreds of terabytes.

Use cases

  • Large Deployments - high availability, tuneable consistency, peer-to-peer protocol, and seamless scaling, which are its main selling points.
  • Lots of Writes, Statistics, and Analysis - storing user activity updates, social network usage, recommendations/reviews, and application statistics.
  • Geographical Distribution - You can easily configure Cassandra to replicate data across multiple data centers.
  • Evolving Applications - If your application is evolving rapidly and you’re in “startup mode” Cassandra might be a good fit given its schema-free data model.

Relational Data Model

The database contains tables. Tables have names and contain one or more columns, which also have names. When we add data to a table, we specify a value for every column defined; if we don’t have a value for a particular column, we use null. This new entry adds a row to the table, which we can later read if we know the row’s unique identifier (primary key), or by using a SQL statement that expresses some criteria that row might meet. If we want to update values in the table, we can update all of the rows or just some of them, depending on the filter we use in a “where” clause of our SQL statement.

Cassandra Data Model

Explanation

Cassandra stores data in a different way. So we need something that will group some of the column values together in a distinctly addressable group. We need a key to reference a group of columns that should be treated together as a set. We need rows. Then, if we get a single row, we can get all of the name/value pairs for a single entity at once, or just get the values for the names we’re interested in. We could call these name/value pairs columns. We could call each separate entity that holds some set of columns a row. And the unique identifier for each row could be called a row key.

Both row keys and column names can be strings, like relational column names, but they can also be long integers, UUIDs, or any kind of byte array. So there’s some variety to how your key names can be set.

Instead of storing null for those values we don’t know, which would waste space, we just won’t store that column at all for that row. So now we have a sparse, multidimensional array structure.
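
This sparse structure can be sketched as a plain nested map in Python. The row keys and column names below are made up for illustration; the point is that each row stores only the columns it actually has.

```python
# Sparse row model: row key -> {column name -> column value}.
rows = {
    "user-1": {"name": "Alice", "email": "alice@example.com"},
    "user-2": {"name": "Bob", "city": "Kyiv"},  # no email column stored
}

def get_columns(row_key, names):
    """Return only the requested columns that exist for this row.

    Missing columns simply aren't present; there is no null placeholder.
    """
    row = rows.get(row_key, {})
    return {n: row[n] for n in names if n in row}
```

Asking `user-2` for `email` just yields nothing, rather than a stored null.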

These aren’t the Columns you’re looking for...

Columns are immutable in order to prevent multithreading issues.
Columns have a timestamp, which records the last time the column was updated. This is not an automatic metadata property, however; clients have to provide the timestamp along with the value when they perform writes. You cannot query by the timestamp; it is used purely for conflict resolution on the server side.
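
That server-side conflict resolution can be sketched as a last-write-wins pick over `(value, timestamp)` pairs. The `resolve` helper is hypothetical, not Cassandra’s actual code, but it shows why the client-supplied timestamp matters.

```python
def resolve(replica_values):
    """Pick the winning column value among replicas.

    Each element is a (value, timestamp) pair; the highest
    client-supplied timestamp wins (last-write-wins).
    """
    return max(replica_values, key=lambda col: col[1])
```

If two replicas disagree, the stale value is simply discarded in favor of the one with the newer timestamp.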

Column families

In the same way that a database is a container for tables, a keyspace is a container for a list of one or more column families. A column family is roughly analogous to a table in the relational model, and is a container for a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one and often many column families.

A column family has two attributes: a name and a comparator. The comparator value indicates how columns will be sorted when they are returned to you in a query: according to long, byte, UTF8, or other ordering.

So while it’s not incorrect to call it column-oriented, or columnar, it might be easier to understand the model if you think of rows as containers for columns. This is also why some people refer to Cassandra column families as similar to a four-dimensional hash: [Keyspace][ColumnFamily][Key][Column]

Cluster is a ring.

The Cassandra database is specifically designed to be distributed over several machines operating together that appear as a single instance to the end user. So the outermost structure in Cassandra is the cluster, sometimes called the ring, because Cassandra assigns data to nodes in the cluster by arranging them in a ring.

A node holds a replica for different ranges of data. If the first node goes down, a replica can respond to queries. The peer-to-peer protocol allows the data to replicate across nodes in a manner transparent to the user, and the replication factor is the number of machines in your cluster that will receive copies of the same data.

Replication factor

In simplest terms, the replication factor refers to the number of nodes that will act as copies (replicas) of each row of data. If your replication factor is 3, then three nodes in the ring will have copies of each row, and this replication is transparent to clients.
The replication factor essentially allows you to decide how much you want to pay in performance to gain more consistency. That is, your consistency level for reading and writing data is based on the replication factor.

Replica placement strategy

The replica placement refers to how the replicas will be placed in the ring.
There are different strategies that ship with Cassandra for determining which nodes will get copies of which keys.

  • SimpleStrategy (formerly known as RackUnawareStrategy)
  • OldNetworkTopologyStrategy (formerly known as RackAwareStrategy)
  • NetworkTopologyStrategy (formerly known as DatacenterShardStrategy)

Column Family Options

There are a few additional parameters that you can define for each column family.

  • keys_cached: The number of locations to keep cached per SSTable. This doesn’t refer to column name/values at all, but to the number of keys, as locations of rows per column family, to keep in memory in least-recently-used order.
  • rows_cached: The number of rows whose entire contents (the complete list of name/value pairs for that unique row key) will be cached in memory.
  • read_repair_chance: This is a value between 0 and 1 that represents the probability that read repair operations will be performed when a query is performed without a specified quorum, and it returns the same row from two or more replicas and at least one of the replicas appears to be out of date. You may want to lower this value if you are performing a much larger number of reads than writes.
  • preload_row_cache: Specifies whether you want to prepopulate the row cache on server startup.

Column Sorting

Columns have another aspect to their definition. In Cassandra, you specify how column names will be compared for sort order when results are returned to the client. Columns are sorted by the “Compare With” type defined on their enclosing column family, and you can choose from the following:

  • AsciiType: This sorts by directly comparing the bytes, validating that the input can be parsed as US-ASCII. US-ASCII is a character encoding mechanism based on the lexical order of the English alphabet. It defines 128 characters, 94 of which are printable.
  • BytesType: This is the default, and sorts by directly comparing the bytes, skipping the validation step. BytesType is the default for a reason: it provides the correct sorting for most types of data (UTF-8 and ASCII included).
  • LexicalUUIDType: A 16-byte (128-bit) Universally Unique Identifier (UUID), compared lexically (by byte value).
  • LongType: This sorts by an 8-byte (64-bit) long numeric type.
  • IntegerType: Introduced in 0.7, this is faster than LongType and allows integers of both fewer and more bits than the 64 bits provided by LongType.
  • TimeUUIDType: This sorts by a 16-byte (128-bit) timestamp. There are five common versions of generating timestamp UUIDs. The scheme Cassandra uses is a version one UUID, which means that it is generated based on conflating the computer’s MAC address and the number of 100-nanosecond intervals since the beginning of the Gregorian calendar.
  • UTF8Type: A string using UTF-8 as the character encoder. Although this may seem like a good default type, that’s probably because it’s comfortable to programmers who are used to using XML or other data exchange mechanisms that require common encoding. In Cassandra, however, you should use UTF8Type only if you want your data validated.
  • Custom: You can create your own column sorting mechanism if you like. This, like many things in Cassandra, is pluggable. All you have to do is extend org.apache.cassandra.db.marshal.AbstractType and specify your class name.

Super Columns

Each column family is stored on disk in its own separate file. So to optimize performance, it’s important to keep columns that you are likely to query together in the same column family, and a super column can be helpful for this.

Here we see some more of the richness of the data model. When using regular columns, as we saw earlier, Cassandra looks like a four-dimensional hash table. But for super columns, it becomes more like a five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]

Composite Keys

There is an important consideration when modeling with super columns: Cassandra does not index subcolumns, so when you load a super column into memory, all of its columns are loaded as well. This limitation was discovered by Ryan King, the Cassandra lead at Twitter. It might be fixed in a future release, but the change is pending an update to the underlying storage file (the SSTable).

You can use a composite key of your own design to help you with queries. A composite key might be something like <userid:lastupdate>. This could just be something that you consider when modeling, and then check back on later when you come to a hardware sizing exercise. But if your data model anticipates more than several thousand subcolumns, you might want to take a different approach and not use super columns.

The alternative involves creating a composite key. Instead of representing columns within a super column, the composite key approach means that you use a regular column family with regular columns, and then employ a custom delimiter in your key name and parse it on client retrieval.
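
The client-side composite key trick can be sketched like this. The `<userid:lastupdate>` layout and the `:` delimiter are just conventions your application enforces, and this sketch assumes the user ID itself never contains the delimiter.

```python
DELIMITER = ":"

def make_key(user_id: str, last_update: str) -> str:
    # Build a composite row key such as "42:2019-09-09".
    return f"{user_id}{DELIMITER}{last_update}"

def parse_key(key: str) -> tuple:
    # Split on the first delimiter so the trailing part may contain one.
    user_id, last_update = key.split(DELIMITER, 1)
    return user_id, last_update
```

Because the key is opaque to the database, both writers and readers must share this encoding; nothing server-side validates it.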

Keys

  • The Partition Key is responsible for data distribution across your nodes.
  • The Clustering Key is responsible for data sorting within the partition.
  • The Primary Key is equivalent to the Partition Key in a single-field-key table (i.e. a simple primary key).
  • The Composite/Compound Key is just any multiple-column key.
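
The division of labor between the two key parts can be sketched in Python: the partition key is hashed to pick where the row lives, while the clustering key orders rows inside that partition. The hashing and in-memory storage here are toy stand-ins, not Cassandra’s real storage engine.

```python
import hashlib
from collections import defaultdict

# partition id -> sorted list of (clustering_key, value) rows
partitions = defaultdict(list)

def node_for(partition_key: str, num_nodes: int = 4) -> int:
    """Partition key decides WHERE the row lives (which node)."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % num_nodes

def insert(partition_key: str, clustering_key: str, value: str) -> None:
    """Clustering key decides the ORDER of rows within the partition."""
    bucket = partitions[(node_for(partition_key), partition_key)]
    bucket.append((clustering_key, value))
    bucket.sort()  # keep rows sorted by clustering key
```

This is why queries that filter by partition key and range-scan on the clustering key are cheap: the rows are already colocated and sorted.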

CQL

https://docs.datastax.com/en/archived/cql/3.3/cql/cql_using/useAboutCQL.html
https://docs.datastax.com/en/archived/cql/3.3/cql/cql_using/useCompositePartitionKeyConcept.html

Denormalization

https://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

Materialized View

https://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views

Data modeling

https://tech.ebayinc.com/engineering/cassandra-data-modeling-best-practices-part-1/
https://tech.ebayinc.com/engineering/cassandra-data-modeling-best-practices-part-2/
