@otrack
Last active September 1, 2023 16:28

Assessing the Performance of Cassandra 5 at Scale

Context

Apache Cassandra is one of the most prominent modern storage systems. It offers a complex data model and a rich API (the Cassandra Query Language) for writing distributed applications. It can store petabytes of data and is used by many industry-leading companies as a key building block of the application stack [a]. In particular, Cassandra is extremely robust: it replicates data consistently across several geographical locations despite network failures, or even the outage of an entire datacenter.

A new version of Apache Cassandra (version 5) was announced recently [b]. It provides full support for ACID transactions, similarly to traditional databases such as Postgres. To execute a transaction, Cassandra 5 relies on a new consensus protocol called Accord [c,d]. Accord leverages recent advances in leaderless state-machine replication to execute transactions quickly across a set of geo-distributed sites.

Objectives

The goal of this research project is to evaluate the performance of Apache Cassandra 5. The evaluation relies on standard benchmarks such as the Yahoo! Cloud Serving Benchmark (YCSB) and the TPC benchmark suite. It is run at scale, that is, in a geo-distributed setting atop a public cloud infrastructure (such as GCP or AWS), with Cassandra deployed over several continents. We aim to answer the following research questions:

  • Is Accord more efficient than Paxos (the consensus protocol used in Cassandra 4)?
  • What is the behavior of Accord in the event of a failure of one or multiple data partitions?
  • How efficient are the mechanisms to bound metadata usage (e.g., garbage collection)?
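
As an illustration of how the first question might be quantified, the sketch below summarizes two sets of latency samples and reports their ratio. It is not the project's tooling: the function names (`percentile`, `summarize`, `compare`) are hypothetical, the samples are assumed to be per-operation latencies in milliseconds already extracted from benchmark runs, and the choice of statistics (median, 99th percentile, maximum) is an assumption.

```python
import math
from statistics import median


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list; p is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based nearest rank: smallest rank covering at least p% of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """Median, 99th percentile, and maximum of a list of latencies."""
    return {
        "median": median(samples),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


def compare(baseline, candidate):
    """Ratio candidate/baseline per statistic; below 1.0 means the candidate is faster."""
    b, c = summarize(baseline), summarize(candidate)
    return {name: c[name] / b[name] for name in b}


if __name__ == "__main__":
    # Made-up numbers for demonstration only; these are not measurements.
    paxos_ms = [12.0, 14.0, 13.0, 40.0]
    accord_ms = [6.0, 7.0, 6.5, 20.0]
    print(compare(paxos_ms, accord_ms))
```

Reporting tail percentiles alongside the median matters in a geo-distributed deployment, where a few cross-continent round trips can dominate the worst-case latency without moving the median.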

To apply

We are looking for one or two students to run the experiments. This work is done in close cooperation with Apple Inc., which is leading the development of Accord, and the IMDEA Software Institute.

Please contact Prof. Pierre Sutra for further details.


[a] CloudKit: Structured Storage for Mobile Applications, A. Shraer et al., VLDB '18.
[b] https://www.cassandrasummit.org/cassandra-forward
[c] https://github.com/apache/cassandra-accord
[d] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions?preview=/188744725/188744736/Accord.pdf
