Replication is a fundamental building block in modern computing infrastructures. This technique consists in de-duplicating data over several distributed machines, or replicas. It provides high availability, as data survives the failure of one or more replicas, and locality, clients having the possibility to access a nearby replica.
Unfortunately, the existing algorithms to coordinate access to replicated data are infamously difficult. Both failures and asynchrony make the interleaving of the processes unpredictable, complex, and often exponential in their numbers. In a recent work, it was established that Egalitarian Paxos (EPaxos) [1], a recent protocol for data replication which was published at the prestigious Symposium on Operating Systems (SOSP'13), is flawed [2]. The problem comes from the use of a single variable to track the decisions taken in the algorithm. As a result, processes may behave incorrectly, when concurrent accesses over replicated data occur in asynchrony periods, and break the guarantees offered by the protocol.
The goal of this research project is to come-up with a provably-correct fix for the algorithm.
For starters, the student(s) will study the EPaxos algorithm in depth. This study will rely on the pseudo-code provided by the authors [1], an implementation in Go [3], as well as recent works [4] that deconstruct EPaxos into simpler building blocks. Then, a fix will be proposed to avoid the incorrect behaviors. The soundness of this fix will be established with the help of machine-guided tools (e.g., theorem proving or symbolic model checking).
The project is open to students from either IP Paris Masters (PDS, HPDA), or VAP ASR at Telecom SudParis.
Prof. Pierre Sutra
[1] I. Moraru, D. G. Andersen, and M. Kaminsky. There Is More Consensus in Egalitarian Parliaments. In Symposium on Operating Systems Principles (SOSP), 2013.
[2] P. Sutra. On the correctness of Egalitarian Paxos. Information. Processing Letters, 2020.
[3] https://github.com/efficient/epaxos
[4] T. França Rezende, P. Sutra, Leaderless State-Machine Replication: Specification, Properties, Limits, In International Symposium on Distributed Computing (DISC), 2020.