Skip to content

Instantly share code, notes, and snippets.

@yangl
Created March 11, 2021 13:58
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save yangl/f55f6120ed4e2082e21c7e7e8b7f7624 to your computer and use it in GitHub Desktop.
Save yangl/f55f6120ed4e2082e21c7e7e8b7f7624 to your computer and use it in GitHub Desktop.
TLA+ specification for the BookKeeper replication protocol

In terms of tools, thinking processes, setup... you really need to start by learning about TLA+. I couldn't explain how it changes your engineering mindset/workflow better than Leslie Lamport himself. So if you're interested in this kind of thing then I highly recommend watching the TLA+ video course by Leslie Lamport: https://www.youtube.com/channel/UCajiu4Cj_GHOX0if3Up-eRA. I personally had to watch it more than once to understand some points.

Leslie Lamport also wrote a book which has been helpful to me: https://lamport.azurewebsites.net/tla/book.html. The book will require a fair amount of study to understand it all. For me, the liveness stuff came later, after I'd already written a couple of safety oriented specs.

Next up there are some examples you can learn from here: https://github.com/tlaplus/CommunityModules

When you start writing your first spec you will hit brick walls along the way. I highly recommend post your questions here: https://groups.google.com/g/tlaplus. The community is very responsive and helpful for all levels.

Regular engineers can pick up enough TLA+ in 2-3 weeks to write a useful spec, you don't need a PhD in mathematics. Once you can write something simple, then find something real to model. If you don't have a specific thing you want to write a spec for then one thing you can do is take a distributed system that you know and look for a protocol bug that has since been fixed. Try to implement the most minimum spec that recreates the bug. An example of this I did was the Apache Kafka KIP 101 which was a data loss protocol bug. https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation

Once you can put together a basic spec, find something that you want to verify, whether it exists already or it is a design proposal. I recently modelled Pravegas auto-scaling topics which is not too complex and makes for a good first spec (as well as being related to BookKeeper).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment