Skip to content

Instantly share code, notes, and snippets.

@alicarbonteq
Last active September 22, 2023 01:57
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save alicarbonteq/e889e1358672914ba25a291ed4637c5f to your computer and use it in GitHub Desktop.
Save alicarbonteq/e889e1358672914ba25a291ed4637c5f to your computer and use it in GitHub Desktop.
Conflict Resolution (Vector Clock in Distributed System)

Conflict Resolution (Vector Clock in Distributed System)

Definition

Vector Clocks are used in a distributed systems to determine whether pairs of events are causally related. Using Vector Clocks, timestamps are generated for each event in the system, and their causal relationship is determined by comparing those timestamps.

Vector clock is use to control the ordering of multi-version records.

  • Use in key-value store like riak, Dynamo
  • Enables causality to be captured
  • With vector clocks, we assume we know about each participating process
    • Suppose there are N processes in the group [1…N]
    • Each vector has N elements
  • Instead of sending a single timestamp, send a vector of timestamps for each process (Update pairwise)

How does the vector clock algorithm work :

  • Initially, all the clocks are set to zero.
  • Each time a process experiences an internal event, it increments its own logical clock in the vector by one.
  • Each time for a process to send a message, it must increment its own clock and then send a copy of its own vector.
  • Each time a process receives a message,
    • It increments its own logical clock in the vector by one
    • Updates each element in its vector by taking the maximum value in its own vector clock

Explanation

Let's think about versioning, ie multiple people editing the same document over time. Your responsibility is to keep the editing history consistent. So for example, if person A edits the doc, and then B edit's A's version, we can say that B's version of the doc is the most up to date since it contains all of A's edits. But if A and B grab the document simultaneously, and then A submits his edit, then B submits his, you'd like to be able to tell that their versions are conflicts (also known as siblings). You would not want to automatically call B's the most up to date, since that would wipe out A's changes, even though B's edit was submitted after A's.

How do we detect conflicts? One method of doing this is to keep parent/child relationships over time. Suppose we start with a document at version 1, and then A makes an edit and submits a new version 1a that contains a reference to 1 as its parent. Then B edits and submits a new version 1b that contains a reference to 1 as its parent. We can tell that 1a and 1b are siblings by walking up the tree and finding a common parent.

Can we do better? Vector clocks approach the problem by keeping around version history as a function of editors rather than documents. So an initial document might start with version { A: 0, B: 0 }, meaning that person A has made no edits, and person B has made no edits. Then, when a person edits a document, they take the vector clock and increment just the counter for their own entry.

For example, A grabs the doc with version { A: 0, B: 0 }, edits and submits a new document with version { A: 1, B: 0}. B grabs { A: 0, B: 0}, edits and submits a new doc with version { A: 0, B: 1 }.

You can tell that a document X is a direct ancestor (ie, not a sibling) of document Y if the counters for each participant in the vector clock of X is greater than or equal to the vector clock of Y. The vector clock is explicitly tracking the history of people who have edited the document.

You can tell that a document X is a sibling of Y if there exist any participant in X's vector clock who has a counter that is less than his corresponding counter in Y.

The space requirement under this scheme is a function of the number of participants, not the length of the history. That's why you might adopt this strategy over the more obvious one.

Comparing Vectors

Consider two vectors, X and Y:

  • If each element of X is <= Y: X causally precedes Y
  • If each timestamp in X is >= Y: Y causally precedes X
  • Else: X and Y are concurrent

Disadvantages of vector clock

  • The main disadvantage of vector clock is that they are not being constant in size.
  • The implementation of vector clock need an entry for each site
@wenhoujx
Copy link

this page is helpful, but the following statement is incorrect:

You can tell that a document X is a direct ancestor (ie, not a sibling) of document Y if the counters for each participant in the vector clock of X is greater than or equal to the vector clock of Y.

X is an ancestor of Y if every element of X is <= that of Y

You can tell that a document X is a sibling of Y if there exist any participant in X's vector clock who has a counter that is less than his corresponding counter in Y.

X is sibling of Y, if there exists an element such that X>Y and an element such that X<Y.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment