Skip to content

Instantly share code, notes, and snippets.

@ichaos
Last active December 17, 2015 19:29
Show Gist options
  • Save ichaos/5660352 to your computer and use it in GitHub Desktop.
Save ichaos/5660352 to your computer and use it in GitHub Desktop.
Distributed system
Quorum system (最低法定人数系统)
vector clock(向量时钟)
Vector clock is an algorithm for generating partial order of events (data object) in distributed system and detecting causality violation. In a distributed system with N clients, each data object that will be accessed by multiple clients should has a distinct vector clock which is an array of M logical clocks (M equals the number of clients that will access the object), the logical clock usually just a int value representing the time of corresponding client modify the data. This algorithm has the following rules:
1.Initially all clocks are zero.
2.Each time a client experiences an internal event, it increments its own logical clock in the vector by one.
3.Each time a client prepares to send a message, it increments its own logical clock in the vector by one and then sends its entire vector along with the message being sent.
4.Each time a client receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element).
C1: (0,0,0) ---- (1,0,0) -------------------------
C2: (0,0,0) ---------(1,1,0) ----- (1,2,0) -------
C3: (0,0,0) --------------------------(1,2,1) ----
The above example show how does VC works in a 3 clients system. There're two events altogether: one is C2 read data from C1, the VC(C1) change from (0,0,0) to (1,1,0), another is C3 read data from C2, then VC(C2) change from (1,1,0) to (1,2,0) and VC(C3) change from (0,0,0) to (1,2,1). At last, if someone read this object from these three clients it can detect C3 has the lastest data and contains all data writen by C1 and C2 because (1,2,1) > (1,2,0) and (1,0,0). However, sometimes there may be a conflict that two data object copys from two client have two clock that non of them is newer then other. That's a conflict and should be sloved.
In distributed storage systems like dynamo, one data object will have many replicas for fault-tolerance and improving reliability. These systems will use vector clock to help to synchronize those replicas, which will result in a eventual consistency.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment