most distributed uid generation schemes are inspired by twitter's snowflake.
basic principle is that we generate ids that can be split into 3 chunks, TSM
where:
T
= time differenceS
= server 'id'M
= server counter, in case we get the same T
use some global registry (probably the most practical solution), and race with another server to register.
alternative:
- join cluster
- N = cluster size (can probably just query neighbour)
- get consensus that you can register as server N + 1.
problem: ideally we want a global order on the ids.
this is ok, TSM
is good enough.
stronger:
if I get id x1
before another id x2
, I want x1 < x2
.
this may not always hold, but most of the time it's fine.
most applications dont need this anyways.
if we relax constraint to be only true for a client,
i.e. if client 1 gets x1
and makes request to get x2
,
then x2 > x1
.
then it is simpler to solve,
client need just to attach maximal id it has seen on requests for new ids.
server just need to guarantee that id it returns is larger than seen id.
this may help with clock drift too.
generate(x)
while True:
x' = generate_id()
(T, S, M) = x
(T', S', M') = x'
if x' > x:
return x'
if T' < T:
correct drift
wait until T' can be > T
difficulty comes from the fact that we cannot just return T | S | M+1
because it may violate the core guarantee (universally unique).
pick large enough bits for T
=> small enough dt
then waiting
for e.g. 10ms is probably not a big deal.
generate()
x = generate_id()
forward x to all servers with id < own id
return x
upon receipt of x
by server i
, i
uses the algorithm to generate
a strictly larger id as above.
this needs i
to remember the largest id it has seen.
note that solution is probably not very practical;
in the time taken for s
to broadcast x
,
it is likely that T
is no longer valid.
best effort => broadcast using UDP.