Expanding on my thread, it’s been nagging me that the random tail on small key sortable unique IDs (KSUID) might not provide as much protection vs collision as I’d trusted. Segment were able to throw bits at the problem, but I needed to pack my identifiers in smaller spaces.
After sparing 32 bits for the timestamp I might have room for only 32, 40, or 96 random bits. There’s that cryptographers’ rule of thumb about collision needing half the bit count… but how likely is it we’ll get at least one collision?
My math being a bit rusty, I decided to brute force it (birthday.exs
). I didn't like my results.exs
:
- 39% failure for 2¹⁶ random choices from 2³² possible keys (3443 rounds)
- 45% failure for 2²⁰ random choices from 2⁴⁰ possible keys (163 rounds)
Searching around again, worried, I found “Understanding the Birthday Paradox” and played with its calculator. Clearly, I'd been cocky: I'd expected I'd need to retry a little at my “peak throughput” of half the bits, but I'd actually need to retry around half the time.
To nail down my safety margin once and for all, I've adapted Appendix A:
odds_of_missing_collision = fn population_bits, random_identifier_bits ->
population = :math.pow(2, population_bits)
space = :math.pow(2, random_identifier_bits)
1 |> :math.exp() |> :math.pow(population * (population - 1) / (-2 * space))
end
for sacrificed_bits <- 0..8 do
confidence = odds_of_missing_collision.(16 - sacrificed_bits, 32)
rounds_for_half_chance_of_collision = floor(:math.log(0.5) / :math.log(confidence))
hours_at_one_round_per_second = Float.round(rounds_for_half_chance_of_collision / 3600, 1)
{sacrificed_bits, confidence, rounds_for_half_chance_of_collision, hours_at_one_round_per_second}
end
→
[
{0, 0.6065352871919841, 1, 0.0},
{1, 0.8825002690495376, 5, 0.0},
{2, 0.9692350831437918, 22, 0.0},
{3, 0.9922188845134587, 88, 0.0},
{4, 0.9980492570143334, 354, 0.1},
{5, 0.9995120762421099, 1420, 0.4},
{6, 0.999878056332523, 5683, 1.6},
{7, 0.9999695424903592, 22757, 6.3},
{8, 0.9999924004366679, 91208, 25.3}
]
Sacrificing another eight bits gives you a peak traffic level for which you've got half a chance of a collision every ninety thousand peaks.
If you squeeze a Segment-style KSUID into 64 bits, you'll have 32 bits for the timestamp in seconds and another 32 bits for the random component. If your peak traffic is 256 attempts per second, it's a coin-flip whether you survive peaking for a day.