Kubernetes generateName randomly generates a 5 char sequence that is appended as a "hash" to the generateName prefix (spoiler: It's not really a hash, just an encoded random number).
Each char in the sequence is from the "bcdfghjklmnpqrstvwxz2456789" range.
This leads to
But there is a problem with collisions. There is a 50% chance of hitting a collision before generating even 5000 names, and a 0.1% chance at only 500 names! This is unacceptably BAD. For comparison a 160 bit git hash has a 50% probability of a single collision at about
Today a HTTP 409 respose returned to the client if a collision happens. It is possible for clients to detect the error and retry, but we've seen that most clients don't realize to do this and hit this problem in production.
Modify the apiserver to retry name generation. After generating a name, the apiserver checks if the name already exists in etcd, and retries name generation if needed.
If we retry up to 7 times, we can generate up to 1 million names before reaching a 0.1% chance of a collision. This is roughtly the same probability of collision we would get if we were to increased the number of chars per hash to 11.
This doesn't fundamentally fix the problem. If the last retry results in a collision, the collision will still be reported back to the client, but it reduces the severity of the problem dramatically, buying time for longer term solutions to be developed.
See the prototype for implementation details.
Note that the probability of a collision follows normal hash collision probabiblities:
The probability changes dramatically if we retry when a generated name conflicts with an existing name. We can calculate the probability when retries are performed iteratively with:
Where
Graphing this out for
And the more retry attempts we allow, the lower the probability of collisions:
After 7 attempts, the probability of a collision for 1 million entries drops below 0.1%.
For comparison, the probability of a collision if we were instead to increase the number of chars are:
number of chars | # of possible names | 50% probability of collision when # of names added | 0.1% probability of collision when # of names added |
---|---|---|---|
5 | 14348907 | 5000 | 500 |
6 | 387420489 | 25000 | 900 |
7 | 10460353203 | 120000 | 5000 |
8 | 282429536481 | 650000 | 8000 |
9 | 7625597484987 | 3300000 | 120000 |
10 | 205891132094649 | 17000000 | 600000 |
11 | 5559060566555523 | 24000000 | 3400000 |
(based on https://kevingal.com/apps/collision.html)
Here, we can see that we need 11 chars to for the "0.1% probability of collision when # of names added" to exceed 1 million. The gives us a way to compare approaches. This is how we can conclude that 7 retries provides roughly the same probability of collision that we would get if we were to increased the number of chars per hash to 11.
I compared the equations used in this doc with actual conflicts on apiserver where the hash size was set to 2 and a range of retries were used, the resulting in:
retries | # of hashes at first collision (observed) | # of hashes at 50% probabiliy of collision (calculated) |
---|---|---|
0 | 55 | 33 |
1 | 158 | 104 |
2 | 117 | 54 |
3 | 202 | 182 |
4 | 368 | 251 |
5 | 346 | 309 |
6 | 444 | 357 |
7 | 365 | 397 |
8 | 450 | 430 |
Thank you @siyuanfoundation for help on the math!