ezdiy/birthday.md Secret

## birthday.md

      
    Raw
  

              birthday.md
            
          
    Estimate population size from number of unique collisions in a generalized birthday problem
We have 100 new people entering the room each year.
We need to figure out number of unique day+month dates (across all years) during
which any birthday (shared or not) has occured.
import random
D = 365
N = 100
def count(x):
    days = [0] * D
    for i in range(0,N*x):
        days[random.randint(0,D-1)] += 1
    return D - days.count(0)

for y in range(1,10):
    n = count(y)
    print("y=%d years, n=%d (%.2f%%) birthdays occured" %( y, n, n*100/D))
Output of which is:
y=1 years, n=86 (23.56%) birthdays occured
y=2 years, n=155 (42.47%) birthdays occured
y=3 years, n=208 (56.99%) birthdays occured
y=4 years, n=248 (67.95%) birthdays occured
y=5 years, n=271 (74.25%) birthdays occured
y=6 years, n=302 (82.74%) birthdays occured
y=7 years, n=314 (86.03%) birthdays occured
y=8 years, n=321 (87.95%) birthdays occured
y=9 years, n=338 (92.60%) birthdays occured
Is there a closed formula for n=count(y) function, given n, and
y, D and N? Further, is there an inverse of it - given a particular n,
return number of years - y=count_invert(n).
A formula exists for the case of counting all collisions occuring,
not just the unique ones - http://matt.might.net/articles/counting-hash-collisions/
The formulas for unique collisions are of special importance to determine optimal
replica/RS recovery block counts in data archival systems with stochastic replica
dispersal.