Instantly share code, notes, and snippets.

# anrieff/birthdays.py

Last active April 29, 2018 20:05
Show Gist options
• Save anrieff/ba35e24cac84c365cbd334b2a48ceaed to your computer and use it in GitHub Desktop.
Extended birthday problem: probability of M people sharing a birthday out of a group of N.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 #!/usr/bin/python """ Compute the probability of M people sharing a birthday out of a group of N people. The parameter `exact' determines whether we want to ignore clusterings of more than M people with the same birthday (thus the largest group of coincident birthdays is exactly M). If it's false, we allow "M or more". How it works: we define a recursive counting function f(days, people, fulfilled) which tells us how many configurations of the desired clustering of birthdays can be generated if we have days (0..365) : number of days left of the year to play with, people (0..N) : number of persons we haven't assigned a birthday yet, fulfilled (0/1): tells us whether the predicate (M coincident birthdays) has been already fulfilled. If we're given with a particular function invocation with days > 0 and people > 0, we can assign any number of people to have a birthday on the next following day of the year. After that we'd be left with one day less, and potentially less people. I.e., if we were to have p persons (0 <= p <= people) sharing a birthday on that day, then we have C(people, p) * f(days - 1, people - p, fulfilled | (p are enough)) successful combinations in that scenario. C(n, k) is the binomial coefficient (n over k). (p are enough) is a predicate that depends on whether we want to be exact. If exact is true, (p is enough) ::= (p == M) and we need to ensure p <= M. if it's false, (p is enough) ::= (p >= M). The resultant probability is computed as f(365, N, 0) / 365**N. Please note that the function is recursive and needs to be memoized to run efficiently. Also the number of combinations may be huge (e.g., the number of 4+ coincident birthdays over a group of 60 people is has 152 digits). Back-of-the-envelope time complexity of the algorithm is O(N^3 logN) for the "M+" case. The "M" case is slightly better at O(N^2 * M * logN). """ N = 60 # num people M = 4 # num coincident birthdays exact = False # False: compute for "M+" coincident birthdays. # True: "M" (and no more) coincident birthdays # Compute the Pascal's triangle. C = [] for i in xrange(N + 1): row = [1] for j in xrange(1, i): row.append(C[i - 1][j - 1] + C[i - 1][j]) row.append(1) C.append(row) # Memoization cache cache = {} # Counting function, as described. def f(days, people, fulfilled): if days == 0 and people > 0: return 0 if people == 0: return int(fulfilled) # memoization: global cache key = (days, people, fulfilled) if key in cache: return cache[key] result = 0 for p in xrange(people + 1): this_does_it = (p == M) if exact else (p >= M) if exact and p > M: break result += C[people][p] * f(days - 1, people - p, fulfilled or this_does_it) # memoization: cache[key] = result return result numerator = f(365, N, False) denominator = 365**N print "Exact probability is %d/%d" % (numerator, denominator) print "Numerically: %.9f%%" % (numerator * 10**22 / denominator / 10.**20)

### npodonnell commented Apr 29, 2018

There's a much easier way.....