@stevenpollack
Last active March 10, 2016 17:07
Analyzing Bogosort

How long would we have to wait to Bogosort a deck of cards?

Background

Bogosort is my favorite sorting algorithm. Its idea is simple: take the things you want to sort, randomly shuffle them, and then check to see if they're sorted. I'm gonna show you how once your set of to-be-sorted objects gets "large" enough (say, the size of a standard deck of cards), the probabilistic guarantees of Bogosort render it useful only to those with either

  1. A serious amount of faith, or
  2. A serious gambling problem.
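For the curious, that shuffle-and-check recipe sketches out in a few lines of Python. The `bogosort` and `is_sorted` helpers are my own naming, and it's only sane to run this on tiny inputs:

```python
import random

def is_sorted(seq):
    """Check whether seq is in ascending order."""
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))

def bogosort(seq):
    """Shuffle until sorted; return the sorted list and the attempt count."""
    seq = list(seq)
    attempts = 0
    while not is_sorted(seq):
        random.shuffle(seq)
        attempts += 1
    return seq, attempts

# Feasible for tiny inputs only -- anything deck-sized will (effectively) never finish.
print(bogosort([3, 1, 2]))
```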

Anyone who knows anything about probabilities should stop reading right now, and recognize that the analysis you want to run will use Geometric Random Variables to model a person's effort at bogosorting a deck of cards. Anyone who doesn't know about random variables should skip the remainder of this section.

If you use Geometric random variables for this analysis, you'll find that with 1E69 attempts, you'll see at least 1 proper sorting with > 99% probability. However, this could also mean you (luckily) achieve 2, 7, 16, or god knows how many successful sorts. That's a conservative guarantee. For an average-case analysis, you'd just use the mean of your Geometric random variable -- the expected number of failures before the first success, (1 - p)/p with p = 1/52! -- and hence you'd need

average attempts = (1 - 1/52!) * 52! ~ 8.0658 * 10^67 attempts

on average before you successfully Bogosort a deck of cards.
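For the skeptical, here's that Geometric-variable arithmetic in Python. Note the log-space trick: 1 - 1/52! rounds to exactly 1.0 as a 64-bit float, so the naive power would silently return 0:

```python
import math

p = 1 / math.factorial(52)          # chance one random shuffle comes out sorted
mean_failures = (1 - p) / p         # mean of the Geometric variable (failures before success)
print(f"{mean_failures:.4e}")       # ~8.0658e+67

# P(at least one success in 1e69 attempts), computed in log space
# because (1 - p) is indistinguishable from 1.0 in ordinary floats.
n = 1e69
log_all_fail = n * math.log1p(-p)   # log of (1 - p)^n
print(f"{1 - math.exp(log_all_fail):.6f}")  # ~0.999996
```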

Let's start from the top down.

Say you had a coin whose chance of flipping heads was 30%. If we flipped the coin once, our chance of heads would be 30%. No questions. What about if we flipped twice? Well, we can assume that the outcome of the first flip has no effect on the outcome of the second, so we can just multiply the probabilities:

P(flip_1 = H) * P(flip_2 = H) = 30% ^ 2 = 9%

Easy enough, but what if we wanted to know the chance that, out of two flips, we see at least one head? There are lots of strategies we could employ here, but I think there's a clear one: let's tilt our heads and think about things differently. Why don't we try to answer the following question:

What's the chance we only flip tails?

This is answered in the exact same way as our first question:

P(flip_1 = T) * P(flip_2 = T) = [1 - P(flip_1 = H)] * [1 - P(flip_2 = H)] = 49%

But, at this point, we're basically done, since, in the space of outcomes, the opposite of seeing 2 tails is seeing at least 1 heads. Hence,

P(at least one heads) = 1 - P(two tails) = 100% - 49% = 51%
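In Python, the whole two-flip calculation is one line each way:

```python
p_heads = 0.30
p_two_heads = p_heads ** 2               # both flips heads: 30% * 30% = 9%
p_at_least_one = 1 - (1 - p_heads) ** 2  # complement of two tails: 1 - 49% = 51%
print(p_two_heads, p_at_least_one)
```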

The probabilistically inclined reader should find this all pretty well trodden, as I'm just manually showing you tricks to manipulate Binomial Random Variables. In particular, I'm treating our thought experiment here as a Binomial random variable with 2 trials, and probability of success set to 30% (naturally, our coin is actually a Bernoulli Random Variable).

Links and jargon aside, there's a simple formula at work here when we want to evaluate the chance of seeing at least one heads (success) out of n coin flips:

1 - P(all flips are tails) = 1 - Product( P(flip_i = T), i = 1 .. n )

E.g., for n = 3, our formula looks like

1 - P(flip_1 = T)*P(flip_2 = T)*P(flip_3 = T)

As you might imagine, this is really annoying for anyone (mathematicians included) to type, so what we do is recognize that

P(flip_1 = T) = P(flip_2 = T) = ...

and we just let q stand for P(flip_i = T), and in particular, we see that if p = P(flip_1 = H), then q = 1 - p, and hence our formula can be stated in terms of the probability of success:

P(at least one H, given n flips) = 1 - Product( q, i = 1 .. n )
                                 = 1 - q ^ n
                                 = 1 - (1 - p) ^ n
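That formula is short enough to wrap in a quick helper (the function name is my own invention):

```python
def p_at_least_one_success(p, n):
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

print(p_at_least_one_success(0.30, 2))  # the 51% from the coin example
print(p_at_least_one_success(0.30, 1))  # one flip: just 30%
```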

The implication of this formula is pretty profound, actually. Why? Let's say our coin actually represents the chance of contracting HIV, given unprotected sex with an infected individual. Naturally, p is very low for vanilla, vaginal intercourse (it's estimated at 4/10,000 = 0.04%); however, the fact that p is low means that q = 1 - p (= 99.96%) is very high. In particular, we can now use our formula to analyze our chances of contracting HIV, given the number of (unprotected) interactions we've had:

P(n) =  P(contracted HIV, given n interactions) = 1 - 99.96% ^ n

which, to give you some perspective looks like

 n    P(n)
 0    0%
 5    0.1998%
10    0.399%
15    0.5983%
20    0.797%
25    0.995%
30    1.193%
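If you want to reproduce that table yourself, a few lines of Python will do it, using the 0.04% per-interaction estimate from above:

```python
p = 0.0004  # per-interaction transmission estimate from the text
for n in (0, 5, 10, 15, 20, 25, 30):
    risk = 1 - (1 - p) ** n
    print(f"{n:2d}  {100 * risk:.4f}%")
```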

Thus, by your 26th interaction (again, this is with an infected individual), you've accumulated roughly a 1% chance of having contracted HIV. Realistically, 1% odds are low but non-trivial, and all of this came from an event that originally had a 0.04% chance of occurring. Obviously, this gets scarier when we analyze events with non-trivial originating probabilities. For example: in June, July, and August, the average number of "rainy" days per month is 7 (source: Wikipedia). A demonstrative (albeit wrong) use of this statistic could go as follows:

  1. Let's assume that the weather on a given summer day doesn't affect the weather of the day after.
  2. There are 30 days in June, 31 in July, and 31 in August, so on average there are 21 rainy days across a 92-day season. Hence, our chance of a rainy day is 21/92 ~ 23%.

So, now the question is, in an arbitrary week, what's the chance of at least 1 rainy day?

1 - 77% ^ 7

That is, approximately 84%. That's pretty huge.
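Sanity-checking that with Python, using the ~23% rainy-day estimate:

```python
p_rain = 0.23                       # rainy-day chance from the seasonal average
p_wet_week = 1 - (1 - p_rain) ** 7  # at least one rainy day in 7
print(f"{100 * p_wet_week:.0f}%")   # ~84%
```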

Back to Bogosort

So, how does this all relate to Bogosort? Well, if we wanted to bogosort a sequence of n objects (for example, a deck of 52 cards), then we know that there is only 1 properly sorted ordering. However, there are

n! = n*(n-1)*(n-2)*...*2*1

ways to order n objects, and if you don't believe me, let's take a simple 3-object set, {1,2,3}, and enumerate all the orderings:

  1. 1,2,3
  2. 1,3,2
  3. 2,1,3
  4. 2,3,1
  5. 3,1,2
  6. 3,2,1

Only one of these orderings is "sorted" (in the ascending sense) -- the first one -- hence we have a 1/6 chance of getting the ordering right on the first go. But, if we wanted to try bogosorting our deck of cards... Well, I'm not gonna try enumerating all the ways to do it (not to mention, we'd have to agree on a way to order the clubs, spades, diamonds and hearts), but just note that 52! is larger than 8 x 10^67 (80 unvigintillion). This means that our chance of randomly sorting the deck is so low, my computer can't distinguish it from 0. For all practical purposes, it is zero. However, we shouldn't be discouraged, because computing power is cheap! We can still do a back-of-the-envelope calculation to estimate how many attempts we'd have to make in order to get a reasonable probability of success:
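You can check the 3-object enumeration, and get a feel for how big 52! really is, straight from Python's standard library:

```python
import math
from itertools import permutations

# All 3! = 6 orderings of {1, 2, 3}; exactly one of them is sorted.
orderings = list(permutations([1, 2, 3]))
print(len(orderings), sum(o == (1, 2, 3) for o in orderings))

# 52! is astronomically larger -- Python computes it exactly as an int.
print(math.factorial(52))      # ~8.0658e67
print(1 / math.factorial(52))  # ~1.24e-68, effectively zero in practice
```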

  1. Recognizing that 1/52! ~ 0, means we need to change the problem a bit. Instead of 52!, let's just look at 52*51*50*...*33*32 (let's call this x). This is a lot smaller than 52!, but it'll give us some understanding of the problem at hand, here.

  2. Just fiddling with some exponents, and wolframalpha, we see that

    (1 - 1/x) ^ 1E34 ~ 36.1%
    

    where I'm using 1E34 as scientific notation for 1 followed by 34 zeroes (10^34). That 36.1% is the chance that all 1E34 attempts at a 1-in-x event fail, so the chance of at least one success is ~63.9%.

  3. What does this mean? Well, given 52! is around x * 1E34 (i.e., our original number is still 10 000 000 ... 000 times bigger than our approximation), it would seem that we need something like 1E68 attempts at bogosorting our deck before we get a reasonable chance at having successfully sorted it (at least once). Turns out, though, we're on the right track:

    P(at least one successful sort, given 1E69 attempts) = 1 - (1 - 1/52!) ^ 1E69 ~ 99.9996%
    
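You can verify step 2 without WolframAlpha by working in log space (`math.prod` needs Python 3.8+):

```python
import math

x = math.prod(range(32, 53))    # 52 * 51 * ... * 32, computed exactly
print(f"{x:.3e}")               # ~9.810e+33

# Chance that 1e34 attempts at a 1-in-x event ALL fail:
log_all_fail = 1e34 * math.log1p(-1 / x)
print(f"{math.exp(log_all_fail):.3f}")  # ~0.361
```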

Which means that while it's certainly not impossible for bogosort to return a properly sorted deck of cards, if shuffling the deck cost only one CPU operation, and we ran it on my 2.66 GHz Intel processor, we'd have to wait for

10^69 [ops] / ( (2.66 * 10^9) [ops/sec] * (3.1536 * 10^9) [sec/century] ) = 1.19 * 10^50 [century]

That's right, over 1E50 centuries.
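The arithmetic behind that figure, as a sanity check:

```python
attempts = 1e69
ops_per_sec = 2.66e9                         # the 2.66 GHz processor, one shuffle per op
secs_per_century = 60 * 60 * 24 * 365 * 100  # ~3.1536e9 seconds
centuries = attempts / (ops_per_sec * secs_per_century)
print(f"{centuries:.2e}")                    # ~1.19e+50 centuries
```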

So, really, what does this mean? Bogosorting anything as "large" as a deck of cards is a great exercise in comedy, but little else.
