Modelling an Uncertain World

I have included working code examples that can be run throughout, as well as graphs. I hope this makes the subject easier to understand in a hands-on way.

The setup

Suppose you know that there are 10 balls in an urn, each of which is either red or blue. There are 11 different possible models for this situation:

  • M0: 0 red, 10 blue
  • M1: 1 red, 9 blue
  • ...
  • M10: 10 red, 0 blue

Initially we do not know which situation we are in. So a reasonable thing to do would be to assign an equal probability to every model. This is the maximum entropy principle and we use it to set up our prior probability distribution.

Later on, once we have learned new information, we will change our list of 11 probabilities to more accurately reflect what we have learned. This will let us home in on a more focused distribution around one or two models.

size = 10
# maximum entropy prior: each of the 11 models gets equal probability
m = [(i, 1/(size+1)) for i in range(size+1)]

Calculating the forward probability of an event

Let an event be that we pull a ball out, check whether it is red or blue, then put it back in. We will call these events draws.

What is the probability of each colour? 1/2, right? This turns out to be true, but later the models will not all be equally likely, so we need a way to calculate these probabilities in general. Under each model, the probability of red is known:

  • $P(\text{red} | M_0) = 0$
  • $P(\text{red} | M_1) = 1/10$
  • ...
  • $P(\text{red} | M_{10}) = 10/10$

and since we know the probability of each model, we can combine them using the law of total probability:

  • $P(\text{red}) = \sum_{i} P(M_i) P(\text{red} | M_i)$

def p_red():
    # law of total probability: average P(red | M_i) weighted by P(M_i)
    return sum([p * i/size for (i, p) in m])

def p_blue():
    return 1 - p_red()

p_red()
0.5
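
With the uniform prior, this sum can be checked by hand:

$$P(\text{red}) = \sum_{i=0}^{10} \frac{1}{11} \cdot \frac{i}{10} = \frac{1}{11} \cdot \frac{55}{10} = \frac{1}{2}$$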

A Joke

A mathematician, a physicist, and an engineer are riding a train through Scotland.

The engineer looks out the window, sees a black sheep, and exclaims, "Hey! They've got black sheep in Scotland!"

The physicist looks out the window and corrects the engineer, "Strictly speaking, all we know is that there's at least one black sheep in Scotland."

The mathematician looks out the window and corrects the physicist, "Strictly speaking, all we know is that at least one side of one sheep is black in Scotland."

Calculating the backwards probability of a model, given an event

Now we can get to the heart of the problem.

Suppose we perform an event (we take a ball out, look at it, and put it back). We learn a couple of things. If the ball is red, we learn that there is at least one red ball in the urn. This is significant - it means we can completely eliminate model M0. In other words, we can assign it probability 0. What probabilities will we assign to the rest of the models? 1/10 each seems like a good option. But in fact we drew a red ball, so perhaps it would be reasonable to lean slightly more towards red-heavy models than blue-heavy ones.

We can use Bayes' theorem to work out the posterior probability of each model.

$$P(M_i | \text{red}) = \frac{P(\text{red} | M_i) P(M_i)}{P(\text{red})}$$

def p_model_given_red(i):
    # Bayes' theorem: P(M_i | red) = P(red | M_i) * P(M_i) / P(red)
    return i/size * m[i][1] / p_red()

def p_model_given_blue(i):
    return (1 - i/size) * m[i][1] / p_blue()

[p_model_given_red(i) for i in range(size+1)]

[0.0,
 0.018181818181818184,
 0.03636363636363637,
 0.05454545454545454,
 0.07272727272727274,
 0.09090909090909091,
 0.10909090909090909,
 0.12727272727272726,
 0.14545454545454548,
 0.16363636363636364,
 0.18181818181818182]
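
As a quick sanity check (my addition, not in the original), the updated probabilities still form a proper distribution:

sum(p_model_given_red(i) for i in range(size+1))  # 1.0, up to floating point error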

Here is a graph of the new probability distribution:

[Graph: posterior distribution over the 11 models after one red draw]

You can see that 0 reds has 0 chance, and all reds is preferred as the most likely model.
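
The original images are not reproduced here; if you want to regenerate these graphs yourself, a minimal matplotlib sketch along these lines should work (plot_models is a hypothetical helper, not part of the original gist, and assumes matplotlib is installed):

import matplotlib.pyplot as plt

def plot_models(m, title):
    # bar chart of the probability assigned to each of the 11 models
    plt.bar([i for (i, p) in m], [p for (i, p) in m])
    plt.xlabel("number of red balls in the model")
    plt.ylabel("probability")
    plt.title(title)
    plt.show()

plot_models([(i, p_model_given_red(i)) for i in range(size+1)], "after one red draw")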

What if we drew a red then a blue?

def update_given_red():
    return [(i, p_model_given_red(i)) for i in range(size+1)]

def update_given_blue():
    return [(i, p_model_given_blue(i)) for i in range(size+1)]

m = update_given_red()
m = update_given_blue()
m

[(0, 0.0),
 (1, 0.05454545454545454),
 (2, 0.09696969696969698),
 (3, 0.12727272727272723),
 (4, 0.14545454545454545),
 (5, 0.1515151515151515),
 (6, 0.14545454545454545),
 (7, 0.12727272727272726),
 (8, 0.09696969696969694),
 (9, 0.05454545454545453),
 (10, 0.0)]

Here is the graph:

[Graph: posterior distribution after one red and one blue draw]

As so often happens in statistics, a bell-shaped curve starts to appear.
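
One way to see where this shape comes from: with a uniform prior, one red and one blue draw make each model's posterior proportional to $P(\text{red} | M_i) P(\text{blue} | M_i) = \frac{i}{10}(1 - \frac{i}{10})$. A quick check (my own sketch, not in the original) reproduces the list above:

# unnormalized posterior weights, proportional to (i/10) * (1 - i/10)
weights = [(i/size) * (1 - i/size) for i in range(size+1)]
total = sum(weights)
[(i, w/total) for (i, w) in enumerate(weights)]
# 0 at both endpoints, peaking at i = 5 with probability ~0.1515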

Answering a question

This was all inspired by a question: what if we drew a red ball 6 times in a row, and then our friend came along and drew a blue? How surprised should we be? Should we accuse them of cheating?

m = [(i, 1/(size+1)) for i in range(size+1)]  # reset to the uniform prior
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()
m = update_given_red()

m
[(0, 0.0),
 (1, 5.054576792921572e-07),
 (2, 3.234929147469806e-05),
 (3, 0.00036847864820398234),
 (4, 0.002070354654380676),
 (5, 0.007897776238939953),
 (6, 0.02358263348505487),
 (7, 0.05946659051104296),
 (8, 0.13250269788036326),
 (9, 0.26862093454070324),
 (10, 0.505457679292157)]

p_blue()
0.08611103388841013

I'm getting a result of about 8.6%, a bit less than 1/10. It's believable.

But if we were to tip out the urn and find that it contained only 1 red ball, that would be a less-than-one-in-a-million outcome, and we'd be very surprised by it.

[Graph: posterior distribution after six red draws in a row]
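
The iterated updates agree with a closed-form calculation: starting from the uniform prior, six red draws weight each model by $(i/10)^6$. Here is a quick verification sketch (not part of the original code):

weights = [(i/size)**6 for i in range(size+1)]
total = sum(weights)
weights[1] / total
# ≈ 5.05e-07, matching the iterated result above: under one in a million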

What's the bigger picture

This was a very simple example of the general concept of an agent performing Bayesian reasoning to build a model of the world under uncertainty. This is one of the foundational steps required for accurate decision making. The concepts here apply in a very wide range of situations.

Probabilities are fundamentally a subjective, internal valuation of an agent's belief. Probabilities are not objective. They are based on the agent's personal model of the world, which is formed by the information it has received over time.

The key step that enabled us to iterate on and refine our models here was the application of Bayes' theorem to go from forward reasoning to backward reasoning. This let us compute probabilities for potential models of the world based on the evidence or events we observed.

An agent using this type of Bayesian reasoning is able to work with partial knowledge of the universe it exists within, and still do its best based on that. Another fundamental concept is that, for anything to get started in the first place, we needed to set up a prior probability distribution. To do that, the concept of entropy was used: we chose the distribution with maximum entropy.
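
To make the entropy idea concrete, here is a small sketch (my addition, not in the original gist) of the Shannon entropy of the model distribution; the uniform prior maximizes it, and informative draws tend to reduce it:

import math

def entropy(dist):
    # Shannon entropy, in bits, of a distribution over models
    return -sum(p * math.log2(p) for (i, p) in dist if p > 0)

entropy([(i, 1/(size+1)) for i in range(size+1)])  # uniform prior: log2(11) ≈ 3.46 bits
entropy(m)  # after the six red draws the distribution is concentrated, so entropy is lower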

These concepts apply much more generally to any agent that aims to operate intelligently without absolutely perfect information. For more, I recommend the book Information Theory, Inference, and Learning Algorithms by David J.C. MacKay, and the wumpus-world chapter of AIMA (Artificial Intelligence: A Modern Approach).
