@grzanka
Last active March 23, 2020 19:01
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Bayesian Statistics for Beginners\n",
"Tadeusz Lesiak\n",
"\n",
"Based on the book \n",
">\"Bayesian Statistics for Beginners \n",
"\n",
">A Step-by-Step Approach\"\n",
"\n",
"><cite>Therese M. Donovan, \n",
"\n",
">Ruth M. Mickey</cite>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" Thomas Bayes (1701-1761)\n",
"\n",
"English mathematician and Presbyterian minister\n",
"\n",
"<div>\n",
"<img src=\"Bayes.png\" width=\"250\"/>\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Frequentist Probability (FP)\n",
"\n",
"* $$ P = \\frac{\\rm number~of~observed~outcomes~of~interest}{\\rm number~of~all~possible~outcomes~}$$\n",
"\n",
"\n",
"* **The conditional probability:** \"the probability of A, given that B occurs\": $~~~~P(A|B)= \\frac{P(A\\cap B)}{P(B)}$\n",
"\n",
"\n",
"* Usually $P(A|B)\\ne P(B|A)$\n",
"\n",
"\n",
"* $P(A\\cap B) = P(A|B) \\times P(B) = P(B|A) \\times P(A)$"
]
},
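{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"These identities can be checked numerically; a minimal sketch in plain Python (the event probabilities below are hypothetical, not from the text):\n",
"\n",
"```python\n",
"# Hypothetical joint and marginal probabilities (assumed values)\n",
"p_a_and_b = 0.12  # P(A and B)\n",
"p_a = 0.3         # P(A)\n",
"p_b = 0.4         # P(B)\n",
"\n",
"# Conditional probabilities from the definition\n",
"p_a_given_b = p_a_and_b / p_b  # P(A|B)\n",
"p_b_given_a = p_a_and_b / p_a  # P(B|A)\n",
"\n",
"# Usually P(A|B) != P(B|A), but both products recover P(A and B)\n",
"print(p_a_given_b, p_b_given_a)\n",
"```"
]
},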
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Bayes' Theorem (BT) and Bayesian Inference (BI)\n",
"\n",
"* **Bayes' Theorem describes the relationship between two inverse conditional probabilities P(A|B) and P(B|A)**\n",
"\n",
"\n",
"* the BT can be used to express how a degree of belief for a given hypothesis can be updated in light of new evidence\n",
"\n",
"\n",
"* The BI is the use of BT to draw conclusions about a set of mutually exclusive and exhaustive alternative hypotheses by linking prior knowledge about each hypothesis with new data - the result is updated probabilities for each hypothesis of interest."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"\n",
"## Bayes' Theorem\n",
"\n",
"There are two ways to think about BT:\n",
"\n",
"1. to describe the relationship between P(A|B) and P(B|A)\n",
"\n",
"\n",
"2. to express how a subjective degree of belief for a given hypothesis can be rationally updated to account for new evidence\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Bayes' Theorem to describe the relationship between P(A|B) and P(B|A)\n",
"\n",
"\n",
"$P(A|B)= \\frac{P(A\\cap B)}{P(B)}$ $~~~~\\Longrightarrow~~~~$ $P(A\\cap B) = P(A|B)\\times P(B)$ \n",
"\n",
"\n",
"$P(B|A)= \\frac{P(B\\cap A)}{P(A)}$ $~~~~\\Longrightarrow~~~~$ $P(B\\cap A) = P(B|A)\\times P(A)$\n",
"\n",
"\n",
"$P(A\\cap B) = P(B\\cap A)$ $~~~~\Longrightarrow~~~~$ $P(A|B)\\times P(B) = P(B|A)\\times P(A)$\n",
"\n",
"\n",
"The Bayes' Theorem: $~~~~P(A|B)= \\frac{P(B|A)\\times P(A)}{P(B)}$\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Bayes' Theorem to describe the relationship between P(A|B) and P(B|A)\n",
"\n",
"![alt text](BT.png \"Bayes' Theorem\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"\n",
"### Example of Application of Bayes' Theorem \n",
"\n",
"![alt text](Bex1.png \"Bayes' Theorem\")\n",
"\n",
"* A - represents women with breast cancer $~~~~P(A) = 0.01~~~~$\n",
"* B - represents a positive test\n",
"* B|A - represents a positive test, given that a woman has breast cancer $~~\\Longrightarrow~~$ given as $~~~~P(B|A) = 0.8$\n",
"* A|B - represents women with breast cancer, given a positive outcome of the test $~~\\Longrightarrow~~$ this is what we want to know\n",
"* The BT: $~~~~P(A|B)= \\frac{P(B|A)\\times P(A)}{P(B)}$ $~~\\Longrightarrow~~$ we need the P(B)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example of Application of Bayes' Theorem Cont.\n",
"\n",
"* we need to fill the table:\n",
"\n",
"![alt text](Bex2.png \"Bayes' Theorem\")\n",
"\n",
"* $P(B\\cap A) = P(B|A)\\times P(A) = 0.8 \\times 0.01 = 0.008$\n",
"* $~~\\Longrightarrow~~$ $P(\\sim B\\cap A) = P(A) - P(B\\cap A) = 0.01 - 0.008 = 0.002$\n",
"\n",
"![alt text](Bex3.png \"Bayes' Theorem\")\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example of Application of Bayes' Theorem Cont.\n",
"\n",
"* We also know B|~A - representing the cases when the test is positive under the condition that a woman does not have cancer\n",
" $~~~~P(B|\\sim A) = 0.096~~~~$\n",
" \n",
" * Thus $P(B\\cap \\sim A) = P(B|\\sim A)\\times P(\\sim A) = 0.096 \\times 0.99 = 0.095$\n",
" \n",
" * $~~\\Longrightarrow~~$ $ P(B) = P(B\\cap \\sim A) + P(B\\cap A) = 0.008+ 0.095 = 0.103$\n",
" \n",
" ![alt text](Bex4.png \"Bayes' Theorem\")\n",
" \n",
" \n",
" $~~~~P(A|B)= \\frac{P(B|A)\\times P(A)}{P(B)} = \\frac{0.8\\times 0.01}{0.103} = 0.0776$"
]
},
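{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The whole example can be reproduced in a few lines of Python; a sketch using only the three input probabilities quoted on the slides:\n",
"\n",
"```python\n",
"p_cancer = 0.01           # P(A): prior probability of cancer\n",
"p_pos_given_cancer = 0.8  # P(B|A): positive test given cancer\n",
"p_pos_given_no = 0.096    # P(B|~A): positive test given no cancer\n",
"\n",
"# Marginal probability of a positive test: P(B) = P(B|A)P(A) + P(B|~A)P(~A)\n",
"p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no * (1 - p_cancer)\n",
"\n",
"# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)\n",
"p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos\n",
"print(round(p_pos, 3), round(p_cancer_given_pos, 4))  # 0.103 0.0776\n",
"```"
]
},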
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The \"second\" formulation of the Bayes' Theorem (2BT)\n",
"\n",
"* The alternative, equally valid way to express the BT:\n",
"\n",
"$P(A|B)= \\frac{P(B|A)\\times P(A)}{P(B)}$ $~~~~\\Longrightarrow~~~~$ $P(A|B)= \\frac{P(B|A)\\times P(A)}{P(A\\cap B) + P(\\sim A \\cap B)}$\n",
"\n",
"* This last formulation of the BT focuses on inference and has vast applications in science"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The \"second\" formulation of the Bayes' Theorem (2BT)\n",
"\n",
"* The scientific method consists of two types of inquiry:\n",
" 1. **induction (IN)**\n",
" 2. __deduction (DE)__\n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The \"second\" formulation of the Bayes' Theorem (2BT)\n",
"\n",
" Illustration of the scientific process:\n",
"<div>\n",
"<img src=\"Bex41.png\" width=\"450\"/>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The \"second\" formulation of the Bayes' Theorem (2BT)\n",
"\n",
" The __Bayesian inference (BI)__: \n",
"- the process of confronting alternative hypotheses with new data and using BT to update our beliefs in each hypothesis\n",
"- an approach concerned with the consequences of modifying our previous beliefs as a result of receiving new data\n",
"- a method of statistical inference in which BT is used to update the probability for a hypothesis as more evidence or information becomes available\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## How does Bayesian inference work ?\n",
"\n",
"* Let us go back to the formula: $P(A|B)= \\frac{P(B|A)\\times P(A)}{P(A\\cap B) + P(\\sim A \\cap B)}$\n",
"\n",
"* __The critical fact: the marginal probability of B is the sum of the joint probabilities that make it up__ (the denominator of the formula)\n",
"\n",
" ![alt text](Bex5.png \"Bayes' Theorem\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## How does Bayesian inference work ?\n",
"\n",
"* Using now the former example of breast cancer:\n",
"* Suppose we were asked to find the probability P(A|B) that a woman has a breast cancer (A), given \n",
" that her mammogram test came back positive (B) \n",
"* \"data\" - the results of the mammogram\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## How does Bayesian inference work ?\n",
"\n",
"* Let us now identify the parts of the problem (in $P(A|B)= \\frac{P(B|A)\\times P(A)}{P(A\\cap B) + P(\\sim A \\cap B)}$) in terms of the scientific method\n",
"\n",
"* we have two competing __hypotheses__ regarding cancer: the woman has cancer (A) vs she does not (~A)\n",
"\n",
"* we have __data__ for this problem: the test came back positive. So B represents our observed data (~B does not appear in the BT formula; sensible since we did not observe a negative test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## How does Bayesian inference work ?\n",
"\n",
"* Let us now replace the __joint probabilities__ with their conditional probability equivalents:\n",
"* $P(A\\cap B)= P(B|A)\\times P(A)$\n",
"* $P(\\sim A\\cap B)= P(B|\\sim A)\\times P(\\sim A)$\n",
"* Then, the BT: $~~~~~~~~P(A|B)= \\frac{P(B|A)\\times P(A)}{P(B|A)\\times P(A) + P(B|\\sim A)\\times P(\\sim A)}$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## How does Bayesian inference work ?\n",
"\n",
"* Note: the first term in the denominator is exactly the same as the numerator\n",
"* this makes sense: BT returns a proportion, or probability, ranging between 0 and 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Bayesian, scientific procedure:\n",
"\n",
"<div>\n",
"<img src=\"Bex41.png\" width=\"200\"/>\n",
"</div>\n",
"\n",
"1. __Hypotheses or theory box__\n",
" - the hypotheses are identified; \n",
" - they must be mutually exclusive and exhaustive; \n",
" - we assign (guess) the probability that each individual hypothesis is true (prior to making an experiment). \n",
" - These are called __prior probabilities__ because they represent our current belief in each hypothesis *prior* to data collection \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Bayesian, scientific procedure:\n",
"\n",
"<div>\n",
"<img src=\"Bex41.png\" width=\"200\"/>\n",
"</div>\n",
"\n",
"2. __Consequences box__\n",
" - we write out equations for calculating the probability of observing the test data under each hypothesis\n",
" - this probability is called __likelihood__\n",
" - figuring out how to calculate the likelihood of the data is often the most challenging part of Bayesian inference\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Bayesian, scientific procedure:\n",
"\n",
"<div>\n",
"<img src=\"Bex41.png\" width=\"200\"/>\n",
"</div>\n",
"\n",
"3. __Data box__\n",
" - we collect data; for the example of cancer, the test came back positive\n",
"\n",
"4. __Inference box__\n",
" - with data in hand, we can now plug our data into the likelihood equations:\n",
" - likelihood of observing the data (a positive test result) under the cancer hypothesis\n",
" - likelihood of observing the data (a positive test result) under the no-cancer hypothesis \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Bayesian, scientific procedure:\n",
"\n",
"* __Finally__ we use BT to determine a __posterior probability (PP)__ for each hypothesis\n",
"* The PP represents our updated belief in each hypothesis after new data are collected:\n",
" - probability of cancer, given the observed data\n",
" - probability of no cancer, given the observed data "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Bayesian, scientific procedure:\n",
"\n",
"In the particular case of the cancer example, the BT reads:\n",
"\n",
"<div>\n",
"<img src=\"PP.png\" width=\"600\"/>\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Bayesian, scientific procedure:\n",
"\n",
"Let us now replace each term with its Bayesian inference definition:\n",
"\n",
"<div>\n",
"<img src=\"BI.png\" width=\"600\"/>\n",
"</div>\n",
"\n",
"The \"second\" approach of BI places the problem within a scientific context: one posits hypotheses and then updates one's beliefs in each hypothesis after data are collected"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Bayes' Theorem - the case of more than two hypotheses:\n",
"\n",
"Assume the discrete number (n) of hypotheses; then the BT reads:\n",
"\n",
"$$\\rm P(H_i|data) = \\frac{P(data|H_i) \\times P(H_i)}\n",
"{\\sum_{k=1}^nP(data|H_k) \\times P(H_k) }$$\n",
"\n",
"The essence of BI:\n",
"\n",
"**Initial belief in Hypothesis i + New Data $~~~~\\Longrightarrow~~~~$ updated belief in Hypothesis i** \n",
"\n",
"**updating our beliefs by acquiring more data = learning**\n"
]
},
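{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The n-hypothesis update can be written as a small Python function (a generic sketch, not code from the book; the function name and example numbers are ours):\n",
"\n",
"```python\n",
"def update(priors, likelihoods):\n",
"    # Bayes' theorem for n mutually exclusive, exhaustive hypotheses:\n",
"    # posterior_i = P(data|H_i) P(H_i) / sum_k P(data|H_k) P(H_k)\n",
"    joint = [p * l for p, l in zip(priors, likelihoods)]\n",
"    total = sum(joint)\n",
"    return [j / total for j in joint]\n",
"\n",
"# Hypothetical example with three hypotheses and a flat prior\n",
"post = update([1/3, 1/3, 1/3], [0.2, 0.5, 0.3])\n",
"print([round(p, 2) for p in post])  # [0.2, 0.5, 0.3]\n",
"```"
]
},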
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* 1964: *Frederick Mosteller* and *David Wallace* published an article in which they studied the disputed authorship of some of the *Federalist Papers*:\n",
"\n",
"![alt text](AP01.png \"Bayes' Theorem\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* Let us assume that we are working with a **specific** paper of unknown authorship (No 54)\n",
"* Let us apply BI\n",
"\n",
"\n",
"1. **we identify our hypotheses**\n",
"\n",
" - Hamilton = Hamilton's authorship hypothesis\n",
" - Madison = Madison's authorship hypothesis\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* Note that both hypotheses are exhaustive and mutually exclusive:\n",
"\n",
" - P(Hamilton) = P(~Madison)\n",
" - P(Madison) = P(~Hamilton)\n",
" \n",
" - Hamilton = A \n",
" - Madison = ~A\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"$P(A|B)= \\frac{P(B|A)\\times P(A)}{P(B|A)\\times P(A) + P(B|\\sim A)\\times P(\\sim A)}$\n",
"\n",
"$\\Longrightarrow$\n",
"${\\rm P(Hamilton|data)= \\frac{P(data|Hamilton)\\times P(Hamilton)}{P(data|Hamilton)\\times P(Hamilton) + P(data|Madison)\\times P(Madison)}}$\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"2. **we express our belief that each hypothesis is true in terms of prior probabilities**\n",
"\n",
" - P(Hamilton) = prior probability that the true author is Hamilton\n",
" - P(Madison) = prior probability that the true author is Madison\n",
" \n",
" \n",
"* there are plenty of possible choices for the prior probabilities\n",
"* Let us set e.g. P(Hamilton) = P(Madison) = 0.5\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"3. **Gather the data** - can be found in paper No 54 which is 2008 words long\n",
"\n",
" * It turned out that Madison tended to use the word **by** more frequently than Hamilton\n",
" * whereas Hamilton tended to use the word **to** more frequently than Madison\n",
"    * the best single discriminant, however, was the use of the word **upon** - Hamilton used **upon** with overwhelmingly greater frequency than Madison\n",
" * Many other measures, like sentence length, have been considered as well\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"3. **Gather the data** \n",
" * the word **upon** appeared twice in the paper in question; the rate \n",
" ${\\rm \\frac{\\#~upons}{total~words} = \\frac{2}{2008} = 0.000996 }$\n",
"    * in other words: 0.996 **upons** per 1000 words\n",
"\n",
"4. determine the **likelihood** of the observed data, assuming each hypothesis is true\n",
"\n",
" * determine P(0.996|Hamilton) and P(0.996|Madison)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* Likelihood vs probability: likelihood describes the probability of observing data that have already been collected (we look retrospectively at the probability of collecting those data)\n",
"* Likelihood is the hypothetical probability that an event that has already occurred would yield a specific outcome\n",
"* Note: the likelihoods are conditional for each hypothesis - in this Bayesian analysis, the likelihood is interpreted as the probability of observing the data, given the hypothesis\n",
"* **Computing the likelihood of the observed data is a critical part of Bayesian analysis**\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* our dataset is composed of 98 articles, 48 (50) of which are known to be penned by Hamilton (Madison), respectively\n",
"* The frequency histogram of the word upon (per 1000 words):\n",
"\n",
"![alt text](AP02.png \"Bayes' Theorem\")\n",
"\n",
"\n",
"* Intuitively, the data are more consistent with the Madison hypothesis\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* The same data in the tabular form: \n",
"\n",
"![alt text](AP03.png \"Bayes' Theorem\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* Only one of Hamilton's 48 manuscripts has a rate of **upon** in the range [0,1]\n",
"* Therefore P(0.996|Hamilton) = 1/48 = 0.021\n",
"* Seven of Madison's 50 manuscripts have a rate of **upon** in the range [0,1]\n",
"* Therefore P(0.996|Madison) = 7/50= 0.14\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"5. Use BT to compute P(Hamilton|0.996) and P(Madison|0.996)\n",
"\n",
"${\\rm P(Hamilton|0.996)= \\frac{P(0.996|Hamilton)\\times P(Hamilton)}{P(0.996|Hamilton)\\times P(Hamilton) + P(0.996|Madison)\\times P(Madison)}}$\n",
"\n",
"${\\rm P(Hamilton|0.996)= \\frac{0.021 * 0.5}{0.021 * 0.5 + 0.14 * 0.5} =\n",
"\\frac{0.0105}{0.0805} = 0.1304}$\n",
"\n",
"Thus \n",
"$${\\rm P(Madison|0.996) = 0.8696}$$\n",
"\n"
]
},
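{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Step 5 can be checked directly in Python, using the rounded likelihoods 0.021 and 0.14 from the slides:\n",
"\n",
"```python\n",
"prior_h = prior_m = 0.5  # equal priors for Hamilton and Madison\n",
"like_h = 0.021           # P(0.996 | Hamilton)\n",
"like_m = 0.14            # P(0.996 | Madison)\n",
"\n",
"# Marginal probability of the data (denominator of Bayes' theorem)\n",
"denom = like_h * prior_h + like_m * prior_m\n",
"post_h = like_h * prior_h / denom\n",
"post_m = like_m * prior_m / denom\n",
"print(round(post_h, 4), round(post_m, 4))  # 0.1304 0.8696\n",
"```"
]
},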
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* The prior and new posterior estimates can be graphed as follows:\n",
"\n",
"![alt text](AP04.png \"Bayes' Theorem\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* What happens if we use a different set of priors e.g.\n",
"* P(Hamilton) = 0.75, P(Madison) = 0.25\n",
"* Then:\n",
"$${\\rm P(Hamilton|0.996)= \\frac{0.021 * 0.75}{0.021 * 0.75 + 0.14 * 0.25} =\n",
"\\frac{0.01575}{0.05075} = 0.3103}$$\n",
"\n",
"![alt text](AP05.png \"Bayes' Theorem\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* **what if we found more papers known to be authored by Hamilton and Madison?**\n",
"* The more information you have to calculate the likelihood, the better\n",
"* We would use this new information to get better estimates of the probability of each author's use of the word **upon**\n",
"* Additionally, the discovery of more papers may influence our choice of priors\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Author Problem - Bayesian Inference with Two Hypotheses:\n",
"\n",
"* **Do the likelihoods of the data have to add to 1.0?**\n",
"* No; one must not confuse the likelihoods with the prior probabilities for a set of hypotheses\n",
"\n",
"![alt text](AP06.png \"Bayes' Theorem\")\n",
"\n",
"![alt text](AP07.png \"Bayes' Theorem\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"* Bobbie has completely forgotten the date of his wife's birthday; he knows only the year: 1900\n",
"* His wife, Mary, decided to leave Bobbie unless he finds this date - at least the month\n",
"* **Our task: to use a Bayesian inference approach to determine the month in which Mary was born**\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"To begin with, we have n = 12 discrete hypotheses\n",
"$$\\rm P(H_i|data) = \\frac{P(data|H_i) * P(H_i)}\n",
"{\\sum_{k=1}^nP(data|H_k) * P(H_k) }$$\n",
"\n",
"1. **identify your hypotheses:**\n",
" * born in January = January hypothesis\n",
" * born in February = February hypothesis\n",
" * etc\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"2. **Express our belief that each hypothesis is true in terms of probabilities**\n",
" * P(January) = prior probability that Mary's true birth month is January\n",
" * P(February) = prior probability that Mary's true birth month is February\n",
" * etc. \n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"2a. **non-informative prior (NIP)** - equal probabilities for all hypotheses\n",
" * the distribution of priors vs months is flat: expresses \"vague or general information about a variable\"\n",
"    * The NIP adds little or no information to the Bayesian inference; it does not have an impact on the posterior distribution\n",
" * When an analyst uses a NIP, the goal is to obtain a posterior distribution that is shaped primarily by the likelihood of the data\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"2b. **Informative prior (IP)**\n",
" * The IP is not \"flat\" i.e. it is not dominated by the likelihood, it adds information to the Bayesian inference and it has an impact on the posterior distribution\n",
" * When an analyst uses an IP ==> the goal is to obtain a posterior distribution that is shaped by both the prior and the likelihood of the data\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"* The non-informative prior:\n",
"<div>\n",
"<img src=\"BP01.jpg\" width=\"200\"/>\n",
"</div>\n",
"\n",
"* The informative prior (Bobby had some hints leading him to believe that February and May are more likely than the rest of the year):\n",
"<div>\n",
"<img src=\"BP02.jpg\" width=\"200\"/>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"3. **Collect the data**\n",
"    * In this case the data would comprise information on how frequent the name Mary was in each individual month of the year 1900\n",
" \n",
" * The BT:\n",
" $\\rm P(H_i|data) = \\frac{P(data|H_i) * P(H_i)}\n",
"{\\sum_{k=1}^nP(data|H_k) * P(H_k) }$ \n",
"    * e.g. for the January hypothesis: $\\rm P(January|1Mary)= \\frac{P(1Mary|January) * P(January)}\n",
"{\\sum_{k=1}^nP(1Mary|H_k) * P(H_k) } $\n",
"    * here *1Mary* denotes the observed data (one Mary), whose likelihood is evaluated under each monthly hypothesis"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"\n",
"4. **Estimate the likelihood of observing the data** i.e. estimate the *1Mary* likelihood\n",
"\n",
" * Let us assume that the following data (frequency histogram of Marys per month) are available\n",
" \n",
" <div>\n",
"<img src=\"BP03.jpg\" width=\"400\"/>\n",
" </div>\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"5. **Use the BT to compute the posterior probabilities P(January|1Mary), P(February|1Mary) etc**\n",
"\n",
"$$\\rm P(January|1Mary)= \\frac{P(1Mary|January) * P(January)}\n",
"{\\sum_{k=1}^nP(1Mary|H_k) * P(H_k) } $$\n",
"\n",
"\n",
" <div>\n",
"<img src=\"BP04.jpg\" width=\"400\"/>\n",
" </div>\n",
"\n",
"\n"
]
},
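{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The same machinery extends to the 12 monthly hypotheses. Note: the likelihood values below are purely illustrative placeholders, since the actual BP03/BP04 histogram values are not reproduced in the text:\n",
"\n",
"```python\n",
"# Placeholder likelihoods P(1Mary | month), Jan..Dec -- NOT the BP03 values\n",
"likelihoods = [0.05, 0.10, 0.08, 0.07, 0.15, 0.09,\n",
"               0.08, 0.07, 0.09, 0.08, 0.07, 0.07]\n",
"priors = [1 / 12] * 12  # non-informative (flat) prior\n",
"\n",
"joint = [p * l for p, l in zip(priors, likelihoods)]\n",
"total = sum(joint)\n",
"posteriors = [j / total for j in joint]\n",
"\n",
"# With a flat prior the posterior is just the normalized likelihood,\n",
"# so the month with the largest likelihood (May here) wins\n",
"print(round(max(posteriors), 2))  # 0.15\n",
"```"
]
},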
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"* Prior distribution and posterior distribution:\n",
"\n",
"Informative | Non-informative\n",
"-:| -: \n",
"![alt](BP05.jpg) | ![alt](BP06.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"* The left plot is an example of a **prior sensitivity analysis**\n",
"* In Bayesian analysis and scientific deduction, a primary goal of the analyst is to collect data that will discriminate between the hypotheses\n",
"* The tricky part comes into play when one really doesn't have any information to set the prior and is trying to be as objective as possible\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Birthday Problem - Bayesian Inference with Multiple Discrete Hypotheses:\n",
"\n",
"* \"flat priors are not necessarily non-informative, and non-informative priors are not necessarily flat\"\n",
"* all priors are in fact subjective because the analyst must select one and, in doing so, exercises subjectivity\n",
"* **Happy end:** eventually, Bobby remembered that he took Mary to a play for her birthday. After a bit of detective work, he determined that Mary's birthday was May 8th.\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* Let us discuss how to combine **multiple sources of data**\n",
"\n",
"* A fictitious problem: how to determine the probability that the man in the photo is Thomas Bayes?\n",
"\n",
"<div>\n",
"<img src=\"PB01.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"\n",
"1. **identify the hypotheses**\n",
"\n",
"\n",
" * there are just two hypotheses: \n",
" * the portrait is of Thomas Bayes\n",
" * the portrait is not of Thomas Bayes\n",
" \n",
"2. **what are the prior probabilities that each hypothesis is true?**\n",
"\n",
"\n",
" * we set the priors 50/50\n",
" * 0.5 = P(\"Thomas Bayes hypothesis\")\n",
"    * 0.5 = P(\"Not Thomas Bayes hypothesis\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"3. **What are the data?**\n",
" * Consider two kinds of data\n",
"    * the frequency of usage of **wigs** by ministers in the 1700s:\n",
"        - each person in a reference set of similar portraits is classified as a minister or not, and as wearing a wig or not\n",
"        - the variable is discrete: 1 = true, 0 = false\n",
" * the **similarity index**\n",
" - spanning the range (0,100) (0-no similarity, 100 - total similarity); \n",
" - one can measure characteristics such as eyebrow shape, nose length, forehead length etc for Bayes's close relatives and random persons\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* **collect the data**\n",
"\n",
" * From the portrait itself we get:\n",
" - wigs = 0\n",
" - similarity = 55 (from the comparison of the portrait under study and the one of Bayes's brother Joshua)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"4. **what is the likelihood of the observed data under each hypothesis?**\n",
"\n",
" * we have **two sources of data** \n",
" * we **assume that each piece of information is independent of the other**\n",
" * we can **calculate the likelihood of observing each individual piece of data under each hypothesis**\n",
"    * once we have the two likelihood calculations, we can **compute the joint likelihood for each hypothesis**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* The dataset1 for wigs is composed of 100 portraits, with the results as follows:\n",
"\n",
"<div>\n",
"<img src=\"PB02.jpg\" width=\"400\"/>\n",
" </div>\n",
" \n",
"$\\Longrightarrow~~$ **how likely it is to observe no wig in our data, under the Thomas Bayes hypothesis = 2/100 = 0.02** (among all the portraits, those of a minister who does not wear a wig)\n",
"\n",
"$\\Longrightarrow~~$ **how likely it is to observe no wig in our data, under the Not Thomas Bayes hypothesis = 77/100** (the probability that a man on the portrait will not wear a wig)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* the dataset2 is composed of pairs of males of two kinds: fathers and sons (1) and unrelated pairs (0)\n",
"* For each pair the similarity index is determined, like e.g.\n",
"\n",
"<div>\n",
"<img src=\"PB03.jpg\" width=\"400\"/>\n",
" </div>\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* we can split the sample into related and unrelated pairs and plot the similarity distributions together with the observed similarity index:\n",
"\n",
"<div>\n",
"<img src=\"PB04.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* **the likelihood of observing a similarity score of at least 55 under the Thomas Bayes hypothesis = the fraction of the GREEN plot on the right of the value of 55 = 0.69**\n",
"\n",
"\n",
"* **the likelihood of observing a similarity score of at least 55 under the NOT Thomas Bayes hypothesis = the fraction of the RED plot on the right of the value of 55 = 0.01**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* **The combination of both results into one likelihood for each hypothesis**\n",
"\n",
" * if our two data sources are **independent** we can simply multiply the two likelihood components together\n",
" \n",
" * the likelihood of observing the data under the Thomas Bayes hypothesis:\n",
" \n",
" 0.02 * 0.69 = 0.0138 \n",
" \n",
" * the likelihood for observing the data under the Not Thomas Bayes hypothesis: \n",
" 0.77 * 0.01 = 0.0077 "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - Bayesian Inference with Joint Likelihood\n",
"\n",
"* the results from BT:\n",
"\n",
"table | histogram\n",
":---:| :---: \n",
"![alt](PB05.jpg) | ![alt](PB06.jpg)\n"
]
},
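{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## The Portrait Problem - a numeric check\n",
"\n",
"* a minimal Python sketch (not from the book) that reproduces the posteriors above from the quoted joint likelihoods and equal priors of 0.5:\n",
"\n",
"```python\n",
"# joint likelihoods quoted on the previous slides, equal priors of 0.5\n",
"like = {'Thomas Bayes': 0.02 * 0.69, 'Not Thomas Bayes': 0.77 * 0.01}\n",
"prior = {h: 0.5 for h in like}\n",
"evidence = sum(like[h] * prior[h] for h in like)\n",
"post = {h: like[h] * prior[h] / evidence for h in like}\n",
"print(post)  # Thomas Bayes: ~0.64, Not Thomas Bayes: ~0.36\n",
"```\n"
]
},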
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Probability Functions\n",
"\n",
"### Probability Mass Functions (pmf)\n",
"\n",
"### Probability Density Functions (pdf)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Random Variable (RV)\n",
"\n",
"* RV - a variable whose value is subject to random variation, OR\n",
"* a variable whose value is that of a randomly chosen member of a population\n",
"\n",
"* formally, an RV is a function mapping outcomes to numbers"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Probability Functions\n",
"\n",
"* we are often interested in knowing the probability of observing particular outcomes\n",
"* the outcome of a random event cannot be determined before it occurs, but it may be any one of several possible outcomes\n",
"* The actual outcome is considered to be determined by chance"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Probability Mass Function (pmf)\n",
"\n",
"* a pmf is a function that gives the probability that a discrete random variable is exactly equal to some value"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Binomial pmf\n",
"* all binomial problems are composed of trials that have only two possible outcomes: \"success\" and \"failure\"\n",
"* The pmf is widely used for problems where there is a fixed number of independent trials (designated n) and where each trial can have only one of two outcomes"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Binomial pmf\n",
"\n",
"* the binomial pmf:\n",
"$f(y;n,p) = \\left( \\begin{array}{c} n \\\\ y \\end{array} \\right) p^y (1-p)^{(n-y)},~~ y = 0, 1, 2, \\ldots, n,~~~~\\left( \\begin{array}{c} n \\\\ y \\end{array} \\right)= \\frac{n!}{y!\\,(n-y)!}$\n",
"* Parameters:\n",
" * n - the total number of trials\n",
" * p - the probability of success\n",
"* y - the 3rd input - the observed number of successes in the experiment "
]
},
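{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Binomial pmf - a quick check\n",
"\n",
"* a minimal sketch of the pmf above in Python (standard library only; the function name is ours):\n",
"\n",
"```python\n",
"from math import comb\n",
"\n",
"def binom_pmf(y, n, p):\n",
"    # probability of exactly y successes in n trials with success probability p\n",
"    return comb(n, y) * p**y * (1 - p)**(n - y)\n",
"\n",
"print(binom_pmf(2, 3, 0.5))  # 0.375 for 2 heads in 3 fair flips\n",
"```\n"
]
},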
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Binomial pmf\n",
"\n",
"* assumptions\n",
" 1. the trials are independent\n",
" 2. there are two possible outcomes (success or failure) on each trial\n",
" 3. the probability of success is constant across trials"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The \"other\" pmfs\n",
"\n",
"* Negative bimomial distribution\n",
"* Bernoulli distribution\n",
"* Poisson distribution\n",
"* Discrete uniform distribution\n",
"* Geometric distribution\n",
"* Hypergeometric distribution\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Bernoulli distribution\n",
"\n",
"* a special case of a binomial distribution, in which the number of trials is n=1\n",
"$$f(y;1,p) = p^y (1-p)^{(1-y)},~~ y = 0, 1$$"
]
},
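{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Bernoulli distribution - a quick check\n",
"\n",
"* the n=1 special case written out as a small Python sketch (standard library only; the function name is ours):\n",
"\n",
"```python\n",
"def bernoulli_pmf(y, p):\n",
"    # binomial pmf with n = 1: probability p for y=1, 1-p for y=0\n",
"    return p**y * (1 - p)**(1 - y)\n",
"\n",
"print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))  # 0.3 and 0.7 (up to float rounding)\n",
"```\n"
]
},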
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Reminder\n",
"\n",
"<div>\n",
"<img src=\"BP11.jpg\" width=\"600\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Reminder\n",
"* Note that the likelihoods are conditional on each hypothesis\n",
"* In Bayesian analysis, the likelihood is interpreted as the probability of observing the data, given the hypothesis\n",
"* the notation: ${\\cal L}({\\rm data;H})$ or ${\\cal L}({\\rm data~|~H})$\n",
"* Likelihoods describe the probability of observing data that have already been collected\n",
"* The likelihood computations do not need to sum to 1\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example\n",
"\n",
"* Example: the likelihood of the binomial pmf for the experiment in which we do not know p, and we were given 2 heads out of 3 coin flips\n",
"<div>\n",
"<img src=\"BP12.jpg\" width=\"200\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.\n",
"* In fact, p can assume any value in the range of 0 to 1, so there are an infinite number of possibilities\n",
"* the full spectrum of p alternatives = a **likelihood profile** of the binomial function when n=3 and y=2\n",
"<div>\n",
"<img src=\"BP13.jpg\" width=\"400\"/>\n",
" </div>"
]
},
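{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont. - the likelihood profile numerically\n",
"\n",
"* a small Python sketch (not from the book) that evaluates the profile above on a grid of candidate values of p:\n",
"\n",
"```python\n",
"from math import comb\n",
"\n",
"def binom_like(p, n=3, y=2):\n",
"    # binomial likelihood of y=2 heads in n=3 flips, as a function of p\n",
"    return comb(n, y) * p**y * (1 - p)**(n - y)\n",
"\n",
"grid = [i / 1000 for i in range(1001)]\n",
"profile = [binom_like(p) for p in grid]\n",
"p_hat = grid[profile.index(max(profile))]\n",
"print(p_hat)  # the profile peaks near p = 2/3\n",
"```\n"
]
},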
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem:\n",
"\n",
"$$\\rm P(H_i|data)= \\frac{P(data|H_i) * P(H_i)}\n",
"{\\sum_{k=1}^n P(data|H_k) * P(H_k) } $$\n",
"1. Hypotheses - just two for our coin in terms of fairness:\n",
"    * $H_1$ - the coin is fair, so that the probability of heads p = 0.5\n",
"    * $H_2$ - the coin is weighted, so that p = 0.4\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem:\n",
"\n",
"2. the prior probabilities for each hypothesis\n",
" * Let us set the prior probability for each hypothesis = 0.5\n",
" <div>\n",
"<img src=\"BP14.jpg\" width=\"400\"/>\n",
" </div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem:\n",
"\n",
"3. Collect data:\n",
" * let us assume that we tossed a coin 3 times and ended up with 2 heads\n",
"4. Compute the likelihood of the data under each hypothesis\n",
" * For the $H_1$ (p=0.5): $~~{\\rm P(data|H_1)} = 0.375$\n",
" * For the $H_2$ (p=0.4): $~~{\\rm P(data|H_2)} = 0.288$\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem:\n",
"\n",
"5. Use the BT to update the priors to posteriors:\n",
"\n",
"$\\rm P(H_1|data)= \\frac{P(data|H_1) * P(H_1)}\n",
"{P(data|H_1) * P(H_1) + P(data|H_2) * P(H_2) } = \\frac{0.375*0.5}{0.375*0.5 + 0.288*0.5} = 0.566 $\n",
"\n",
"$\\rm P(H_2|data)= \\frac{P(data|H_2) * P(H_2)}\n",
"{P(data|H_1) * P(H_1) + P(data|H_2) * P(H_2) } = \\frac{0.288*0.5}{0.375*0.5 + 0.288*0.5} = 0.434 $\n"
]
},
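{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem - a numeric check\n",
"\n",
"* steps 4-5 above as a small Python sketch (standard library only; names are ours):\n",
"\n",
"```python\n",
"from math import comb\n",
"\n",
"def binom_pmf(y, n, p):\n",
"    return comb(n, y) * p**y * (1 - p)**(n - y)\n",
"\n",
"# likelihood of 2 heads in 3 flips under each hypothesis, equal priors of 0.5\n",
"like = {'H1 (p=0.5)': binom_pmf(2, 3, 0.5), 'H2 (p=0.4)': binom_pmf(2, 3, 0.4)}\n",
"prior = {h: 0.5 for h in like}\n",
"evidence = sum(like[h] * prior[h] for h in like)\n",
"post = {h: like[h] * prior[h] / evidence for h in like}\n",
"print(post)  # about 0.566 and 0.434\n",
"```\n"
]
},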
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem\n",
" * This comprises the update of our belief that the coin is fair from 0.5 to 0.566\n",
" * and the update of our belief that the coin is biased from 0.5 to 0.434\n",
" * The resulting posterior distribution:\n",
" <div>\n",
"<img src=\"BP15.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem\n",
"* The Kruschke diagram - intended to communicate the structure of the prior and likelihood\n",
"<div>\n",
"<img src=\"BP16.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example cont.: the Bayes' Theorem\n",
"* The above example illustrates for the first time the application of the pmf to calculate the likelihood of the data under each hypothesis"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions\n",
"\n",
"* when a random variable is continuous\n",
"* example: uniform distribution U:\n",
" $~~~~~~f(x) = 0.5,~~ 4 \\le x \\le 6~~ X\\sim U(4,6)$\n",
"* the area under the pdf f(x) must be equal 1.00 \n",
"* the formal definition of U(a,b):\n",
"$$f(x;a,b) = \\frac{1}{b-a},~~ a \\le x \\le b$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions\n",
"\n",
"* example cont: \n",
"* the probability of 4.5 < x < 5.5 ==> 0.5\n",
"* the probability of x=5 ==> 0"
]
},
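{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions - a quick check\n",
"\n",
"* a small Python sketch of the U(4,6) example above (interval vs. single point; the function name is ours):\n",
"\n",
"```python\n",
"def uniform_prob(lo, hi, a=4.0, b=6.0):\n",
"    # probability that X ~ U(a, b) falls in [lo, hi]\n",
"    lo, hi = max(lo, a), min(hi, b)\n",
"    return max(hi - lo, 0.0) / (b - a)\n",
"\n",
"print(uniform_prob(4.5, 5.5))  # 0.5\n",
"print(uniform_prob(5.0, 5.0))  # 0.0 - a single point has zero probability\n",
"```\n"
]
},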
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions: Gaussian\n",
"\n",
"* two parameters:\n",
" 1. location parameter\n",
" 2. scale parameter\n",
"\n",
"$$f(x;\\mu, \\sigma) = \\frac{1}{\\sigma\\sqrt{2\\pi}} e^{-(x-\\mu)^2/(2\\sigma^2)}$$"
]
},
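{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Gaussian pdf - a quick check\n",
"\n",
"* the pdf above written out in Python (standard library only; the function name is ours):\n",
"\n",
"```python\n",
"from math import exp, pi, sqrt\n",
"\n",
"def normal_pdf(x, mu, sigma):\n",
"    # normal density with location mu and scale sigma\n",
"    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))\n",
"\n",
"print(normal_pdf(0.0, 0.0, 1.0))  # about 0.3989, the peak of the standard normal\n",
"```\n"
]
},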
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The other pdfs\n",
"\n",
"* normal\n",
"* log-normal\n",
"* beta\n",
"* gamma\n",
"* exponential\n",
"* Weibull\n",
"* Cauchy"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The BT and pdfs\n",
"\n",
"* let us consider the BT when the hypotheses for a (single) parameter $\\theta$ are infinitely many\n",
"\n",
"$$\\rm P(H_i|data) = \\frac{P(data|H_i) * P(H_i)}\n",
"{\\sum_{k=1}^nP(data|H_k) * P(H_k) } ~~==>~~\n",
"P(\\theta|data) = \\frac{P(data|\\theta) * P(\\theta)}\n",
"{\\int P(data|\\theta) * P(\\theta) d\\theta }$$\n",
"* Two parameters --> the likelihood surface"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions: Gaussian example\n",
"\n",
"* Consider the average lifespan ($\\mu$) of a bacterium \n",
"* Let us fix $\\sigma = 0.5$\n",
"* Then the BT: \n",
"$$\\rm P(\\mu|data) = \\frac{P(data|\\mu) * P(\\mu)}\n",
"{\\int P(data|\\mu) * P(\\mu) d\\mu }$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions: Gaussian example\n",
"\n",
"1. the hypotheses:\n",
" * there are an infinite number of hypotheses\n",
" * we could have some bounds though"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions: Gaussian example\n",
"\n",
"2. Prior probabilities for each hypotheses\n",
" * we think that $\\mu$ can range between 4 and 6 ==> \n",
" * prior = U(4,6)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions: Gaussian example\n",
"\n",
"3. collect data\n",
" * suppose that we draw a random bacterium that lives 4.7 years"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions\n",
"4. compute the likelihood of the data under each hypothesis\n",
"    * Now we evaluate the likelihood of observing x = 4.7 under all values of $\\mu$ \n",
" * this gives the **likelihood profile**\n",
"<div>\n",
"<img src=\"BT21.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions\n",
"5. Use BT to update the priors to posteriors\n",
"    * reminder: we used a uniform prior\n",
" * Kruschke plot\n",
" <div>\n",
"<img src=\"BT22.jpg\" width=\"400\"/>\n",
" </div>"
]
},
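{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions - a grid-approximation sketch\n",
"\n",
"* a small Python sketch (not from the book) of steps 1-5 above: a grid over $\\mu$, a U(4,6) prior, one observation x = 4.7, and a numerical approximation of the integral in the denominator:\n",
"\n",
"```python\n",
"from math import exp, pi, sqrt\n",
"\n",
"def normal_pdf(x, mu, sigma=0.5):\n",
"    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))\n",
"\n",
"x_obs = 4.7\n",
"mus = [4.0 + 2.0 * i / 400 for i in range(401)]   # grid of hypotheses over [4, 6]\n",
"like = [normal_pdf(x_obs, mu) for mu in mus]\n",
"dmu = mus[1] - mus[0]\n",
"evidence = sum(l * 0.5 * dmu for l in like)       # U(4,6) prior density = 0.5\n",
"post = [l * 0.5 / evidence for l in like]         # posterior density on the grid\n",
"print(mus[post.index(max(post))])  # the posterior peaks at the observed 4.7\n",
"```\n"
]
},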
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions - tractable priors\n",
"\n",
"* In the example above it was easy to evaluate the integral in the denominator\n",
"* However, this integral is sometimes intractable\n",
"* There are a few special cases where a particular prior, combined with data distributed according to some pmf or pdf, leads analytically to a tractable posterior. Such priors are called **tractable priors**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Probability density functions - examples tractable priors\n",
"\n",
"* **beta pdf prior + binomial data ==> beta pdf posterior**\n",
"* **gamma pdf prior + Poisson data ==> gamma pdf posterior**\n",
"* **normal pdf prior + normal data ==> normal pdf posterior**\n",
"* **Dirichlet pdf prior + multinomial data ==> Dirichlet pdf posterior**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Beta-Binomial Conjugate - The White House Problem\n",
"\n",
"* now we use the BT to estimate the parameters of a pdf \n",
"\n",
"* **The problem: what is the probability that any famous person (FP) can drop by the White House without an appointment?**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"* This is a binomial problem\n",
"\n",
"$f(y;n,p) = \\left( \\begin{array}{c} n \\\\ y \\end{array} \\right) p^y (1-p)^{(n-y)},~~ y = 0, 1, 2, \\ldots, n,~~~~\\left( \\begin{array}{c} n \\\\ y \\end{array} \\right)= \\frac{n!}{y!\\,(n-y)!}$\n",
"* assume that the individual trials are independent\n",
"* **We do not know what p (the probability of success) is!**\n",
"* Our goal: to use a Bayesian inference approach to estimate the probability that a FP can get into the White House without an invitation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"1. **What are the hypotheses for p?**\n",
" * there would be the alternative hypotheses for p, ranging from 0 to 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"2. **What are the prior densities for these hypotheses?**\n",
"    * we need to assign a prior density to each hypothesized value of p\n",
" * here we will use the **beta distribution** to set prior probabilities for each and every hypothesis for p\n",
" <div>\n",
"<img src=\"BT23.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"* The beta distribution is often used as a prior distribution for a proportion\n",
"* The beta distribution is defined on the interval (0,1)\n",
"* It has two positive parameters $\\alpha$ and $\\beta$\n",
"$$f(x;\\alpha,\\beta) = \\frac{1}{B(\\alpha,\\beta)} x^{\\alpha-1}(1-x)^{\\beta-1},~~ 0 < x < 1$$\n",
"* $B(\\alpha,\\beta)$ - the normalization constant (the beta function)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"* the bigger $\\alpha$ is relative to $\\beta$, the more the **weight** of the curve is shifted to the right and vice versa\n",
"* the mean: $\\mu = \\frac{\\alpha}{\\alpha+\\beta}$\n",
"* the variance: $\\sigma^2 = \\frac{\\alpha\\beta}{(\\alpha+\\beta)^2(\\alpha+\\beta+1)}$\n",
"<div>\n",
"<img src=\"BT24.jpg\" width=\"400\"/>\n",
" </div>"
]
},
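{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta distribution - a quick check\n",
"\n",
"* the pdf, mean and variance formulas above as a small Python sketch (standard library only; function names are ours):\n",
"\n",
"```python\n",
"from math import gamma\n",
"\n",
"def beta_pdf(x, a, b):\n",
"    norm = gamma(a) * gamma(b) / gamma(a + b)   # B(a, b)\n",
"    return x**(a - 1) * (1 - x)**(b - 1) / norm\n",
"\n",
"def beta_mean(a, b):\n",
"    return a / (a + b)\n",
"\n",
"def beta_var(a, b):\n",
"    return a * b / ((a + b)**2 * (a + b + 1))\n",
"\n",
"print(beta_mean(0.5, 0.5), beta_var(0.5, 0.5))  # 0.5 and 0.125\n",
"```\n"
]
},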
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"* **hyperparameter** - a parameter of a prior or a posterior distribution\n",
"* Assume (guess) a prior beta distribution with $\\alpha_0 = 0.5$ and $\\beta_0 = 0.5$\n",
"<div>\n",
"<img src=\"BT25.jpg\" width=\"400\"/>\n",
" </div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"3. **Collect the data**\n",
" * assume that a FP makes one attempt and fails to get in\n",
" * Kruschke plot\n",
" <div>\n",
"<img src=\"BT26.jpg\" width=\"400\"/>\n",
" </div> \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"4. **Determine the likelihood of the observed data, assuming each hypothesis is true**\n",
" * for each hypothesized value of p, let us compute the binomial likelihood of observing 0 successes out of 1 trial \n",
"* Now, because p is a continuous variable between 0 and 1, we have an infinite number of hypotheses\n",
"* ==> we need to use the BT in order to estimate a **single** parameter, called $\\theta$\n",
"$$\\rm P(\\theta|data) = \\frac{P(data|\\theta) * P(\\theta)}\n",
"{\\int P(data|\\theta) * P(\\theta) d\\theta }$$\n",
"* Technically, the likelihood $\\rm P(data|\\theta)$ can be a pmf or a pdf"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"4. **Determine the likelihood of the observed data, assuming each hypothesis is true**\n",
" * In the FP problem:\n",
" $$\\rm P(p|data) = \\frac{P(data|p) * P(p)}\n",
"{\\int P(data|p) * P(p) dp}$$\n",
" * **Here is the kicker: the integration of the denominator is often tedious, and sometimes impossible**"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"4. **Determine the likelihood of the observed data, assuming each hypothesis is true**\n",
" * In the FP problem there exists an **analytical shortcut**:\n",
" * the posterior: beta distribution with the following hyperparameters:\n",
" \n",
" $\\alpha_{\\rm posterior} = \\alpha_0 + y~~~~~~~~~~ \\Longrightarrow~~~~ \\alpha_{\\rm posterior} = 0.5 + 0 = 0.5$\n",
" \n",
" $\\beta_{\\rm posterior} = \\beta_0 +n - y~~~~~~~~~~ \\Longrightarrow~~~~ \\beta_{\\rm posterior} = 0.5 +1 - 0 = 1.5$"
]
},
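{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate - the shortcut in code\n",
"\n",
"* the analytical shortcut above as a one-line Python sketch (the function name is ours):\n",
"\n",
"```python\n",
"def beta_binomial_update(a0, b0, y, n):\n",
"    # prior Beta(a0, b0) + y successes in n trials -> posterior Beta(a0+y, b0+n-y)\n",
"    return a0 + y, b0 + n - y\n",
"\n",
"print(beta_binomial_update(0.5, 0.5, y=0, n=1))  # (0.5, 1.5), as above\n",
"```\n"
]
},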
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"5. **Use BT to compute the posterior densities for each value of p i.e. the posterior distribution**\n",
" * the prior and posterior distributions:\n",
" <div>\n",
"<img src=\"BT27.jpg\" width=\"400\"/>\n",
" </div> \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"**What is the posterior in case of a \"flat prior\"?**\n",
"* Flat prior: $\\alpha_0 = 1$ and $\\beta_0=1$\n",
"* The resulting posterior: $\\alpha_{\\rm posterior} = \\alpha_0 + y = 1 + 0 = 1$ \n",
"  and $\\beta_{\\rm posterior} = \\beta_0 + n - y = 1 + 1 - 0 = 2$\n",
" <div>\n",
"<img src=\"BT28.jpg\" width=\"400\"/>\n",
" </div> \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"**Is the flat prior really non-informative?**\n",
" * Here is a strange twist: the U-shaped prior that was actually used is less informative than the \"flat prior\"\n",
" * Thus a non-informative prior for a beta distribution is not a flat one, but will be the distribution in which $\\alpha$ and\n",
" $\\beta$ are tiny\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"**What if a FP makes a second attempt?**\n",
" * Let us start from the first set of prior and posterior parameters ($\\alpha_{\\rm posterior} = 0.5$, $\\beta_{\\rm posterior} = 1.5$)\n",
" * Let us suppose that a FP fails again\n",
" * Then the next posterior distribution will be the one with $\\alpha_{\\rm posterior2} = \\alpha_{\\rm posterior} + y = 0.5 + 0 = 0.5$ and $\\beta_{\\rm posterior2} = \\beta_{\\rm posterior} + n - y = 1.5 + 1 - 0 = 2.5$\n",
" <div>\n",
"<img src=\"BT29.jpg\" width=\"400\"/>\n",
" </div>\n",
" * We could get the same by \"jumping\" with two failures in one go of the analysis\n",
" \n",
" "
]
},
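{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate - sequential vs. batch updating\n",
"\n",
"* a small Python sketch checking that updating one attempt at a time agrees with \"jumping\" with both failures in one go:\n",
"\n",
"```python\n",
"def beta_binomial_update(a, b, y, n):\n",
"    return a + y, b + n - y\n",
"\n",
"step1 = beta_binomial_update(0.5, 0.5, y=0, n=1)   # first failed attempt\n",
"step2 = beta_binomial_update(*step1, y=0, n=1)     # second failed attempt\n",
"batch = beta_binomial_update(0.5, 0.5, y=0, n=2)   # both failures in one go\n",
"print(step2, batch)  # (0.5, 2.5) both ways\n",
"```\n"
]
},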
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"**The conjugate shortcut**\n",
" * The above shortcut is called **the beta-binomial conjugate**\n",
" * It was introduced by Howard Raiffa and Robert Schlaifer in 1961\n",
" \n",
"table | histogram\n",
":---:| :---: \n",
"![alt](BT30.jpg) | ![alt](BT31.jpg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"**How should we describe our confidence in the hypothesized values for p?**\n",
"* Let us use a *credible interval (CI)*, an interval in the domain of the posterior or predictive distribution\n",
"* For a 95% CI, the value of interest lies within the interval with 95% probability, i.e., given the data and the model, there is a 95% chance that the true value lies in that interval\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"* There are three basic methods for choosing a CI:\n",
" 1. **choosing the narrowest interval**, which for a unimodal distribution will involve choosing the values of highest pdf (sometimes called the highest posterior density interval)\n",
" 2. **choosing the interval where the probability of being below the interval is as likely as being above it** (sometimes called the equal-tailed interval)\n",
" 3. **choosing the interval for which the mean is the central point** (provided that the mean exists)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate\n",
"\n",
"* Example: let us look for the 90% CI for the problem of a FP (posterior with $\\alpha_{\\rm posterior2} = 0.5$ and $\\beta_{\\rm posterior2} = 2.5$):\n",
"* we need to find the area under the curve where 5% of the distribution is in the upper tail, and 5% is in the lower tail\n",
"* The corresponding values of p are: $p_{\\rm low} = 0.00087$ and $p_{\\rm high} = 0.57$\n",
"<div>\n",
"<img src=\"BT32.jpg\" width=\"400\"/>\n",
" </div>"
]
},
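{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Beta-Binomial Conjugate - the 90% CI numerically\n",
"\n",
"* a small Python sketch (not from the book) that recovers the equal-tailed interval above by numerically inverting the Beta(0.5, 2.5) cdf (standard library only; midpoint rule):\n",
"\n",
"```python\n",
"from math import gamma\n",
"\n",
"a, b = 0.5, 2.5\n",
"norm = gamma(a) * gamma(b) / gamma(a + b)   # B(0.5, 2.5)\n",
"n = 1000000\n",
"dx = 1.0 / n\n",
"cdf = 0.0\n",
"p_low = p_high = None\n",
"for i in range(n):\n",
"    x = (i + 0.5) * dx                      # midpoint rule avoids the x=0 singularity\n",
"    cdf += x**(a - 1) * (1 - x)**(b - 1) / norm * dx\n",
"    if p_low is None and cdf >= 0.05:\n",
"        p_low = x\n",
"    if p_high is None and cdf >= 0.95:\n",
"        p_high = x\n",
"        break\n",
"print(p_low, p_high)  # roughly 0.0009 and 0.57, close to the values on the slide\n",
"```\n"
]
},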
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### General Remarks about conjugate priors and posteriors\n",
"\n",
"* Conjugate means \"joined together\" especially in pairs\n",
"* There are cases when we can use a particular pdf as a prior distribution, collect data of a specific flavour, and then derive analytically the posterior pdf\n",
"* In these special cases, the pdf of the prior and posterior are the same probability density function, but *their parameters may differ*\n",
"* Such a prior distribution is called a **conjugate prior**\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 9.238743259089906)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nO3de3RV9Z338fc3VwEBbyB3TghBBCsSEQtipbWtlVaxrWW1HWtrO3VqR1tb9Wm7ZlztM+PM6LR9HOvT6tix1qdqHZVOrZTBerdUQQkXCUpiSAygIkFQrgJJvs8f5xwnxlx2krPPPmefz2uts3IuO3t/t8RPfvnt3++3zd0REZH4KYq6ABERCYcCXkQkphTwIiIxpYAXEYkpBbyISEyVRF1AR8cdd5wnEomoyxARyRs1NTU73H1EV5/lVMAnEglWrVoVdRkiInnDzJq7+0xdNCIiMaWAFxGJKQW8iEhMKeBFRGJKAS8iElMKeBGRmFLAi4jElAJeRCSmFPAiIjGlgBcRiSkFvIhITCngRURiSgEvIhJTCngRkZhSwIuIxJQCXkQkphTwIiIxpYAXEYkpBbyISEwp4EVEYkoBLyISUwp4kRy3f/9+rrzySk455RSuvfZaWltboy5J8oQCXiSHuTsXX3wxP/vZzygvL+e6667jyiuvjLosyRMKeJEctmzZMhYvXsw///M/s3LlSq688kp+/vOfs2LFiqhLkzxg7h7ezs2+A/w14MB64BJ3f6e77WfNmuWrVq0KrR6RfOLuVFdXs3fvXjZs2EBZWRl79+6loqKC008/nSVLlkRdouQAM6tx91ldfRZaC97MxgLfAma5+0lAMfD5sI4nEjfPP/88a9eu5eqrr6asrAyAI488kssuu4ylS5eydevWiCuUXBd2F00JMMjMSoDBwGshH08kNm6//XYGDRrE5z//3nbRl7/8Zdydu+66K6LKJF+EFvDu/irwE2Az8Drwtrv/KazjicRJW1sbixcv5oILLmD48OHv+ayyspJ58+Zx7733RlSd5Iswu2iOBhYCFcAYYIiZXdTFdpea2SozW9XS0hJWOSJ55dlnn+XNN9/kggsu6PLz888/n3Xr1qmbRnoUZhfNR4Emd29x98PA74C5nTdy99vcfZa7zxoxYkSI5Yjkj4ceeojS0lLOOeecLj9fsGABAEuXLs1mWZJnwgz4zcAHzWywmRlwNvBSiMcTiY1HH32UM844433dM2nTpk1jwoQJPPzww1muTPJJmH3wK4EHgNUkh0gWAbeFdTyRuHj77bdZu3Yt8+fP73YbM+Oss85i+fLlhDnUWfJbqKNo3P2H7j7V3U9y9y+5+8EwjycSB3/5y19ob2/nQx/6UI/bnXnmmWzfvp2GhoYsVSb5RjNZRXLMU089RWlpKaeffnqP282bNw+A5cuXZ6MsyUMKeJEc8/TTTzN79mwGDx7c43ZTp07l2GOPVcBLtxTwIjnk4MGD1NTUMHfu+wacvY+ZMXfuXAW8dEsBL5JDamtrOXz4MLNmdbm0yPvMnj2b+vp6du/eHXJlko8U8CI5pKamBoBTTz010PbV1dUArFu3LrSaJH8p4EVySE1NDUcddRSTJk0KtP3MmTMBWL16dZhlSZ5SwIvkkJqaGqqrq0nODezd6NGjGTVqlAJeuqSAF8kRhw4dYv369YG7Z9Kqq6tZs2ZNSFVJPlPAi+SI2tpaDh061OeAnzlzJi+++CIHDhwIqTLJVwp4kRyR7mbpa8DPmDGDtrY2Nm7cGEZZkscU8CI5Yv369QwZMiTwBda06dOnA/Diiy+GUZbkMQW8SI7YsGED06ZNo6iob/9bTp48mZKSEjZs2BBSZZKvFPAiOaK2tvbd1nhflJWVUVVVpRa8vI8CXiQHvPnmm7zxxhucdNJJ/fr+6dOnqwUv76OAF8kB6XDuTwsekjcAaWxs1EgaeQ8FvEgOGGjAT58+nfb2du
rq6jJZluQ5BbxIDqitrWXYsGGMGzeuX98/bdo0QCNp5L0U8CI5YMOGDUyfPj3wEgWdTZkyheLiYvXDy3so4EVyQDrg+6usrIzJkydrspO8hwJeJGI7d+5kx44dTJ06dUD7qaqq4uWXX85QVRIHCniRiKVDecqUKQPaT1VVFQ0NDbS3t2eiLIkBBbxIxOrr64HMBPyBAwd47bXXMlGWxIACXiRi9fX1FBcXU1FRMaD9VFVVAaibRt6lgBeJWH19PRUVFZSVlQ1oPwp46ayktw3MbApwDTCx4/bu/pEQ6xIpGPX19QPungEYP3485eXlCnh5V68BD9wP3Ar8EmgLtxyRwuLu1NfXM3/+/AHvq6ioiMrKSgW8vCtIwLe6+y2hVyJSgF577TX279+fkRY8aKikvFeQPviHzOybZjbazI5JP0KvTKQAZGoETVpVVRWbNm3SUEkBgrXgv5z6ek2H9xzo221nROR9wgj4gwcPsmXLFiZOnJiRfUr+6jXg3X1gY7dEpFv19fUMGjSIsWPHZmR/HUfSKOCl1y4aMys1s2+Z2QOpx+VmVpqN4kTirr6+nqqqqj7fpq87GiopHQX5qboFOBX4Repxauo9ERmgTA2RTBszZgzl5eU0NTVlbJ+Sv4L0wZ/m7jM6vH7czNaFVZBIoTh8+DCNjY1ceOGFGdtnUVERiURCAS9AsBZ8m5lVpl+Y2SQ0Hl5kwJqbm2ltbWXy5MkZ3W9FRQWNjY0Z3afkpyAt+GuAJ8ysETCSM1ovCbUqkQKQDuHKyspetuybiooKVq5cmdF9Sn4KMormMTOrAk4gGfAb3f1g6JWJxFyYAb9r1y7efvtthg8fntF9S37ptovGzD6S+voZ4JPAZKAS+GTqPREZgMbGRsrLyxk9enRG95telVL98NJTC/4s4HHgvC4+c+B3oVQkUiAaGxupqKjI2BDJtI4Bf8opp2R035Jfug14d/9h6uk/uPt7mgJmFmjyk5kdBfwHcBLJXwpfdfdn+1mrSKw0NjYyaVLmJ4Sn96kWvARpOizu4r0HAu7/JmCZu08FZgAvBS1MJM7cnU2bNoUS8EcffTTDhw/XSBrpvgVvZlOB6cDwTn3uw4AjetuxmQ0DPgR8BcDdDwGHBlKsSFzs3LmT3bt3hxLwkOymUQteeuqDPwH4FHAU7+2H3wN8PcC+JwEtwB1mNgOoAb7t7vv6WatIbKRb12EG/MaNG0PZt+SPnvrgHwQeNLM5/ew3LwGqgSvcfaWZ3QR8H7i240ZmdilwKcCECRP6cRiR/JONgF+2bBnujpmFcgzJfUEmOl1qZu9rsbv7V3v5vq3AVndPz7h4gGTAd97PbcBtALNmzfIA9YjkvXTAD/RG292pqKjgwIEDvPHGG4waNSqUY0juCxLwSzo8PwL4NPBab9/k7tvMbIuZneDudcDZwIv9K1MkXhobGzn++OM58sgjQ9l/x6GSCvjCFWQm63tG0ZjZb4FHA+7/CuBuMysDGtESByJAeEMk09IB39jYyJw5c0I7juS2IC34zqqAQJ3l7r4WmNWPY4jEWmNjI2eccUZo+08kEoDGwhe6IDf82GNmu9NfgYeA74Vfmkg8HT58mM2bN4fagh88eDCjRo1SwBe4IF00Q7NRiEihaG5upr29PdSAB42Fl4BdNKmJTvNILjfwZ3f/fahVicRY2EMk0yoqKnjmmWdCPYbktiBdNL8AvgGsB2qBb5jZz8MuTCSushnwW7ZsobW1NdTjSO4K0oI/CzjJ3R3AzO4kGfYi0g+NjY2UlZUxZsyYUI+TSCRoa2vj1VdfZeLEiaEeS3JTkMXG6njvqJnxwAvhlCMSf2EtE9yZRtJIT4uNPUSyz3048JKZPZd6fTqgjj2RfmpsbMz4XZy6kg74V155JfRjSW7qqYvmJ1mrQqRApJcJDnMMfNqECRMwMwV8AetpsbGnslmISCEIe5ngjsrKyhg7dqwCvoD11EWz3N3nmdkekl0z734EuLsPC706kZ
jJ1giatEQioYAvYN1e5XH3eamvQ919WIfHUIW7SP9EEfC6yFq4eryMb2ZFZlabrWJE4i7sZYI7q6ioYOvWrRw+fDgrx5Pc0mPAu3s7sM7MdCcOkQxoampixIgRoS0T3FkikaC9vZ2tW7dm5XiSW4JMdBoNbEgNk3z3dnvufn5oVYnEVNjLBHfWcahktv5qkNwRJOD/d+hViBSIpqYmZs+enbXjdZzs9OEPfzhrx5XcEGQq3QJ3f6rjA1gQdmEicdPa2kpzc3NWW9Ljx4+nqKhII2kKVJCA/1gX752b6UJE4m7r1q20tbVltYumtLSUcePGKeALVE/j4C8DvglUmlnHtWeGAn8JuzCRuEkPV8x2X7jGwheunvrg7wH+G/gX4Psd3t/j7jtDrUokhrI9Bj4tkUjw+OOPZ/WYkht6muj0tru/Avw9sM3dm4EK4CIzOypL9YnERlNTE8XFxYwfPz6rx00kErz66qscOnQoq8eV6AXpg18MtJnZZOB2kiF/T6hVicRQY2MjEyZMoKSkP/e677+KigrcnS1btmT1uBK9IAHf7u6twGeAf3P375AcGy8ifdDU1BTJWHQtG1y4ggT8YTP7AnAxsCT1Xml4JYnEU/pGH9mmG38UriABfwkwB/gnd28yswrgrnDLEomXffv2sX379qxfYAUYN24cxcXFasEXoF47A939ReBbHV43AdeHWZRI3KTDNYoWfElJicbCF6iexsHf5+6LzGw9710PHgB3PznUykRiJKohkmkVFRUK+ALUUwv+26mvn8pGISJxFtUkp7REIsEjjzwSybElOj3dsu/11Nfm7JUjEk+NjY0MHjyYESNGRHL8RCLBa6+9xsGDBykvL4+kBsm+bi+ymtkeM9vd3SObRYrku6amJiZNmoSZRXL8RCKBu7N58+ZIji/R6KkFPxTAzP4B2Ab8huT9WP+K5Ho0IhJQVGPg09LHfuWVV6iqqoqsDsmuIMMkz3H3X7j7Hnff7e63AJ8NuzCRuHD3rN/oozNNdipMQQK+zcz+ysyKU/do/SugLezCROJix44d7Nu3L9IW/JgxYygpKdFkpwITJOC/CCwC3kg9Ppd6T0QCyPaNtrtSUlLC+PHj1YIvMEEmOr0CLAy/FJF4Sreao+yiAa0LX4iCtOBFZADSLfh0P3hUNNmp8CjgRULW1NTEyJEjOfLIIyOtI5FI8Prrr3PgwIFI65Ds6THgUxdVF2WrGJE4inqIZFr6LwiNhS8cPQa8u7cDl2epFpFYinqIZJqGShaeIF00j5jZ1WY23syOST+CHiA1vHKNmS3pfWuReGltbWXz5s050YLvONlJCkOQe4d9NfX1bzu850DQJsm3gZeAYX2oSyQWtmzZQltbW04E/OjRoyktLVXAF5AgwyT7/ZNpZuOATwL/BHy3v/sRyVe5MkQSoLi4mAkTJmiyUwHpNeDNrBS4DPhQ6q0ngX9398MB9v9vwP9Ca9dIgcqFSU4daSx8YQnSB38LcCrwi9Tj1NR7PTKzTwHb3b2ml+0uNbNVZraqpaUlQDki+WPTpk2UlpYyfvz4qEsBNBa+0ATpgz/N3Wd0eP24ma0L8H1nAOeb2QLgCGCYmd3l7hd13MjdbwNuA5g1a9b77hwlks8aGhpIJBKUlAT5Xy18iUSCN954gwMHDjBo0KCoy5GQBV1srDL9wswmEWCxMXf/gbuPc/cE8Hng8c7hLhJ3DQ0NTJ48Oeoy3qWhkoUlSMBfAzxhZk+a2VPA48BV4ZYlkv/cnU2bNingJTJBRtE8ZmZVwAkkb/ix0d0P9uUg7v4kyYuzIgWjpaWFPXv2KOAlMt0GvJl9xN0fN7PPdPqo0sxw99+FXJtIXmtoaADIqYAfPXo0ZWVlCvgC0VML/iyS3THndfGZAwp4kR6kA76ysrKXLbOnqKiIiRMnaix8gejpnqw/NLMi4L/d/b4s1iQSC5s2baKoqCjyZYI701j4wqHFxkRC0tDQwIQJEygvL4+6lPdQwB
eO0BcbEylUuTZEMq2iooKWlhb27dsXdSkSsiAB/1WSC409DdSkHqvCLEokDhoaGnKq/z0t3WXU3NwcbSESulAXGxMpVLt27WLnzp052YJPB3xTUxPTpk2LthgJVa8teDMbbGZ/b2a3pV5XpdaZEZFubNq0CcitIZJpGgtfOIJ00dwBHALmpl5vBa4LrSKRGMjFMfBpxx9/POXl5Qr4AhAk4Cvd/V+BwwDufoDkjFYR6Ua6BZ8L68B3lh66qYCPvyABf8jMBpGc3ERq4bE+LVUgUmgaGhoYM2YMgwcPjrqULiUSCU12KgBBAv5HwDJgvJndDTwGfC/MokTyXa4OkUxTC74w9Brw7v4n4DPAV4DfArPc/YmQ6xLJa/X19Tkf8G+++SZ79uyJuhQJUZBRNI+5+5vu/kd3X+LuO8zssWwUJ5KP3nrrLbZv387UqVOjLqVb6VsIaix8vHUb8GZ2RGrG6nFmdnSHWawJYEy2ChTJN3V1dQCccMIJEVfSvY5j4SW+epro9DfAlSTDvIb/GTmzG/h5yHWJ5K18Cnj1w8dbT6tJ3gTcZGZXuPvNWaxJJK9t3LiRkpKSnBwimTZy5EgGDRqkgI+5IKNotpnZUIDUjNbfmVl1yHWJ5K26ujoqKyspLS2NupRumZlG0hSAIAF/rbvvMbN5wDnAncAt4ZYlkr/q6upyunsmLZFI0NjYGHUZEqIgAd+W+vpJ4BZ3fxAoC68kkfzV1tZGQ0NDXgR8VVUVL7/8Mu4edSkSkiAB/6qZ/TuwCFhqZuUBv0+k4DQ3N3Pw4MG8CPgpU6awb98+tm3bFnUpEpIgQb0IeBj4hLu/BRwDXBNqVSJ5auPGjUBuj6BJmzJlCpCclCXxFGQm635gE3COmV0OjEzNbhWRTtJDJHN5klOaAj7+gsxk/TZwNzAy9bjLzK4IuzCRfFRXV8cxxxzDcccdF3UpvRo/fjzl5eUK+Bjr9Y5OwNeA0919H4CZ3QA8C2hsvEgn+TKCBpLLBk+ePFkBH2NB+uCN/xlJQ+q51oMX6UI+BTwku2kU8PEVpAV/B7DSzP4r9foC4PbwShLJT7t27eL111/nxBNPjLqUwKZMmcKSJUtobW2lpCRIHEg+CXKR9f8AlwA7gV3AJe7+b2EXJpJvNmzYAMBJJ50UcSXBTZkyhcOHD7N58+aoS5EQdPsr28yOAL4BTAbWA79w99ZsFSaSb/I14CE5kiaX186R/umpBX8nMItkuJ8L/CQrFYnkqdraWoYOHcr48eOjLiUwDZWMt5463aa5+wcAzOx24LnslCSSn2pra5k+fTpm+TMGYcSIEQwfPlwBH1M9teAPp5+oa0akdxs2bMir7hlIriqpkTTx1VMLfoaZ7U49N2BQ6rUB7u7DQq9OJE9s376dlpaWvAt4SHbTLF++POoyJATdtuDdvdjdh6UeQ929pMNzhbtIB7W1tUB+XWBNO+GEE2hubmbfvn1RlyIZplUhRTIgPYJm+vTpEVfSd+maX3rppYgrkUxTwItkQG1tLcceeyzHH3981KX0WTrg07+kJD4U8CIZsH79+rwbQZNWWVlJWVmZAj6GFPAiA9TW1sYLL7zAKaecEnUp/VJSUsLUqVMV8DGkgBcZoIaGBvbt28fMmTOjLqXfpk+froCPodAC3szGm9kTZvaSmW1IrSsvEjurV68GyPuAb25uZu/evVGXIhkUZgu+FbjK3U8EPgj8rZlNC/F4IpFYs2YNZWVlTJuWvz/e6QutL774YsSVSCaFFvDu/rq7r0493wO8BIwN63giUVm9ejUf+MAHKC0tjbqUftNImnjKSh+8mSWAmcDKLj671MxWmdmqlpaWbJQjkjHuzpo1a/K6ewZg0qRJHHHEEQr4mAk94M3sSGAxcKW77+78ubvf5u6z3H3WiBEjwi5HJKO2bNnCzp078z7gi4uLmTp16rszciUeQg14MyslGe53u/vvwjyWSBTicIE17eSTT2bdunVRlyEZFOYoGiN5a7
+XUneFEomdNWvWUFRUxMknnxx1KQNWXV3Ntm3beP3116MuRTIkzBb8GcCXgI+Y2drUY0GIxxPJulWrVjF16lSGDBkSdSkDlv4rZM2aNRFXIpkS5iia5e5u7n6yu5+SeiwN63gi2eburFixgg9+8INRl5IR6Zm4Cvj40ExWkX5qaGhg586dzJkzJ+pSMmLYsGFMnjz53esKkv8U8CL9tGLFCoDYtOAh2U2jFnx8KOBF+mnFihUMHTqUE088MepSMqa6upqmpiZ27doVdSmSAQp4kX5asWIFs2fPpri4OOpSMiZ9oXXt2rURVyKZoIAX6Yf9+/fzwgsvxKp7BjSSJm4U8CL98Nxzz9Ha2hqbC6xpI0eOZNy4cTz//PNRlyIZoIAX6Ycnn3ySoqIi5s2bF3UpGTdnzhyeffbZqMuQDFDAi/TDk08+SXV1NcOHD4+6lIybM2cOzc3NmtEaAwp4kT565513WLFiBWeddVbUpYQi3e2kVnz+U8CL9NGKFSs4ePAg8+fPj7qUUMycOZPy8nKeeeaZqEuRAVLAi/TRU089hZnFsv8doLy8nFNPPVUt+BhQwIv00SOPPEJ1dTVHHXVU1KWEZu7cuaxatYqDBw9GXYoMgAJepA927drFs88+y7nnnht1KaGaN28ehw4dYuXK992ETfKIAl6kDx555BHa29tjH/BnnXUWRUVFPPbYY1GXIgOggBfpg6VLl3LMMcdw+umnR11KqI466ihOPfVUBXyeU8CLBNTe3s6yZcv4+Mc/Hqv1Z7pz9tlns3LlSvbu3Rt1KdJPCniRgGpqanjjjTdi3z2TdvbZZ9Pa2srTTz8ddSnSTwp4kYDuv/9+SktLOe+886IuJSvOOOMMysvLefTRR6MuRfpJAS8SgLtz33338dGPfpSjjz466nKyYtCgQcyfP5+HHnoId4+6HOkHBbxIAKtWraK5uZlFixZFXUpWLVy4kIaGBl566aWoS5F+UMCLBHDfffdRWlrKwoULoy4lq84//3wAHnzwwYgrkf5QwIv04vDhw9x1112cc845BdM9kzZ27Fhmz57N73//+6hLkX5QwIv04o9//CPbtm3j61//etSlRGLhwoU899xzbN26NepSpI8U8CK9+OUvf8mYMWNYsGBB1KVEIn3d4e677464EukrBbxIDzZv3syyZcu45JJLKCkpibqcSEyePJl58+Zxxx13aDRNnlHAi/TgxhtvxMwKtnsm7Stf+Qp1dXVafCzPKOBFurFjxw5uu+02vvjFLzJx4sSoy4nUokWLGDx4ML/61a+iLkX6QAEv0o2bb76Z/fv3873vfS/qUiI3dOhQvvCFL/Cb3/yGlpaWqMuRgBTwIl3Ytm0bN954I5/+9KeZPn161OXkhO9+97u888473HLLLVGXIgEp4EW6cO211/LOO+9www03RF1Kzpg2bRoLFizg5ptvZvfu3VGXIwEo4EU6ef7557n99tu54oorqKqqirqcnPKjH/2IHTt28NOf/jTqUiQAy6VhT7NmzfJVq1ZFXYYUsP3791NdXc3evXupra2N9X1X+2vRokUsXbqUjRs3Mm7cuKjLKXhmVuPus7r6TC14kQ6uuuoq6urquPPOOxXu3bj++utpb2/nsssu07j4HKeAF0m5+eabufXWW7nmmms4++yzoy4nZ02aNInrrruOJUuWcMcdd0RdjvRAXTQiwD333MOXvvQlzjvvPBYvXlwQt+QbiLa2Ns455xyWL1/On//8Z0477bSoSypY6qIR6Ya7c9NNN3HRRRdx5plncvfddyvcAyguLubee+9l1KhRnHvuudTW1kZdknRBAS8Fq6Wlhc997nNceeWVLFy4kGXLljFkyJCoy8obxx13HI899hjl5eWceeaZLFu2LOqSpBMFvBScXbt2cf3111NVVcWDDz7Ij3/8YxYvXswRRxwRdWl5p7KykuXLlzNhwgQWLFjA5Zdfzq5du6IuS1JCDXgz+4SZ1ZlZg5l9P8xjifTk7b
ff5oEHHuDiiy9m3Lhx/OAHP2Du3Lm88MILXH311RQVqa3TXxUVFTzzzDN885vf5JZbbiGRSHDVVVdRU1OjUTYRC+0iq5kVA/XAx4CtwPPAF9z9xe6+RxdZe9f536urf78g72Vzm7CPf+DAAfbu3cuePXvYs2cPb731Flu3bmXLli00Njaydu1aXn75ZQCOPvpoPvvZz3L55ZczY8aM9+1fBmbdunXccMMN3H///bS2tjJ69GhOO+00TjnlFBKJBGPHjmXUqFEceeSRDBkyhMGDBzNo0CCKi4sxs6jLz0s9XWQNM+DnAD9y93NSr38A4O7/0t339DfgR44cyf79+999HZeAk4EpLS1lwoQJzJgxg+rqas4880zmzp1bsOu6Z1NLSwtLly7lT3/6E6tXr6auri7Qz3dRURFmRlFR0bsPM4t9+B9//PFs2rSpX98bVcBfCHzC3f869fpLwOnufnmn7S4FLk29PAGo6+chjwN29PN785XOOf4K7XxB59xXE919RFcfhNmU6epX7vt+m7j7bcBtAz6Y2arufovFlc45/grtfEHnnElhXlnaCozv8Hoc8FqIxxMRkQ7CDPjngSozqzCzMuDzwB9CPJ6IiHQQWheNu7ea2eXAw0Ax8Ct33xDW8chAN08e0jnHX6GdL+icMyan1qIREZHM0ewOEZGYUsCLiMRU3gV8b8sfmFm5mf1n6vOVZpbIfpWZE+B8v2tmL5rZC2b2mJlNjKLOTAq6xIWZXWhmbmZ5P6QuyDmb2aLUv/UGM7sn2zVmWoCf7Qlm9oSZrUn9fC+Ios5MMbNfmdl2M+ty6U1L+lnqv8cLZlY94IO6e948SF6s3QRMAsqAdcC0Ttt8E7g19fzzwH9GXXfI5/thYHDq+WX5fL5Bzzm13VDgaWAFMCvqurPw71wFrAGOTr0eGXXdWTjn24DLUs+nAa9EXfcAz/lDQDVQ283nC4D/JjmH6IPAyoEeM99a8LOBBndvdPdDwL3Awk7bLATuTD1/ADjb8neec6/n6+5PuHt6nYYVJOcb5LMg/8YA/wj8K/BONosLSZBz/jrwc3ffBeDu27NcY6YFOWcHhqWeDyfP59G4+9PAzh42WQj8P09aARxlZqMHcsx8C/ixwFsRlH0AAASoSURBVJYOr7em3utyG3dvBd4Gjs1KdZkX5Hw7+hrJFkA+6/WczWwmMN7dl2SzsBAF+XeeAkwxs7+Y2Qoz+0TWqgtHkHP+EXCRmW0FlgJXZKe0yPT1//de5duqS0GWPwi0REKeCHwuZnYRMAs4K9SKwtfjOZtZEXAj8JVsFZQFQf6dS0h208wn+Vfan83sJHd/K+TawhLknL8A/Nrdf5pavPA3qXNuD7+8SGQ8u/KtBR9k+YN3tzGzEpJ/2vX0Z1EuC7Tcg5l9FPg74Hx3P5il2sLS2zkPBU4CnjSzV0j2Vf4hzy+0Bv25ftDdD7t7E8lF+aqyVF8Ygpzz14D7ANz9WeAIkotyxVXGl3fJt4APsvzBH4Avp55fCDzuqSsYeajX8011V/w7yXDP935Z6OWc3f1tdz/O3RPuniB53eF8d8/nGwkE+bn+PckL6pjZcSS7bBqzWmVmBTnnzcDZAGZ2IsmAb8lqldn1B+Di1GiaDwJvu/vrA9lhXnXReDfLH5jZPwCr3P0PwO0k/5RrINly/3x0FQ9MwPP9MXAkcH/qWvJmdz8/sqIHKOA5x0rAc34Y+LiZvQi0Ade4+5vRVT0wAc/5KuCXZvYdkl0VX8njxhpm9luSXWzHpa4r/BAoBXD3W0leZ1gANAD7gUsGfMw8/u8lIiI9yLcuGhERCUgBLyISUwp4EZGYUsCLiMSUAl5EJKYU8CJZYmYJM/ti1HVI4VDAi3RgZsUh7j4B9CngQ65HYk4BL3nFzP4utYb4o2b2WzO7uottfm1mt5rZn82s3sw+lXo/kXpvdeoxN/X+/NS64/cA61Pv/d7MalJrr1/aYd97zeyG1GePmtlsM3vSzBrN7P
zUNsVm9mMzez61rvffpL79euBMM1trZt/pbruu6hHpj7yaySqFzcxOJTkzeSbJn93VQE03mydILrxWCTxhZpOB7cDH3P0dM6sCfktygTZILl97UmqdF4CvuvtOMxsEPG9mi1MzR4cAT7r798zsv4DrgI+RXK/8TpLTzb9Gcpr5aWZWDvzFzP4EfB+42t3Tv3Au7Wa7ruoR6TMFvOSTM4H/Sq9/b2Y9LVtwX2rVwZfNrBGYCjQB/9fMTiE53X9Kh+2f6xSm3zKzT6eejye5sNebwCFgWer99cBBdz9sZutJ/lIB+DhwspldmHo9PPX9hzrV2NN2nesR6TMFvOSboGtrdN7Oge8AbwAzSHZPdrxZyL70EzObD3wUmOPu+83sSZILXQEc7rAeSjtwEMDd21Orl0Jy2dcr3P3hjgWk9vuet3rYbh8iA6Q+eMknTwOfNrNBZjYUOK+HbT9nZkVmVknytnB1JFvIr6da9l8iuchVV4YDu1LhPpXkksR98TBwmZmVApjZFDMbAuwhudxxb9uJZIRa8JI33H21mf0nsBZoBv7cw+Z1wFPA8cA3Uv3uvwAWm9nngCfovpW8DPiGmb2Q2s+KPpb6HyS7a1ZbconPFuAC4AWg1czWAb8GbupmO5GM0GqSkrfM7EfAXnf/Saf3fw0scfcHoqhLJFeoi0ZEJKbUghcRiSm14EVEYkoBLyISUwp4EZGYUsCLiMSUAl5EJKb+P+1WhPMvtX3mAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import numpy as np\n",
"import scipy.stats as st\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"def posterior(n, h, q):\n",
" return (n + 1) * st.binom(n, q).pmf(h)\n",
"\n",
"n = 100\n",
"h = 61\n",
"q = np.linspace(0., 1., 1000)\n",
"d = posterior(n, h, q)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.plot(q, d, '-k')\n",
"ax.set_xlabel('q parameter')\n",
"ax.set_ylabel('Posterior distribution')\n",
"ax.set_ylim(0, d.max() + 1)"
]
},
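 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": [
   "# Sketch of a closed-form cross-check, assuming the flat prior used above:\n",
   "# the posterior for q is then Beta(h + 1, n - h + 1), so its mean and a\n",
   "# central 95% credible interval follow directly from scipy's beta distribution.\n",
   "post = st.beta(h + 1, n - h + 1)\n",
   "print('posterior mean:', post.mean())\n",
   "print('95% credible interval:', post.interval(0.95))"
  ]
 },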
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}