@AllenDowney
Created April 11, 2019 14:27
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Implementing CDFs\n",
"\n",
"Copyright 2019 Allen Downey\n",
"\n",
"BSD 3-clause license: https://opensource.org/licenses/BSD-3-Clause"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import seaborn as sns\n",
"sns.set_style('white')\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import inspect\n",
"\n",
"def psource(obj):\n",
"    \"\"\"Prints the source code for a given object.\n",
"\n",
"    obj: function or method object\n",
"    \"\"\"\n",
"    print(inspect.getsource(obj))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Constructor\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/11).\n",
"\n",
"The `Cdf` class inherits from `pd.Series`. The `__init__` method is essentially unchanged, but it includes a workaround for what I think is bad behavior: `Series()` and `Series([])` yield different results."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def __init__(self, *args, **kwargs):\n",
"        \"\"\"Initialize a Cdf.\n",
"\n",
"        Note: this cleans up a weird Series behavior, which is\n",
"        that Series() and Series([]) yield different results.\n",
"        See: https://github.com/pandas-dev/pandas/issues/16737\n",
"        \"\"\"\n",
"        if args:\n",
"            super().__init__(*args, **kwargs)\n",
"        else:\n",
"            underride(kwargs, dtype=np.float64)\n",
"            super().__init__([], **kwargs)\n",
"\n"
]
}
],
"source": [
"from empyrical_dist import Cdf\n",
"\n",
"psource(Cdf.__init__)"
]
},
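{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the workaround, here is a minimal sketch. The `underride` helper is not shown in this notebook, so the version below is an assumption about what it does (fill in keyword defaults without overriding values the caller supplied); it is not necessarily the library's implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def underride(d, **options):\n",
"    \"\"\"Hypothetical helper: add key-value pairs to d only if the key is absent.\"\"\"\n",
"    for key, val in options.items():\n",
"        d.setdefault(key, val)\n",
"    return d\n",
"\n",
"\n",
"# The caller did not choose a dtype, so the default is filled in.\n",
"kwargs = underride({}, dtype=np.float64)\n",
"empty = pd.Series([], **kwargs)\n",
"print(empty.dtype)   # float64\n",
"\n",
"# The caller's explicit choice is left alone.\n",
"kwargs = underride(dict(dtype=object), dtype=np.float64)\n",
"print(kwargs['dtype'])"
]
},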
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can create an empty `Cdf` and then add elements.\n",
"\n",
"Here's a `Cdf` that represents a four-sided die."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"d4 = Cdf()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"d4[1] = 1\n",
"d4[2] = 2\n",
"d4[3] = 3\n",
"d4[4] = 4"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 1\n",
"2 2\n",
"3 3\n",
"4 4\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a normalized `Cdf`, the last probability is 1.\n",
"\n",
"`normalize` makes that true. The return value is the total probability before normalizing."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def normalize(self):\n",
"        \"\"\"Make the probabilities add up to 1 (modifies self).\n",
"\n",
"        :return: normalizing constant\n",
"        \"\"\"\n",
"        total = self.ps[-1]\n",
"        self /= total\n",
"        return total\n",
"\n"
]
}
],
"source": [
"psource(Cdf.normalize)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.normalize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the Cdf is normalized."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.25\n",
"2 0.50\n",
"3 0.75\n",
"4 1.00\n",
"dtype: float64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4"
]
},
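{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of `normalize` can be reproduced with a plain `pd.Series`. This sketch is not the library code; it only assumes, as the source of `normalize` shows, that the normalizing constant is the last cumulative value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Unnormalized cumulative values for a four-sided die\n",
"cumulative = pd.Series([1, 2, 3, 4], index=[1, 2, 3, 4])\n",
"\n",
"total = cumulative.iloc[-1]        # last cumulative value, here 4\n",
"normalized = cumulative / total    # 0.25, 0.50, 0.75, 1.00\n",
"\n",
"print(total)\n",
"print(normalized.tolist())"
]
},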
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Properties\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/2).\n",
"\n",
"In a `Cdf` the index contains the quantities (`qs`) and the values contain the probabilities (`ps`).\n",
"\n",
"These attributes are available as properties that return arrays (same semantics as the Pandas `values` property)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.qs"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.25, 0.5 , 0.75, 1. ])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.ps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sharing\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/12).\n",
"\n",
"Because `Cdf` is a `Series`, you can initialize it with any type that `Series.__init__` can handle.\n",
"\n",
"Here's an example with a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>0.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>b</th>\n",
" <td>0.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>c</th>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"a 0.333333\n",
"b 0.666667\n",
"c 1.000000\n",
"dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = dict(a=1, b=2, c=3)\n",
"cdf = Cdf(d)\n",
"cdf.normalize()\n",
"cdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's an example with two lists."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.25\n",
"2 0.50\n",
"3 0.75\n",
"4 1.00\n",
"dtype: float64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qs = [1,2,3,4]\n",
"ps = [0.25, 0.5, 0.75, 1.0]\n",
"d4 = Cdf(ps, index=qs)\n",
"d4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can copy a `Cdf` like this."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.25\n",
"2 0.50\n",
"3 0.75\n",
"4 1.00\n",
"dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4_copy = Cdf(d4)\n",
"d4_copy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, you have to be careful about sharing. In this example, the copies share the arrays:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.index is d4_copy.index"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.ps is d4_copy.ps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can avoid sharing with `copy=True`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.25\n",
"2 0.50\n",
"3 0.75\n",
"4 1.00\n",
"dtype: float64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4_copy = Cdf(d4, copy=True)\n",
"d4_copy"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.index is d4_copy.index"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.ps is d4_copy.ps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or by calling `copy` explicitly."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.25\n",
"2 0.50\n",
"3 0.75\n",
"4 1.00\n",
"dtype: float64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4_copy = d4.copy()\n",
"d4_copy"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.index is d4_copy.index"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.ps is d4_copy.ps"
]
},
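{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `is` test compares object identity. A complementary check, sketched here with plain `pd.Series` objects rather than `Cdf`, is `np.shares_memory`, which asks whether two arrays overlap in memory. Whether a constructor call without `copy=True` shares data can depend on the pandas version, so this sketch only checks the two copying cases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"original = pd.Series([0.25, 0.5, 0.75, 1.0], index=[1, 2, 3, 4])\n",
"\n",
"explicit = pd.Series(original, copy=True)   # copy requested in the constructor\n",
"method = original.copy()                    # deep copy by default\n",
"\n",
"# Neither copy shares its data with the original.\n",
"print(np.shares_memory(original.to_numpy(), explicit.to_numpy()))   # False\n",
"print(np.shares_memory(original.to_numpy(), method.to_numpy()))     # False"
]
},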
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Displaying CDFs\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/13).\n",
"\n",
"`Cdf` provides `_repr_html_`, so it looks good when displayed in a notebook."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def _repr_html_(self):\n",
"        \"\"\"Returns an HTML representation of the series.\n",
"\n",
"        Mostly used for Jupyter notebooks.\n",
"        \"\"\"\n",
"        df = pd.DataFrame(dict(probs=self))\n",
"        return df._repr_html_()\n",
"\n"
]
}
],
"source": [
"psource(Cdf._repr_html_)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cdf` provides `plot`, which plots the Cdf as a line."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def plot(self, **options):\n",
"        \"\"\"Plot the Cdf as a line.\n",
"\n",
"        :param options: passed to plt.plot\n",
"        :return:\n",
"        \"\"\"\n",
"        underride(options, label=self.name)\n",
"        plt.plot(self.qs, self.ps, **options)\n",
"\n"
]
}
],
"source": [
"psource(Cdf.plot)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"def decorate_dice(title):\n",
"    \"\"\"Labels the axes.\n",
"    \n",
"    title: string\n",
"    \"\"\"\n",
"    plt.xlabel('Outcome')\n",
"    plt.ylabel('CDF')\n",
"    plt.title(title)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"d4.plot()\n",
"decorate_dice('One die')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cdf` also provides `step`, which plots the Cdf as a step function."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def step(self, **options):\n",
"        \"\"\"Plot the Cdf as a step function.\n",
"\n",
"        :param options: passed to plt.step\n",
"        :return:\n",
"        \"\"\"\n",
"        underride(options, label=self.name, where='post')\n",
"        plt.step(self.qs, self.ps, **options)\n",
"\n"
]
}
],
"source": [
"psource(Cdf.step)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"d4.step()\n",
"decorate_dice('One die')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make Cdf from sequence\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/14).\n",
"\n",
"\n",
"The following function makes a `Cdf` object from a sequence of values."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    @staticmethod\n",
"    def from_seq(seq, normalize=True, sort=True, **options):\n",
"        \"\"\"Make a CDF from a sequence of values.\n",
"\n",
"        seq: any kind of sequence\n",
"        normalize: whether to normalize the Cdf, default True\n",
"        sort: whether to sort the Cdf by values, default True\n",
"        options: passed to the pd.Series constructor\n",
"\n",
"        :return: CDF object\n",
"        \"\"\"\n",
"        pmf = Pmf.from_seq(seq, normalize=False, sort=sort, **options)\n",
"        return pmf.make_cdf(normalize=normalize)\n",
"\n"
]
}
],
"source": [
"psource(Cdf.from_seq)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>e</th>\n",
" <td>0.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>l</th>\n",
" <td>0.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>n</th>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"a 0.2\n",
"e 0.4\n",
"l 0.8\n",
"n 1.0\n",
"dtype: float64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cdf = Cdf.from_seq(list('allen'))\n",
"cdf"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.2\n",
"2 0.6\n",
"3 0.8\n",
"5 1.0\n",
"dtype: float64"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cdf = Cdf.from_seq(np.array([1, 2, 2, 3, 5]))\n",
"cdf"
]
},
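{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Pmf.from_seq` and `make_cdf` are not shown in this notebook. As a sketch of the same idea using only pandas (an assumption about the approach, not the library's code), you can count the values, sort by quantity, and take a cumulative sum. The function name `cdf_from_seq_sketch` is made up for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def cdf_from_seq_sketch(seq, normalize=True):\n",
"    \"\"\"Build cumulative probabilities from a sequence of values.\"\"\"\n",
"    counts = pd.Series(list(seq)).value_counts().sort_index()\n",
"    cumulative = counts.cumsum()\n",
"    if normalize:\n",
"        cumulative = cumulative / cumulative.iloc[-1]\n",
"    return cumulative\n",
"\n",
"\n",
"# For this data the result is 0.2, 0.6, 0.8, 1.0 at quantities 1, 2, 3, 5\n",
"cdf_from_seq_sketch([1, 2, 2, 3, 5])"
]
},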
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selection\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/15).\n",
"\n",
"`Cdf` inherits `[]` from `Series`, so you can look up a quantity and get its cumulative probability."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.25"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4[1]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4[4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cdf` objects are mutable, but after you modify one, the result is generally not a valid `Cdf`."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1.25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.25\n",
"2 0.50\n",
"3 0.75\n",
"4 1.00\n",
"5 1.25\n",
"dtype: float64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4[5] = 1.25\n",
"d4"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>probs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"1 0.2\n",
"2 0.4\n",
"3 0.6\n",
"4 0.8\n",
"5 1.0\n",
"dtype: float64"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d4.normalize()\n",
"d4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluating CDFs\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/16).\n",
"\n",
"Evaluating a `Cdf` forward maps from a quantity to its cumulative probability."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"d6 = Cdf.from_seq([1,2,3,4,5,6])"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(0.5)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.forward(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
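"`forward` interpolates as a step function, returning the cumulative probability of the largest quantity at or below the input, so it works for quantities that are not in the distribution."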
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(0.5)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.forward(3.5)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(0.)"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.forward(0)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(1.)"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.forward(7)"
]
},
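{
"cell_type": "markdown",
"metadata": {},
"source": [
"The results so far (0.5 at both 3 and 3.5, 0 below the smallest quantity, 1 above the largest) are what a right-continuous step function would give. The sketch below reproduces them with `np.searchsorted`; it is an assumption about the behavior, not the library's implementation, and `forward_sketch` is a hypothetical name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"qs = np.arange(1, 7)       # quantities for a six-sided die\n",
"ps = np.arange(1, 7) / 6   # cumulative probabilities\n",
"\n",
"\n",
"def forward_sketch(q):\n",
"    \"\"\"Cumulative probability of the largest quantity that is <= q.\"\"\"\n",
"    # Number of quantities <= q; zero means q is below all of them.\n",
"    n = np.searchsorted(qs, q, side='right')\n",
"    # Prepend 0 so that n == 0 maps to a probability of 0.\n",
"    padded = np.concatenate([[0.0], ps])\n",
"    return padded[n]\n",
"\n",
"\n",
"print(forward_sketch(3))     # 0.5\n",
"print(forward_sketch(3.5))   # 0.5\n",
"print(forward_sketch(0))     # 0.0\n",
"print(forward_sketch(7))     # 1.0"
]
},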
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`__call__` is a synonym for `forward`, so you can call the `Cdf` like a function (which it is)."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(0.16666667)"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6(1.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`forward` can take an array of quantities, too."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"def decorate_cdf(title):\n",
"    \"\"\"Labels the axes.\n",
"\n",
"    title: string\n",
"    \"\"\"\n",
"    plt.xlabel('Quantity')\n",
"    plt.ylabel('CDF')\n",
"    plt.title(title)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"qs = np.linspace(0, 7)\n",
"ps = d6(qs)\n",
"plt.plot(qs, ps)\n",
"decorate_cdf('Forward evaluation')"
]
},
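{
"cell_type": "markdown",
"metadata": {},
"source": [
"The outputs above suggest that `forward` behaves like a right-continuous step function. Here is a minimal sketch of that behavior in plain NumPy. It is not the implementation in `Cdf`; `forward_sketch`, `qs_demo`, and `ps_demo` are hypothetical names, and the sketch assumes that the quantities are sorted and that each input maps to the cumulative probability of the largest quantity less than or equal to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in for d6: sorted quantities and their cumulative probabilities.\n",
"qs_demo = np.arange(1, 7)\n",
"ps_demo = np.arange(1, 7) / 6\n",
"\n",
"def forward_sketch(qs, ps, x):\n",
"    \"\"\"Evaluate a step-function CDF at x (scalar or array-like).\"\"\"\n",
"    # Count how many quantities are less than or equal to x.\n",
"    idx = np.searchsorted(qs, x, side='right')\n",
"    # Prepend 0 so that a count of zero maps to probability 0.\n",
"    return np.concatenate([[0.0], ps])[idx]\n",
"\n",
"forward_sketch(qs_demo, ps_demo, [0, 1.5, 3, 3.5, 7])"
]
},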
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cdf` also provides `inverse`, which computes the inverse `Cdf`:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(3.)"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.inverse(0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`quantile` is a synonym for `inverse`."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(3.)"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.quantile(0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`inverse` and `quantile` also work with arrays."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"ps = np.linspace(0, 1)\n",
"qs = d6.quantile(ps)\n",
"plt.plot(qs, ps)\n",
"decorate_cdf('Inverse evaluation')"
]
},
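{
"cell_type": "markdown",
"metadata": {},
"source": [
"`d6.inverse(0.5)` returns 3, which is consistent with mapping each probability to the smallest quantity whose cumulative probability is at least that large. The following is a minimal sketch under that assumption, not the implementation in `Cdf`. The names `inverse_sketch`, `qs_demo`, and `ps_demo` are hypothetical, and the sketch assumes sorted inputs and probabilities between 0 and 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in for d6: sorted quantities and their cumulative probabilities.\n",
"qs_demo = np.arange(1, 7)\n",
"ps_demo = np.arange(1, 7) / 6\n",
"\n",
"def inverse_sketch(qs, ps, p):\n",
"    \"\"\"Return the smallest quantity whose cumulative probability is >= p.\"\"\"\n",
"    # Index of the first cumulative probability that is >= p.\n",
"    idx = np.searchsorted(ps, p, side='left')\n",
"    # Clip so that a probability just above the last entry does not index past the end.\n",
"    idx = np.minimum(idx, len(qs) - 1)\n",
"    return qs[idx]\n",
"\n",
"inverse_sketch(qs_demo, ps_demo, [0.0, 0.5, 0.9, 1.0])"
]
},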
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These functions provide a simple way to make a Q-Q plot.\n",
"\n",
"Here are two samples from the same distribution."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"cdf1 = Cdf.from_seq(np.random.normal(size=100))\n",
"cdf2 = Cdf.from_seq(np.random.normal(size=100))\n",
"\n",
"cdf1.plot()\n",
"cdf2.plot()\n",
"decorate_cdf('Two random samples')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's how we compute the Q-Q plot."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"def qq_plot(cdf1, cdf2):\n",
"    \"\"\"Compute results for a Q-Q plot.\n",
"\n",
"    Evaluates the inverse Cdfs for a range of cumulative probabilities.\n",
"\n",
"    :param cdf1: Cdf\n",
"    :param cdf2: Cdf\n",
"\n",
"    :return: tuple of arrays\n",
"    \"\"\"\n",
"    ps = np.linspace(0, 1)\n",
"    q1 = cdf1.quantile(ps)\n",
"    q2 = cdf2.quantile(ps)\n",
"    return q1, q2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is near the identity line, which suggests that the samples are from the same distribution."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"q1, q2 = qq_plot(cdf1, cdf2)\n",
"plt.plot(q1, q2)\n",
"plt.xlabel('Quantity 1')\n",
"plt.ylabel('Quantity 2')\n",
"plt.title('Q-Q plot');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's how we compute a P-P plot."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"def pp_plot(cdf1, cdf2):\n",
"    \"\"\"Compute results for a P-P plot.\n",
"\n",
"    Evaluates the Cdfs for all quantities in either Cdf.\n",
"\n",
"    :param cdf1: Cdf\n",
"    :param cdf2: Cdf\n",
"\n",
"    :return: tuple of arrays\n",
"    \"\"\"\n",
"    # Union of the quantities (index values) of both Cdfs.\n",
"    qs = cdf1.index.union(cdf2.index)\n",
"    p1 = cdf1(qs)\n",
"    p2 = cdf2(qs)\n",
"    return p1, p2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's what it looks like."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"p1, p2 = pp_plot(cdf1, cdf2)\n",
"plt.plot(p1, p2)\n",
"plt.xlabel('Cdf 1')\n",
"plt.ylabel('Cdf 2')\n",
"plt.title('P-P plot');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Statistics\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/17).\n",
"\n",
"`Cdf` overrides the statistics methods to compute `mean`, `median`, etc."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def mean(self):\n",
"        \"\"\"Expected value.\n",
"\n",
"        :return: float\n",
"        \"\"\"\n",
"        return self.make_pmf().mean()\n",
"\n"
]
}
],
"source": [
"psource(Cdf.mean)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3.5"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.mean()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def var(self):\n",
"        \"\"\"Variance.\n",
"\n",
"        :return: float\n",
"        \"\"\"\n",
"        return self.make_pmf().var()\n",
"\n"
]
}
],
"source": [
"psource(Cdf.var)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.916666666666667"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.var()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def std(self):\n",
"        \"\"\"Standard deviation.\n",
"\n",
"        :return: float\n",
"        \"\"\"\n",
"        return self.make_pmf().std()\n",
"\n"
]
}
],
"source": [
"psource(Cdf.std)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.7078251276599332"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.std()"
]
},
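{
"cell_type": "markdown",
"metadata": {},
"source": [
"The methods above delegate to `make_pmf`. As a rough illustration of why that works, the sketch below recovers the probabilities by differencing the cumulative probabilities and then computes the same statistics directly. It is a sketch in plain NumPy, not the library code; `cdf_stats_sketch`, `qs_demo`, and `ps_demo` are hypothetical names, and it assumes the quantities are sorted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in for d6: sorted quantities and their cumulative probabilities.\n",
"qs_demo = np.arange(1, 7)\n",
"ps_demo = np.arange(1, 7) / 6\n",
"\n",
"def cdf_stats_sketch(qs, ps):\n",
"    \"\"\"Return (mean, variance, std) of a discrete distribution given its CDF.\"\"\"\n",
"    # Differencing the cumulative probabilities recovers the probability of each quantity.\n",
"    pmf = np.diff(ps, prepend=0.0)\n",
"    mean = np.sum(qs * pmf)\n",
"    # Probability-weighted average of the squared deviations from the mean.\n",
"    var = np.sum((qs - mean) ** 2 * pmf)\n",
"    return mean, var, np.sqrt(var)\n",
"\n",
"cdf_stats_sketch(qs_demo, ps_demo)"
]
},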
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sampling\n",
"\n",
"For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/18).\n",
"\n",
"`choice` chooses random values from the `Cdf`, following the API of `np.random.choice`."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def choice(self, *args, **kwargs):\n",
"        \"\"\"Makes a random sample.\n",
"\n",
"        Uses the probabilities as weights unless `p` is provided.\n",
"\n",
"        args: same as np.random.choice\n",
"        options: same as np.random.choice\n",
"\n",
"        :return: NumPy array\n",
"        \"\"\"\n",
"        # TODO: Make this more efficient by implementing the inverse CDF method.\n",
"        pmf = self.make_pmf()\n",
"        return pmf.choice(*args, **kwargs)\n",
"\n"
]
}
],
"source": [
"psource(Cdf.choice)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 1, 3, 2, 4, 1, 4, 1, 1, 6])"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.choice(size=10)"
]
},
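{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `TODO` in `choice` mentions the inverse CDF method. Here is a minimal sketch of that idea: draw uniform random probabilities and map each one back to a quantity through the cumulative probabilities. This is an illustration, not the planned implementation; `choice_sketch`, `qs_demo`, and `ps_demo` are hypothetical names, and the sketch assumes sorted quantities and NumPy 1.17 or later for `np.random.default_rng`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in for d6: sorted quantities and their cumulative probabilities.\n",
"qs_demo = np.arange(1, 7)\n",
"ps_demo = np.arange(1, 7) / 6\n",
"\n",
"def choice_sketch(qs, ps, size, rng=None):\n",
"    \"\"\"Draw `size` random quantities using the inverse CDF method.\"\"\"\n",
"    rng = np.random.default_rng() if rng is None else rng\n",
"    # Uniform random probabilities in [0, 1).\n",
"    u = rng.random(size)\n",
"    # Map each probability to the first quantity whose cumulative probability exceeds it.\n",
"    idx = np.searchsorted(ps, u, side='right')\n",
"    # Clip in case round-off leaves the last cumulative probability slightly below 1.\n",
"    idx = np.minimum(idx, len(qs) - 1)\n",
"    return qs[idx]\n",
"\n",
"choice_sketch(qs_demo, ps_demo, size=10, rng=np.random.default_rng(0))"
]
},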
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sample` chooses random values from the `Cdf`, following the API of `pd.Series.sample`."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"    def sample(self, *args, **kwargs):\n",
"        \"\"\"Makes a random sample.\n",
"\n",
"        Uses the probabilities as weights unless `weights` is provided.\n",
"\n",
"        This function returns an array containing a sample of the quantities in this Pmf,\n",
"        which is different from Series.sample, which returns a Series with a sample of\n",
"        the rows in the original Series.\n",
"\n",
"        args: same as Series.sample\n",
"        options: same as Series.sample\n",
"\n",
"        :return: NumPy array\n",
"        \"\"\"\n",
"        # TODO: Make this more efficient by implementing the inverse CDF method.\n",
"        pmf = self.make_pmf()\n",
"        return pmf.sample(*args, **kwargs)\n",
"\n"
]
}
],
"source": [
"psource(Cdf.sample)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 1, 2, 5, 4, 6, 2, 2, 4, 2])"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d6.sample(n=10, replace=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}