@sgodfrey66
Created March 11, 2019 05:43
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Python to visualize distributions\n",
"\n",
"Python and its libraries are a convenient way to visualize, compare and understand probability distributions. Combining statistical functions with plotting tools lets us see the shape of a distribution and answer thought experiments by simulation. This blog post looks at some of the common ways this is done.\n",
"\n",
"Let's start by importing the numpy, pandas and matplotlib libraries."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Import libraries\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Discrete distributions\n",
"\n",
"The first example looks at rolling a die. We can simulate the rolls with numpy's random integer generator, randint. To make the results reproducible, we first set a seed for the random number generator.\n",
"\n",
"If we run this for 10 samples, we get the following output: array([4, 2, 6, 4, 2, 6, 1, 5, 1, 6]), where each number is the result of one roll of a six-sided die. Now we can run it again with a sample size of 10000 and plot the results in a histogram."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 2, 6, 4, 2, 6, 1, 5, 1, 6])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pick a seed for the random number generator\n",
"np.random.seed(101)\n",
"\n",
"# Generate random integers between 1 and 6 for a sample size of 10\n",
"np.random.randint(1, 7, 10)\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Generate random integers between 1 and 6 for a sample size of 10000\n",
"x = np.random.randint(1, 7, 10000)\n",
"\n",
"# Center one bin on each face value (bin edges at 0.5, 1.5, ..., 6.5)\n",
"plt.hist(x, bins=np.arange(0.5, 7.5))\n",
"plt.title(\"Histogram of simulated dice rolls\")\n",
"plt.xlim([0,7]);"
]
},
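{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough numerical check on the histogram, the sketch below tabulates the relative frequency of each face; for a fair die each should be close to 1/6 (about 0.167). It draws its own sample, and the names `rolls`, `counts` and `freqs` are illustrative rather than part of the code above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Draw a fresh sample of 10000 simulated rolls (faces 1 to 6)\n",
"rolls = np.random.randint(1, 7, 10000)\n",
"\n",
"# bincount counts occurrences of 0..6; drop index 0, which a die never shows\n",
"counts = np.bincount(rolls, minlength=7)[1:]\n",
"\n",
"# Convert counts to relative frequencies\n",
"freqs = counts / counts.sum()\n",
"\n",
"for face, freq in enumerate(freqs, start=1):\n",
"    print(face, round(freq, 3))"
]
},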
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also use simulated dice rolls to gain insight into probability questions, particularly complex ones. For example, we might be asked to find the probability of rolling exactly two sixes in 10 rolls. This is straightforward to simulate with Python and numpy: repeat the 10-roll experiment many times and count the fraction of experiments that contain exactly two sixes."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.302\n"
]
}
],
"source": [
"# Simulate\n",
"n_sims = 1000\n",
"count = 0\n",
"for i in range(n_sims):\n",
"    # Simulate ten rolls\n",
"    rolls = np.random.randint(1, 7, 10)\n",
"\n",
"    # Count the experiment if exactly two of the 10 rolls are sixes\n",
"    if len(np.where(rolls == 6)[0]) == 2:\n",
"        count += 1\n",
"\n",
"# Fraction of simulations with exactly two sixes\n",
"print(count / n_sims)"
]
},
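{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simulated estimate can be compared with the exact answer. Assuming the rolls are independent and the die is fair, the number of sixes in 10 rolls follows a binomial distribution, so the probability of exactly two sixes is C(10, 2) * (1/6)^2 * (5/6)^8, or about 0.29. The sketch below computes this; the variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from math import factorial\n",
"\n",
"n_rolls = 10   # rolls per experiment\n",
"n_sixes = 2    # number of sixes we are asking about\n",
"p_six = 1 / 6  # probability of a six on one roll of a fair die\n",
"\n",
"# Number of ways to choose which rolls are the sixes: C(10, 2)\n",
"n_ways = factorial(n_rolls) // (factorial(n_sixes) * factorial(n_rolls - n_sixes))\n",
"\n",
"# Binomial probability of exactly n_sixes sixes in n_rolls rolls\n",
"exact = n_ways * p_six**n_sixes * (1 - p_six)**(n_rolls - n_sixes)\n",
"print(round(exact, 4))"
]
},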
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, we can simulate trials from a geometric distribution. The geometric distribution gives the probability that the first success in a sequence of Bernoulli trials occurs on the nth trial. In this example, we look at how many tries it takes to succeed when the probability of success on each trial is 20%, for instance the probability that it takes exactly 5 tries."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Draw 10000 samples from a geometric distribution with success probability 0.2\n",
"y = np.random.geometric(.2, 10000)\n",
"\n",
"plt.hist(y)\n",
"plt.title(\"Histogram of simulated geometric trials\")\n",
"plt.xlim([0,30]);"
]
},
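{
"cell_type": "markdown",
"metadata": {},
"source": [
"The histogram shows the overall shape, but the question about 5 tries can also be answered directly. A minimal sketch, assuming independent trials: the probability that the first success comes on trial k is (1 - p)^(k - 1) * p, which for p = 0.2 and k = 5 is 0.08192. The code below compares that value with the fraction of simulated samples equal to 5; it draws its own sample, and the names `p`, `k`, `samples` and `estimate` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"p = 0.2  # probability of success on each trial\n",
"k = 5    # trial on which the first success occurs\n",
"\n",
"# Exact geometric probability: k - 1 failures followed by one success\n",
"exact = (1 - p)**(k - 1) * p\n",
"\n",
"# Simulated estimate: fraction of samples that needed exactly k tries\n",
"samples = np.random.geometric(p, 10000)\n",
"estimate = np.mean(samples == k)\n",
"\n",
"print(exact, estimate)"
]
},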
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"- https://docs.scipy.org/doc/numpy/reference/routines.random.html\n",
"- https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html\n",
"- https://stackoverflow.com/questions/6294179/how-to-find-all-occurrences-of-an-element-in-a-list\n",
"- https://en.wikipedia.org/wiki/Geometric_distribution\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}