@lumbric
Created July 15, 2019 12:49
Correlation
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Correlation between drowning by falling and swimming in a pool\n",
"\n",
"Is the correlation between drowning by falling and swimming in a pool surprisingly low?\n",
"\n",
"https://www.tylervigen.com/spurious-correlations"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Original datasets:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"swimming = 421, 465, 494, 538, 430, 530, 511, 600, 582, 605, 603\n",
"falling = 109, 102, 102, 98, 85, 95, 96, 98, 123, 94, 102"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assuming both series are normally distributed, calculate the sample mean and standard deviation of each:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"falling_mean = np.mean(falling)\n",
"falling_std = np.std(falling, ddof=1)\n",
"\n",
"swimming_std = np.std(swimming, ddof=1)\n",
"swimming_mean = np.mean(swimming)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we can safely assume that `swimming` is normally distributed, since it is the sum of two independent, normally distributed variables: `falling` and (the unknown) `only_swimming`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the [formula for the sum of normally distributed random variables](https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables#Independent_random_variables), the variances of independent variables add up, so the standard deviation of `only_swimming` is given by:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"only_swimming_std = np.sqrt(swimming_std**2 - falling_std**2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now can generate new time series:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"N = 100_000  # number of synthetic samples\n",
"\n",
"# The mean of only_swimming is the difference of the two sample means\n",
"only_swimming_new = np.random.normal(swimming_mean - falling_mean,\n",
"                                     only_swimming_std,\n",
"                                     size=N)\n",
"\n",
"falling_new = np.random.normal(falling_mean,\n",
"                               falling_std,\n",
"                               size=N)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two series are drawn independently, so their correlation should be close to 0; otherwise we would have done something wrong:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.00000000e+00, -3.92608199e-04],\n",
" [-3.92608199e-04, 1.00000000e+00]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"np.corrcoef(falling_new, only_swimming_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our artificial data set, the correlation between `falling_new` and the sum `only_swimming_new + falling_new` is not very high:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.14170122],\n",
" [0.14170122, 1. ]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(falling_new, only_swimming_new + falling_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is very similar to the correlation in the real data:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.17511449],\n",
" [0.17511449, 1. ]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(falling, swimming)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
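
The simulated correlation in the notebook can also be cross-checked analytically. This is a minimal sketch, not part of the original notebook: it relies on the notebook's own assumption that `swimming` is the sum of `falling` and an independent `only_swimming`. Under that assumption the covariance of `falling` and `swimming` equals the variance of `falling`, so the expected correlation reduces to `falling_std / swimming_std`. The names `expected_corr`, `simulated_corr`, and `rng`, as well as the seed, are illustrative choices.

```python
import numpy as np

# Data copied from the notebook above.
swimming = np.array([421, 465, 494, 538, 430, 530, 511, 600, 582, 605, 603])
falling = np.array([109, 102, 102, 98, 85, 95, 96, 98, 123, 94, 102])

falling_std = np.std(falling, ddof=1)
swimming_std = np.std(swimming, ddof=1)

# Assumption: swimming = falling + only_swimming, with independent terms.
# Then cov(falling, swimming) = var(falling), which gives
# corr(falling, swimming) = falling_std / swimming_std.
expected_corr = falling_std / swimming_std

# Repeat the notebook's simulation with a seeded generator
# (the seed is arbitrary and only makes the run reproducible).
rng = np.random.default_rng(0)
n = 100_000
only_swimming_std = np.sqrt(swimming_std**2 - falling_std**2)
falling_new = rng.normal(np.mean(falling), falling_std, size=n)
only_swimming_new = rng.normal(np.mean(swimming) - np.mean(falling),
                               only_swimming_std, size=n)
simulated_corr = np.corrcoef(falling_new, falling_new + only_swimming_new)[0, 1]

print(f"expected:  {expected_corr:.3f}")
print(f"simulated: {simulated_corr:.3f}")
```

Both values should come out at roughly 0.14, in line with the simulated correlation shown in the notebook. If the independence assumption does not hold for the real data, the analytic value is only a rough guide.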