
@danielfleischer
Created December 30, 2018 09:36
14 statistical tests in python, using scipy.stats.
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.randint(1,10,100)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(size=100)"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc-hr-collapsed": false
},
"source": [
"# Normality Tests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Shapiro Wilk\n",
"Tests for Gaussian distribution. Observationgs need to be IID."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import shapiro"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.9135679006576538, 6.518070676975185e-06)"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shapiro(a)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.9860052466392517, 0.374272882938385)"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shapiro(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## D'Agostino's $K^2$\n",
"Tests for Gaussian distribution. Observationgs need to be IID."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import normaltest"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NormaltestResult(statistic=41.39502617899559, pvalue=1.0260872144248276e-09)"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normaltest(a)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NormaltestResult(statistic=3.593125002661302, pvalue=0.16586808066854905)"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"normaltest(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Anderson-Darling\n",
"Tests for Gaussian distribution. Observationgs need to be IID."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import anderson"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AndersonResult(statistic=2.3516564884300664, critical_values=array([0.555, 0.632, 0.759, 0.885, 1.053]), significance_level=array([15. , 10. , 5. , 2.5, 1. ]))"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"anderson(a)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AndersonResult(statistic=0.3417980857269356, critical_values=array([0.555, 0.632, 0.759, 0.885, 1.053]), significance_level=array([15. , 10. , 5. , 2.5, 1. ]))"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"anderson(b)"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc-hr-collapsed": false
},
"source": [
"# Correlation Tests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pearson Coefficient\n",
"Tests whether two samples have a linear relationship. Observations need to be IID, in each sample to be normal, in each sample to have same variance. "
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import pearsonr"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.0820377744456807, 0.4171172980326803)"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pearsonr(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Spearman Rank\n",
"Tests whether two samples have a monotonic relationship. Observations need to be IID, able to be ranked. "
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import spearmanr"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SpearmanrResult(correlation=0.09828266099415585, pvalue=0.3306388846208107)"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spearmanr(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Kendall Rank\n",
"Tests whether two samples have a monotonic relationship. Observations need to be IID, able to be ranked. "
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import kendalltau"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KendalltauResult(correlation=0.07047459480753636, pvalue=0.32412404755614377)"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kendalltau(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## $\\chi^2$ Test\n",
"Tests whether two categorical variables are related or independent. Observations need to be independent, at least 25 examples in each cell."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import chi2_contingency"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"table = np.array([[40, 10],[30, 40]])"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(15.062204081632652,\n",
" 0.00010402546661835286,\n",
" 1,\n",
" array([[29.16666667, 20.83333333],\n",
" [40.83333333, 29.16666667]]))"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chi2_contingency(table)"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc-hr-collapsed": false
},
"source": [
"# Hypothesis Tests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## T Test\n",
"Tests whether two samples' means are significantly different. Observations need to be independent, in each sample normal, in each sample same variance."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import ttest_ind"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.normal(loc=10, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=11, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Ttest_indResult(statistic=-9.211238980494853, pvalue=4.740751714730386e-17)"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ttest_ind(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Paired Student T Test\n",
"Tests whether two paired samples' means are significantly different. Observations need to be independent, in each sample normal, in each sample same variance. Observations are paired."
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import ttest_rel"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.normal(loc=9.5, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=10.1, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Ttest_relResult(statistic=-4.265035140526937, pvalue=4.575138155519719e-05)"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ttest_rel(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analysis of Variance ANOVA\n",
"Tests whether two or more samples' means are significantly different. Observations need to be independent, in each sample normal, in each sample same variance."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import f_oneway"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.normal(loc=3, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=11, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"F_onewayResult(statistic=3087.4306882609585, pvalue=9.854457388246251e-123)"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f_oneway(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc-hr-collapsed": false
},
"source": [
"# Non-parametric Hypothesis Testing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mann-Whitney U Test\n",
"Tests whether the distributions of two samples are equal or not. Observations need to be independent, can be ranked."
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import mannwhitneyu"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.randint(1,10,size=100)"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=0, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"MannwhitneyuResult(statistic=161.0, pvalue=1.3099464346996372e-32)"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mannwhitneyu(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wilcoxon Signed Rank Test\n",
"Tests whether the distributions of two paired samples are equal or not. Observations need to be independent, can be ranked, are paired"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import wilcoxon"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.normal(loc=0,size=100)"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=0, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"WilcoxonResult(statistic=2525.0, pvalue=1.0)"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wilcoxon(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Kruskal-Wallis H Test\n",
"Tests whether the distributions of two or more samples are equal or not. Observations need to be independent, can be ranked."
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import kruskal"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.normal(loc=0,size=100)"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=1.3, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KruskalResult(statistic=56.23140895522397, pvalue=6.442411772846654e-14)"
]
},
"execution_count": 149,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kruskal(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {
"toc-hr-collapsed": false
},
"source": [
"## Friedman Test\n",
"Tests whether the distributions of two or more paired samples are equal or not. Observations need to be independent, can be ranked, are paired."
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import friedmanchisquare"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.normal(loc=0,size=100)"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.normal(loc=1.3, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [],
"source": [
"c = np.random.normal(loc=1.3, size=100)"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"FriedmanchisquareResult(statistic=52.460000000000036, pvalue=4.059342910977334e-12)"
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"friedmanchisquare(a,b,c)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"toc-autonumbering": false,
"toc-showcode": false,
"toc-showmarkdowntxt": false
},
"nbformat": 4,
"nbformat_minor": 2
}