Last active
January 6, 2023 15:34
Independent t-test by hand in Python: with equal sample sizes and variance
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2021-01-15T19:33:40.167472Z", | |
"start_time": "2021-01-15T19:33:40.165268Z" | |
} | |
}, | |
"source": [ | |
"# Independent two-sample t-test with equal sample sizes and variance\n", | |
"> Detail: https://www.nlpinpython.com/article/article-detail/4/\n", | |
"> Medium: https://xiaoouwang.medium.com/t-test-by-hand-in-python-d5513d1b55eb\n", | |
"\n", | |
"> Author: Xiaoou Wang. https://www.linkedin.com/in/xiaoou-wang " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2021-01-15T19:36:48.188317Z", | |
"start_time": "2021-01-15T19:36:48.182993Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"The mean of a is around 4 -> 3.9028591091939 \n", | |
"The mean of b is around 0 -> -0.16958838211535887\n", | |
"The variance of a is around 1 -> 1.418239361307898 \n", | |
"The variance of b is around 1 -> 0.9982804180510149\n", | |
"The degree of freedom is -> 18\n" | |
] | |
} | |
], | |
"source": [ | |
"## necessary packages\n", | |
"import numpy as np\n", | |
"from scipy import stats\n", | |
"from numpy.random import seed\n", | |
"\n", | |
"## generate data\n", | |
"# set the seed\n", | |
"seed(1)\n", | |
"N = 10\n", | |
"# Gaussian distributed data with mean = 4 and var = 1\n", | |
"# see https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randn.html\n", | |
"a = np.random.randn(N) + 4\n", | |
"# Gaussian distributed data with mean = 0 and var = 1\n", | |
"b = np.random.randn(N)\n", | |
"print(\"The mean of a is around 4 ->\",np.mean(a),\"\\nThe mean of b is around 0 ->\",np.mean(b))\n", | |
"print(\"The variance of a is around 1 ->\",np.var(a),\"\\nThe variance of b is around 1 ->\",np.var(b))\n", | |
"df = N*2 - 2\n", | |
"print(\"The degree of freedom is ->\",df)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Formula for variance\n", | |
"\n", | |
"$$s_N^2 = \\frac{1}{N} \\sum_{i=1}^N \\left(x_i - \\bar{x}\\right)^2.$$\n", | |
"\n", | |
"## Formula for corrected variance\n", | |
"\n", | |
"An unbiased estimator for the *variance* is given by applying [Bessel's correction](https://www.wikiwand.com/en/Bessel%27s_correction), using *N−1* instead of *N* to yield the *unbiased sample variance* denoted *s<sup>2</sup>*:\n", | |
"$$s^2 = \\frac{1}{N - 1} \\sum_{i=1}^N \\left(x_i - \\bar{x}\\right)^2.$$\n", | |
"\n", | |
"## Formula for t statistic\n", | |
"\n", | |
"$$ t = \\frac{\\bar{X}_1 - \\bar{X}_2}{s_p \\sqrt\\frac{2}{n}} $$\n", | |
"\n", | |
"where\n", | |
"$$ s_p = \\sqrt{\\frac{s_{X_1}^2+s_{X_2}^2}{2}}.$$" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 67, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2021-01-15T19:38:17.419549Z", | |
"start_time": "2021-01-15T19:38:17.413622Z" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"t hand calculated -> 7.859258455468933\n", | |
"p hand calculated -> 3.152614480583793e-07\n", | |
"t using scipy -> 7.8592584554689315\n", | |
"p using scipy -> 3.152614479389834e-07\n" | |
] | |
} | |
], | |
"source": [ | |
"# ddof=1 -> corrected (unbiased) sample variance\n", | |
"var_a = a.var(ddof=1)\n", | |
"var_b = b.var(ddof=1)\n", | |
"sp = (var_a/2+var_b/2)**0.5\n", | |
"\n", | |
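"## Calculate the t-statistic\n", | |
"t = (a.mean() - b.mean())/(sp*np.sqrt(2/N))\n", | |
"\n", | |
"# one-sided tail probability; abs(t) keeps it valid when t is negative (doubled below for the two-sided p-value)\n", | |
"p = 1 - stats.t.cdf(abs(t),df=df)\n", | |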
"\n", | |
"print(\"t hand calculated ->\",t)\n", | |
"print(\"p hand calculated ->\",2*p)\n", | |
"\n", | |
"## Cross-check with scipy's ttest_ind (equal_var=True by default, i.e. the pooled-variance test)\n", | |
"t2, p2 = stats.ttest_ind(a,b)\n", | |
"print(\"t using scipy ->\",t2)\n", | |
"print(\"p using scipy ->\",p2)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.6" | |
}, | |
"varInspector": { | |
"cols": { | |
"lenName": 16, | |
"lenType": 16, | |
"lenVar": 40 | |
}, | |
"kernels_config": { | |
"python": { | |
"delete_cmd_postfix": "", | |
"delete_cmd_prefix": "del ", | |
"library": "var_list.py", | |
"varRefreshCmd": "print(var_dic_list())" | |
}, | |
"r": { | |
"delete_cmd_postfix": ") ", | |
"delete_cmd_prefix": "rm(", | |
"library": "var_list.r", | |
"varRefreshCmd": "cat(var_dic_list()) " | |
} | |
}, | |
"types_to_exclude": [ | |
"module", | |
"function", | |
"builtin_function_or_method", | |
"instance", | |
"_Feature" | |
], | |
"window_display": false | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
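The t statistic formula used in the notebook above can also be sketched with the Python standard library alone, which may help when checking the arithmetic without NumPy or SciPy. This is a minimal sketch, not part of the original notebook: the function name `pooled_t_statistic` and the two small demo samples are assumptions made for illustration. It computes only t and the degrees of freedom; the p-value still needs a t-distribution CDF such as `scipy.stats.t.cdf`.

```python
import math
import statistics


def pooled_t_statistic(x1, x2):
    """Return (t, df) for two independent samples of equal size.

    Hypothetical helper for illustration; it assumes equal sample sizes
    and equal variances, as in the notebook's formula.
    """
    if len(x1) != len(x2):
        raise ValueError("this formula assumes equal sample sizes")
    n = len(x1)
    # statistics.variance applies Bessel's correction (divides by n - 1)
    var1 = statistics.variance(x1)
    var2 = statistics.variance(x2)
    # pooled standard deviation for equal sample sizes
    sp = math.sqrt((var1 + var2) / 2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(2 / n))
    df = 2 * n - 2
    return t, df


# Made-up demo data: means 3 and 4, both corrected variances 2.5,
# so t is -1 up to floating-point rounding and df is 8.
t_demo, df_demo = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_demo, df_demo)
```

Because t is negative here, a two-sided p-value would have to be taken from the tail beyond `abs(t)`, which is the case the notebook's own data (with a positive t) does not exercise.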