tonicanada/20220409_understanding_clt_and_ttest_part4.ipynb

## 20220409_understanding_clt_and_ttest_part4.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"mean\": 20.175,\n",
      "    \"std\": 3.021175268004159,\n",
      "    \"n\": 12\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "import numpy as np\n",
    "from scipy.stats import norm, t\n",
    "\n",
    "# Compute n, the sample mean, and the sample standard deviation (ddof=1)\n",
    "x_example1 = [21.5, 24.5, 18.5, 17.2, 14.5,\n",
    "              23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5]\n",
    "mu_example1 = 20  # hypothesized population mean\n",
    "mean_example1 = np.mean(x_example1)\n",
    "std_example1 = np.std(x_example1, ddof=1)\n",
    "n_example1 = len(x_example1)\n",
    "params_example1 = {\n",
    "    \"mean\": round(float(mean_example1), 3),  # sample mean, not mu\n",
    "    \"std\": std_example1,\n",
    "    \"n\": n_example1\n",
    "}\n",
    "print(json.dumps(params_example1, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\mu = 20\n",
    "$$\n",
    "$$\n",
    "{\\overline{x}} = 20.175\n",
    "$$\n",
    "$$\n",
    "s = 3.021175\n",
    "$$\n",
    "$$\n",
    "n = 12\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With that information we can compute the t-statistic, which standardizes the distribution of the sample means (seen above in the CLT chapter)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "t = \\frac{\\overline{x}-\\mu}{SE}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here SE is the standard error of the mean, which we approximate by substituting the sample standard deviation $s$ for $\\sigma$, because we don't know the population standard deviation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "SE = \\frac{\\sigma}{\\sqrt{n}} \\approx \\frac{s}{\\sqrt{n}}\n",
    "$$"
   ]
  },
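  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick worked check for the sample above, plugging in the sample standard deviation computed with `ddof=1` ($s \\approx 3.021175$) and $n = 12$ gives approximately:\n",
    "$$\n",
    "SE \\approx \\frac{3.021175}{\\sqrt{12}} \\approx 0.8721\n",
    "$$"
   ]
  },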
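  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we state the following hypotheses:<br>\n",
    "H0 (null hypothesis): the population mean equals $\\mu = 20$, so the gap between ${\\overline{x}}$ and $\\mu$ is due to sampling variation alone <br>\n",
    "H1 (alternative hypothesis): the population mean is greater than $\\mu = 20$, as $\\mu < {\\overline{x}}$ suggests (a one-sided, right-tailed test)"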
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.2006562773994862\n"
     ]
    }
   ],
   "source": [
    "t_value_example1 = (mean_example1-mu_example1)/(std_example1/np.sqrt(n_example1))\n",
    "print(t_value_example1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "t = \\frac{20.175-20}{\\frac{3.021175}{\\sqrt{12}}} \\approx 0.200656\n",
    "$$"
   ]
  },
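  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we compute the p-value: the probability, assuming H0 is true, of obtaining a t-statistic at least as large as the one observed. This is the right-tail area that we would otherwise look up in a Student's t table. We use the Student's t distribution with $n - 1 = 11$ degrees of freedom because n < 30 and $\\sigma$ is estimated by s (this is more accurate than using the normal distribution), but as we'll see, the results are very similar."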
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.4223145946526807"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Using Student's t distribution: right-tail probability with n - 1 degrees of freedom\n",
    "t.sf(t_value_example1, df=n_example1-1)"
   ]
  },
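  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a worked reading of the result above, write $T_{11}$ for a Student's t variable with 11 degrees of freedom and assume a significance level of $\\alpha = 0.05$ (a conventional choice that this notebook does not state). The one-sided p-value is then:\n",
    "$$\n",
    "p = P(T_{11} \\geq 0.2007) \\approx 0.4223 > \\alpha = 0.05\n",
    "$$\n",
    "Under that assumption we would not reject H0: a sample mean of 20.175 is compatible with a population mean of 20."
   ]
  },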
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.42048367493849975"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Using the standard normal distribution for comparison\n",
    "norm.sf(t_value_example1)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}