@lmanul
Created June 3, 2021 04:10
Karla Problem Set 2
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem Set 2\n",
"\n",
"### Name (double-click on this cell to edit): REPLACE WITH YOUR NAME\n",
"\n",
"### Instructions\n",
"\n",
"* Wherever you see a comment `# YOUR CODE HERE`, you have a programming task. Use the **code cell** given or add as many new code cells as you would like.\n",
"* Occassionally, you may be asked to use **markdown cells** to answer conceptual questions.\n",
"* Please make sure your final notebook is run, with output saved, and has no errors.\n",
"* Be sure to name your completed notebook file _lastname-firstname-problemset2.ipynb_ for Canvas submission"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"## Feel free to add any additional libraries you would like below this line\n",
"import math\n",
"import seaborn as sns\n",
"import scipy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part I: Functions\n",
"\n",
"### Implementation (10 points)\n",
"\n",
"Suppose you have done a statistical comparison using the Z-scores approach. Write a function for calculating the lower bound of the 95% CI. A starter code for this function is already provided to you below."
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"def compute_lower_bound(n, mu, stdev, z=1.96):\n",
" \"\"\"\n",
" Computes lower bound of a confidence interval\n",
" Parameters:\n",
" n: Type int. number of data points\n",
" mu: Type float. sample mean\n",
" stdev: Type float. sample standard deviation\n",
" z: Type float. critical value. Default: 1.96\n",
" \"\"\"\n",
" return mu - (z * stdev) / math.sqrt(n) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test Case (5 points)\n",
"\n",
"Run your function on the following scenario:\n",
"\n",
"* There are 100 data points.\n",
"* The sample mean is $5.5$\n",
"* The sample standard deviation is $2.3$\n",
"* The critical value at 95% confidence, $Z_{0.95}$, is a constant: $1.96$"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5.0492"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compute_lower_bound(100, 5.5, 2.3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part II: Basic Statistics in Python\n",
"\n",
"## Question 1 (5 points)\n",
"\n",
"Use [np.random.randint](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html) to simulate age data for 100 adults, whose age range from 18 to 36.\n",
"\n",
"Then, calculate basic descriptive statistics (mean, median, standard deviation) for your simulated data. "
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [],
"source": [
"# Simulate age data:\n",
"age_group1 = np.random.randint(18, 36, 100)"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"27.7"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute descriptive statistics:\n",
"np.mean(age_group1)"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"28.0"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.median(age_group1)"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5.101960407529639"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.std(age_group1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 2 (5 points)\n",
"\n",
"Repeat the process in Question 1, except the age range is now 27 to 54."
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
"# Simulate age data:\n",
"age_group2 = np.random.randint(27, 54, 100)"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"40.39"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute descriptive statistics for the new group:\n",
"np.mean(age_group2)"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"40.0"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.median(age_group2)"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8.141124001020989"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.std(age_group2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 3 (10 points)\n",
"\n",
"Display each age group using a **side-by-side boxplot**. You MUST use either [matplotlib](https://matplotlib.org/) or [seaborn](https://seaborn.pydata.org/) library for this task.\n",
"\n",
"Hint: Consider using `pandas` to concatenate `age_group1` and `age_group2` to a single DataFrame, which is easier to visualize."
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAOCElEQVR4nO3db4hld33H8ffHMTWrcTEhk2WZqNs6S60IbmQIQp5YoyWNYvSBRcGw0MD6oBlGKtiYR4o0SPHfMhRhraFLtX8CKgkh/bOshiJIdDauMWG3ZJAkddzujkltkmYbm823D+aMHXZnnLPZuXPml32/4HLvOfeeez8Mlw+//e3v3JOqQpLUnlcMHUCS9NJY4JLUKAtckhplgUtSoyxwSWrUKzfzw6688sratWvXZn6kJDXvyJEjv6iq8bP3b2qB79q1i7m5uc38SElqXpLHV9vvFIokNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY3a1HXgkkZrdnaW+fn5QTMsLCwAMDExMWgOgMnJSaanp4eOMTIWuKQNdfr06aEjXDQscOllZCuMNmdmZgDYv3//wEle/pwDl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1Kjev2YVZLHgGeAM8ALVTWV5ArgH4BdwGPAH1XVf44mpiTpbOczAv/9qtpTVVPd9m3A4araDRzutiVJm+RCplBuAg52jw8CH7jgNJKk3voWeAH/kuRIkn3dvh1VdQKgu79qtQOT7Esyl2RucXHxwhNLkoD+F3S4rqp+nuQq4FCS430/oKoOAAcApqam6iVklCStotcIvKp+3t2fAr4NXAucTLIToLs/NaqQkqRzrVvgSV6T5LXLj4E/AB4G7gH2di/bC9w9qpCSpHP1mULZAXw7yfLr/7aq/inJD4G7ktwCPAF8aHQxJUlnW7fAq+qnwNtW2f8kcP0oQkmS1ueZmJLUKAtckhplgUtSo/quA5f0G8zOzjI/Pz90jC1h+e8wMzMzcJKtYXJykunp6ZG8twUubYD5+XkefeRHvOGyM0NHGdxv/e/SP+yff3xu4CTDe+LZsZG+vwUubZA3XHaG29/+9NAxtIXc8eD2kb6/c+CS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmN6l3gScaS/CjJvd32p5MsJDna3W4cXUxJ0tnO5+dkZ4BjwMrfR/xSVX1+YyNJkvroNQJPcjXwXuCvRhtHktRX3ymULwOfBF48a/+tSR5KcmeSy1c7MMm+JHNJ5hYXFy8gqiRppXULPMn7gFNVdeSsp74CvAnYA5wAvrDa8VV1oKqmqmpqfHz8AuNKkpb1mQO/Dnh/95+UlwLbk3y9qj66/IIkXwXuHVFGSdIq1h2BV9WnqurqqtoFfBj4TlV9NMnOFS/7IPDwiDJKklZxIRc1/oske4ACHgM+thGBJEn9nFeBV9X9wP3d45tHkEeS1JNnYkpSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVG9CzzJWJIfJbm3274iyaEkj3b3l48upiTpbOczAp8Bjq3Yvg04XFW7gcPdtiRpk7yyz4uSXA28F/hz4E+73TcB7+weHwTuB/5sY+NJbVhYWOC/nxnjjge3Dx1FW8jjz4zxmoWFkb1/3xH4l4FPAi+u2Lejqk4AdPdXrXZgkn1J5pLMLS4uXkhWSdIK647Ak7wPOFVVR5K883w/oKoOAAcApqam6nyPl1owMTHB8y+c4Pa3Pz10FG0hdzy4nVdNTIzs/ftMoVwHvD/JjcClwPYkXwdOJtlZVSeS7AROjSylJOkc606hVNWnqurqqtoFfB
j4TlV9FLgH2Nu9bC9w98hSSpLOcSHrwD8HvCfJo8B7um1J0ibptQplWVXdz9JqE6rqSeD6jY8kSerDMzElqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRq1b4EkuTfKDJD9O8kiSz3T7P51kIcnR7nbj6ONKkpb1uajx88C7qurZJJcA30vyj91zX6qqz48uniRpLesWeFUV8Gy3eUl3q1GG2spmZ2eZn58fOgYLCwsATExMDJpjcnKS6enpQTNIF6tec+BJxpIcBU4Bh6rqge6pW5M8lOTOJJevcey+JHNJ5hYXFzcmtTh9+jSnT58eOoakAfWZQqGqzgB7krwO+HaStwJfAT7L0mj8s8AXgD9e5dgDwAGAqamp5kfuW2W0OTMzA8D+/fsHTiJpKOe1CqWqfgncD9xQVSer6kxVvQh8Fbh24+NJktbSZxXKeDfyJsk24N3A8SQ7V7zsg8DDI0koSVpVnymUncDBJGMsFf5dVXVvkr9JsoelKZTHgI+NLKUk6Rx9VqE8BFyzyv6bR5JIktRLr//ElLS+J54d444Htw8dY3Ann1uamd3x6hcHTjK8J54dY/cI398ClzbA5OTk0BG2jF9150m86o3+TXYz2u+GBS5tgK2yvHQrcInr5vHHrCSpUU2NwLfKaexbwfLfYXm0c7HzlH5djJoq8Pn5eY4+fIwzr75i6CiDe8Wvlk5qPfLTkwMnGd7Yc08NHUEaRFMFDnDm1Vdw+s3+cq3+37bj9w0dQRqEc+CS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJalSfq9JfmuQHSX6c5JEkn+n2X5HkUJJHu/vLRx9XkrSszwj8eeBdVfU2YA9wQ5J3ALcBh6tqN3C425YkbZJ1C7yWPNttXtLdCrgJONjtPwh8YBQBJUmr6zUHnmQsyVHgFHCoqh4AdlTVCYDu/qo1jt2XZC7J3OLi4gbFliT1KvCqOlNVe4CrgWuTvLXvB1TVgaqaqqqp8fHxlxhTknS281qFUlW/BO4HbgBOJtkJ0N2f2uhwkqS19VmFMp7kdd3jbcC7gePAPcDe7mV7gbtHlFGStIo+18TcCRxMMsZS4d9VVfcm+T5wV5JbgCeAD40wpyTpLOsWeFU9BFyzyv4ngetHEUqStD7PxJSkRlngktQoC1ySGmWBS1Kj+qxC2TIWFhYYe+6/2Hb8vqGjaAsZe+5JFhZeGDqGtOkcgUtSo5oagU9MTPAfz7+S02++cego2kK2Hb+PiYkdQ8eQNp0jcElqlAUuSY2ywCWpUU3NgUv6zWZnZ5mfnx80w/Lnz8zMDJoDYHJykunp6aFjjIwFLmlDbdu2begIFw0LXHoZeTmPNnUu58AlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSo/pclf71Sb6b5FiSR5LMdPs/nWQhydHu5i9MSdIm6rMO/AXgE1X1YJLXAkeSHOqe+1JVfX508SRJa+lzVfoTwInu8TNJjgETow4mSfrNzmsOPMku4BrggW7XrUkeSnJnksvXOGZfkrkkc4uLixeWVpL0a70LPMllwDeBj1fV08BXgDcBe1gaoX9hteOq6kBVTVXV1Pj4+IUnliQBPQs8ySUslfc3qupbAFV1sqrOVNWLwFeBa0cXU5J0tj6rUAJ8DThWVV9csX/nipd9EHh44+NJktbSZxXKdcDNwE+SHO323Q58JMkeoIDHgI+NIJ8kaQ19VqF8D8gqT9238XEkSX15JqYkNcoCl6RGWeCS1CgLXJIaZYFLUqOau6jx2HNPse24C2Be8T9PA/DipdsHTjK8seeeAnYMHUPadE0V+OTk5NARtoz5+WcAmPwdiwt2+N3QRampAp+enh46wpYxMzMDwP79+wdOImkozoFLUqMscE
lqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmN6nNV+tcn+W6SY0keSTLT7b8iyaEkj3b3l48+riRpWZ8R+AvAJ6rq94B3AH+S5C3AbcDhqtoNHO62JUmbZN0Cr6oTVfVg9/gZ4BgwAdwEHOxedhD4wIgySpJWcV5z4El2AdcADwA7quoELJU8cNWGp5Mkral3gSe5DPgm8PGqevo8jtuXZC7J3OLi4kvJKElaRa8CT3IJS+X9jar6Vrf7ZJKd3fM7gVOrHVtVB6pqqqqmxsfHNyKzJIl+q1ACfA04VlVfXPHUPcDe7vFe4O6NjydJWkufS6pdB9wM/CTJ0W7f7cDngLuS3AI8AXxoJAklSatat8Cr6ntA1nj6+o2NI0nqyzMxJalRFrgkNcoCl6RGWeCS1CgLXJIa1WcZoVaYnZ1lfn5+6Bi/zjAzMzNojsnJSaanpwfNIF2sLPBGbdu2begIkgZmgZ8nR5uStgrnwCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNSlVt3ocli8Djm/aBL39XAr8YOoS0Cr+bG+uNVXXORYU3tcC1sZLMVdXU0Dmks/nd3BxOoUhSoyxwSWqUBd62A0MHkNbgd3MTOAcuSY1yBC5JjbLAJalRFniDktyQ5N+SzCe5beg80rIkdyY5leThobNcDCzwxiQZA/4S+EPgLcBHkrxl2FTSr/01cMPQIS4WFnh7rgXmq+qnVfUr4O+BmwbOJAFQVf8KPDV0jouFBd6eCeDfV2z/rNsn6SJjgbcnq+xzLah0EbLA2/Mz4PUrtq8Gfj5QFkkDssDb80Ngd5LfTvJbwIeBewbOJGkAFnhjquoF4Fbgn4FjwF1V9ciwqaQlSf4O+D7wu0l+luSWoTO9nHkqvSQ1yhG4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmN+j85LUlnMKLdJQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.boxplot(data=[age_group1, age_group2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 4 (10 points)\n",
"\n",
"Visualize the distribution of age in the age groups using histograms. \n",
"\n",
"* **Option 1**. Make two (2) side-by-side plots, each containing a histogram. If you choose this option, please be sure your x-axis and y-axis are _identical_ between the two plots.\n",
"* **Option 2**. Make one (1) plot with two histograms. If you choose this option, be sure to make the two histogram in different colors and set appropriate transparency values such that the two histograms can be seen.\n",
"\n",
"Like **Question 3**, You MUST use either [matplotlib](https://matplotlib.org/) or [seaborn](https://seaborn.pydata.org/) library for this task."
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([[13., 20., 20., 15., 32., 0., 0., 0., 0., 0.],\n",
" [ 0., 0., 6., 15., 14., 11., 13., 7., 14., 20.]]),\n",
" array([18. , 21.5, 25. , 28.5, 32. , 35.5, 39. , 42.5, 46. , 49.5, 53. ]),\n",
" <a list of 2 BarContainer objects>)"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAANgElEQVR4nO3df6jd9X3H8eerxmFRZyNeQ1CzO0TKpNBYLlkhULrallTL1DHHHHMZOOIfCsqELfOfpf+FUm3/GUKc0rDZFkFF0dI1OEsRxPbGpRpJi6VkooYkztUf/2yo7/1xv7e7vZ6Tc+6559x7PsnzAYfz/X7O93u/Lz6GV7753u/5mqpCktSej613AEnSaCxwSWqUBS5JjbLAJalRFrgkNWrDWh7soosuqtnZ2bU8pCQ17+DBg29W1czy8TUt8NnZWebn59fykJLUvCT/2WvcSyiS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktSoNf0mpjSs2d1PrXifo3uvnUASaXp5Bi5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElq1MACT3JOkp8k+VmSl5N8rRu/MMmBJK907xsnH1eStGiYM/D/Ab5QVZ8GtgI7knwW2A08XVVXAE9365KkNTKwwGvBe93q2d2rgOuA/d34fuD6SQSUJPU21DXwJGclOQScAA5U1fPApqo6BtC9XzyxlJKkjxiqwKvqg6raClwKbEvyqWEPkGRXkvkk8ydPnhwxpiRpuRXdhVJVvwZ+BOwAjifZDNC9n+izz76qmququZmZmdWllST9xjB3ocwk+US3/HHgi8DPgSeAnd1mO4HHJ5RRktTDMM8D3wzsT3IWC4X/cFU9meQ54OEktwCvAjdOMKckaZmBBV5VLwJX9Rj/L+DqSYSSJA3mNzElqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNGljgSS5L8kySI0leTnJHN74nyetJDnWvayYfV5K0aMMQ27wP3FVVLyQ5HziY5ED32Ter6huTiydJ6mdggVfVMeBYt/xukiPAJZMOJkk6tRVdA08yC1wFPN8N3Z7kxSQPJtnYZ59dSeaTzJ88eXJ1aSVJvzF0gSc5D3gEuLOq3gHuAy4HtrJwhn5Pr/2qal9VzVXV3MzMzOoTS5KAIQs8ydkslPdDVfUoQFUdr6oPqupD4H5g2+RiSpKWG+YulAAPAEeq6t4l45uXbHYDcHj88SRJ/QxzF8p24GbgpSSHurG7gZuSbAUKOArcOoF8kqQ+hrkL5VkgPT76/vjjSJKG5TcxJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWrUwAJPclmSZ5IcSfJykju68QuTHEjySve+cfJxJUmLhjkDfx+4q6r+APgscFuSK4HdwNNVdQXwdLcuSVojAwu8qo5V1Qvd8rvAEeAS4Dpgf7fZfuD6CWWUJPWwYSUbJ5kFrgKeBzZV1TFYKPkkF/fZZxewC2DLli2rCrteZnc/teJ9ju69tvljS+pjzwUj7PP22GMM/UvMJOcBjwB3VtU7w+5XVfuqaq6q5mZmZkbJKEnqYagCT3I2C+X9UFU92g0fT7K5+3wzcGIyESVJvQxzF0qAB4AjVXXvko+eAHZ2yzuBx8cfT5LUzzDXwLcDNwMvJTnUjd0N7AUeTnIL8Cpw40QSSpJ6GljgVfUskD4fXz3eOJKkYflNTElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLH
BJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMGFniSB5OcSHJ4ydieJK8nOdS9rplsTEnScsOcgX8b2NFj/JtVtbV7fX+8sSRJgwws8Kr6MfDWGmSRJK3AhlXse3uSvwLmgbuq6r97bZRkF7ALYMuWLas4nNba7O6nVrzP0XP+YmU77Hl7xceQtGDUX2LeB1wObAWOAff027Cq9lXVXFXNzczMjHg4SdJyIxV4VR2vqg+q6kPgfmDbeGNJkgYZqcCTbF6yegNwuN+2kqTJGHgNPMl3gc8DFyV5DfhH4PNJtgIFHAVunVxESVIvAwu8qm7qMfzABLJIklZgNXehSNNlzwUr3N47YJrmf2+/Si9JrbLAJalRFrgkNcoCl6RGWeCS1CgLXJIa1cxthCM9WGnvtRNIIk0Zb6c7Y3kGLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGNfMwK2mq+UAprQPPwCWpURa4JDVqYIEneTDJiSSHl4xdmORAkle6942TjSlJWm6YM/BvAzuWje0Gnq6qK4Cnu3VJ0hoaWOBV9WPgrWXD1wH7u+X9wPXjjSVJGmTUa+CbquoYQPd+cb8Nk+xKMp9k/uTJkyMeTpK03MR/iVlV+6pqrqrmZmZmJn04STpjjFrgx5NsBujeT4wvkiRpGKMW+BPAzm55J/D4eOJIkoY1zG2E3wWeAz6Z5LUktwB7gS8leQX4UrcuSVpDA79KX1U39fno6jFnkSStgN/ElKRGWeCS1CgLXJIaZYFLUqMscElqlAUuSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNGvg4WUnqa88FK9z+7cnkOEN5Bi5JjbLAJalRFrgkNcoCl6RGWeCS1CjvQjkdeWeAdEbwDFySGmWBS1KjVnUJJclR4F3gA+D9qpobRyhJ0mDjuAb+R1X15hh+jiRpBbyEIkmNWm2BF/DDJAeT7Oq1QZJdSeaTzJ88eXKVh5MkLVptgW+vqs8AXwFuS/K55RtU1b6qmququZmZmVUeTpK0aFUFXlVvdO8ngMeAbeMIJUkabOQCT3JukvMXl4EvA4fHFUySdGqruQtlE/BYksWf852q+sFYUkmSBhq5wKvqV8Cnx5hFkrQC3kYoSY2ywCWpURa4JDXKApekRlngktQoC1ySGmWBS1KjLHBJapQFLkmNssAlqVEWuCQ1ygKXpEZZ4JLUKAtckhplgUtSoyxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1CgLXJIaZYFLUqMscElq1KoKPMmOJL9I8ssku8cVSpI02MgFnuQs4J+ArwBXAjcluXJcwSRJp7aaM/BtwC+r6ldV9b/A94DrxhNLkjRIqmq0HZM/BXZU1d906zcDf1hVty/bbhewq1v9JPCL0eOO7CLgzXU47qhaywvtZTbv5LWWeZrz/l5VzSwf3LCKH5geYx/526Cq9gH7VnGcVUsyX1Vz65lhJVrLC+1lNu/ktZa5tbywuksorwGXLVm/FHhjdXEkScNaTYH/FLgiye8n+R3gz4EnxhNLkjTIyJdQqur9JLcD/wacBTxYVS+PLdl4reslnBG0lhfay2zeyWstc2t5R/8lpiRpfflNTElqlAUuSY06rQo8yWVJnklyJMnLSe7oxi9MciDJK937xvXOuugUmfckeT3Joe51zXpnBUhyTpKfJPlZl/dr3fhUzvEp8k7l/C5KclaS/0jyZLc+lfO7VI/MUzvHSY4meanLNd+NTf0cL3daXQNPshnYXFUvJDkfOAhcD/w18FZV7e2e2bKxqv5+/ZL+v1Nk/jPgvar6xnrmWy5JgHOr6r0kZwPPAncAf8IUzvEp8u5gCud3UZK/BeaA362qryb5OlM4v0v1yLyHKZ3jJEeBua
p6c8nY1M/xcqfVGXhVHauqF7rld4EjwCUsfMV/f7fZfhYKciqcIvNUqgXvdatnd69iSuf4FHmnVpJLgWuBf14yPJXzu6hP5tZM9Rz3cloV+FJJZoGrgOeBTVV1DBYKE7h4HaP1tSwzwO1JXkzy4DT9c677p/Ih4ARwoKqmeo775IUpnV/gW8DfAR8uGZva+e18i49mhumd4wJ+mORg97gPmP45/ojTssCTnAc8AtxZVe+sd55h9Mh8H3A5sBU4Btyzful+W1V9UFVbWfj27bYkn1rnSKfUJ+9Uzm+SrwInqurgemcZ1ikyT+Ucd7ZX1WdYeJrqbUk+t96BRnHaFXh3nfMR4KGqerQbPt5da1685nxivfL10itzVR3viudD4H4Wnv44Varq18CPWLiePNVzDL+dd4rndzvwx9012u8BX0jyr0z3/PbMPMVzTFW90b2fAB5jIds0z3FPp1WBd7+wegA4UlX3LvnoCWBnt7wTeHyts/XTL/PiH6TODcDhtc7WS5KZJJ/olj8OfBH4OVM6x/3yTuv8VtU/VNWlVTXLwuMp/r2q/pIpnV/on3la5zjJud0NAyQ5F/gyC9mmdo77Wc3TCKfRduBm4KXumifA3cBe4OEktwCvAjeuT7ye+mW+KclWFq7VHQVuXY9wPWwG9mfhf+jxMeDhqnoyyXNM5xz3y/svUzq//Uzzn+F+vj6lc7wJeGzh3IkNwHeq6gdJfkpjc3xa3UYoSWeS0+oSiiSdSSxwSWqUBS5JjbLAJalRFrgkNcoCl6RGWeCS1Kj/A06/B2gWZlqzAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.hist([age_group1, age_group2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 5a (3 points)\n",
"\n",
"We simulated `age_group1` and `age_group2` using two different ranges.\n",
"\n",
"* Clearly, in this toy example, we are using two different distributions, and **we know** that their _mean_ should be _statistically different_.\n",
"* Importantly, we can safely assume that the data points are _independent_. Why? Because you have separately created `age_group1` and `age_group2`.\n",
"\n",
"**Your task:** Perform a [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) to _formally_ determine whether the _arithmetic means_ of `age_group1` and `age_group2` are different. Report the test statistics and P-value.\n",
"\n",
"You are welcome to use any Python libraries for this task, but I recommend either [scipy.stats](https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html) or [statsmodel](www.statsmodels.org) for this task."
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-13.141949195253934\n",
"8.950014556266476e-29\n"
]
}
],
"source": [
"# YOUR CODE HERE\n",
"# Choose any library you want\n",
"# Be sure to print out the test statistics & P-value\n",
"ttest = scipy.stats.ttest_ind(age_group1, age_group2)\n",
"\n",
"print(ttest.statistic)\n",
"print(ttest.pvalue)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 5b (2 point)\n",
"\n",
"In the **markdown cell** below, interpret the P-value you obtained, in plain English."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Enter your response below this line (double-click the cell to edit):**\n",
"\n",
"The P-value that we obtained is extremely small. A common P-value to use\n",
"is 0.05 and ours is many orders of magnitude smaller. That means that we\n",
"can clearly reject the \"null hypothesis\" that would have said that the\n",
"two data sets are only different by chance.\n",
"\n",
"Given that the two age ranges were chosen not to have a lot of overlap, this\n",
"is not surprising."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
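
As a cross-check on Question 5a, the sketch below recomputes Student's t statistic by hand from the pooled-variance formula and compares it with `scipy.stats.ttest_ind`. It is only an illustration under stated assumptions: the fixed seed, the use of `np.random.default_rng` with `integers` in place of `np.random.randint`, the inclusive reading of the age ranges (18 to 36 and 27 to 54), and the helper name `pooled_t_statistic` are not part of the notebook, so the printed numbers will differ from the saved outputs.

```python
import numpy as np
from scipy import stats

# Assumption: a fixed seed (not used in the notebook) so reruns give the same data.
rng = np.random.default_rng(0)

# The upper bound of rng.integers is exclusive, so 37 and 55 include ages 36 and 54.
group1 = rng.integers(18, 37, size=100)
group2 = rng.integers(27, 55, size=100)


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic using a pooled variance estimate.

    Hypothetical helper for illustration; assumes equal population variances.
    """
    n1, n2 = len(a), len(b)
    var1 = np.var(a, ddof=1)  # sample variance of the first group
    var2 = np.var(b, ddof=1)  # sample variance of the second group
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))


t_manual = pooled_t_statistic(group1, group2)

# Two-sided P-value from the t distribution with n1 + n2 - 2 degrees of freedom.
dof = len(group1) + len(group2) - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df=dof)

# scipy's ttest_ind assumes equal variances by default (equal_var=True).
result = stats.ttest_ind(group1, group2)

print("manual:", t_manual, p_manual)
print("scipy: ", result.statistic, result.pvalue)
```

If the equal-variance assumption is in doubt, passing `equal_var=False` to `stats.ttest_ind` runs Welch's t-test instead; the hand computation above would then no longer match.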