{
"cells": [
{
"cell_type": "markdown",
"id": "3cbdbf5d",
"metadata": {
"id": "3efad4c7-d34f-44b0-a48d-94651f931bae"
},
"source": [
"# Activity: Hypothesis testing with Python"
]
},
{
"cell_type": "markdown",
"id": "0d7a0797",
"metadata": {
"id": "2faf7b57-5c13-45e5-b666-d575eff0d17c"
},
"source": [
"## **Introduction**\n"
]
},
{
"cell_type": "markdown",
"id": "3da37a19",
"metadata": {
"id": "2ca9aca5-33e0-4aa7-acdb-05832b05e5a9"
},
"source": [
"As you've been learning, analysis of variance (commonly called ANOVA) is a group of statistical techniques that test the difference of means among three or more groups. It's a powerful tool for determining whether population means are different across groups and for answering a wide range of business questions.\n",
"\n",
"In this activity, you are a data professional working with historical marketing promotion data. You will use the data to run a one-way ANOVA and a post hoc ANOVA test. Then, you will communicate your results to stakeholders. These experiences will help you make more confident recommendations in a professional setting. \n",
"\n",
"In your dataset, each row corresponds to an independent marketing promotion, where your business uses TV, social media, radio, and influencer promotions to increase sales. You have previously provided insights about how different promotion types affect sales; now stakeholders want to know if sales are significantly different among various TV and influencer promotion types.\n",
"\n",
"To address this request, a one-way ANOVA test will enable you to determine if there is a statistically significant difference in sales among groups. This includes:\n",
"* Using plots and descriptive statistics to select a categorical independent variable\n",
"* Creating and fitting a linear regression model with the selected categorical independent variable\n",
"* Checking model assumptions\n",
"* Performing and interpreting a one-way ANOVA test\n",
"* Comparing pairs of groups using an ANOVA post hoc test\n",
"* Interpreting model outputs and communicating the results to nontechnical stakeholders"
]
},
{
"cell_type": "markdown",
"id": "2418371c",
"metadata": {
"id": "bfcf5ec2-e48b-4443-9bf6-72670bd60041"
},
"source": [
"## **Step 1: Imports** \n"
]
},
{
"cell_type": "markdown",
"id": "b59ce966",
"metadata": {
"id": "7dcaa8a0-4fe8-4816-9ef5-5fc665a4638f"
},
"source": [
"Import pandas, pyplot from matplotlib, seaborn, api from statsmodels, ols from statsmodels.formula.api, and pairwise_tukeyhsd from statsmodels.stats.multicomp."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "965f75cd",
"metadata": {
"id": "b2f4b9fa-b7bd-4b88-8c71-d3af5ddcb906"
},
"outputs": [],
"source": [
"# Import libraries and packages.\n",
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import statsmodels.api as sm\n",
"from statsmodels.formula.api import ols\n",
"from statsmodels.stats.multicomp import pairwise_tukeyhsd\n"
]
},
{
"cell_type": "markdown",
"id": "88732511",
"metadata": {},
"source": [
"`pandas` is used to load the dataset `marketing_sales_data.csv` as `data`. Display the first five rows. The variables in the dataset have been adjusted to suit the objectives of this lab. As shown in this cell, the dataset has been automatically loaded for you. You do not need to download the .csv file or provide more code to access the dataset and proceed with this lab. Continue with this activity by completing the following instructions."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "de050e2b",
"metadata": {
"id": "32d46d82-2bd6-4433-b56e-cfa5542949ca"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>Radio</th>\n",
" <th>Social Media</th>\n",
" <th>Influencer</th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Low</td>\n",
" <td>1.218354</td>\n",
" <td>1.270444</td>\n",
" <td>Micro</td>\n",
" <td>90.054222</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Medium</td>\n",
" <td>14.949791</td>\n",
" <td>0.274451</td>\n",
" <td>Macro</td>\n",
" <td>222.741668</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Low</td>\n",
" <td>10.377258</td>\n",
" <td>0.061984</td>\n",
" <td>Mega</td>\n",
" <td>102.774790</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>High</td>\n",
" <td>26.469274</td>\n",
" <td>7.070945</td>\n",
" <td>Micro</td>\n",
" <td>328.239378</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>High</td>\n",
" <td>36.876302</td>\n",
" <td>7.618605</td>\n",
" <td>Mega</td>\n",
" <td>351.807328</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" TV Radio Social Media Influencer Sales\n",
"0 Low 1.218354 1.270444 Micro 90.054222\n",
"1 Medium 14.949791 0.274451 Macro 222.741668\n",
"2 Low 10.377258 0.061984 Mega 102.774790\n",
"3 High 26.469274 7.070945 Micro 328.239378\n",
"4 High 36.876302 7.618605 Mega 351.807328"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# RUN THIS CELL TO IMPORT YOUR DATA.\n",
"\n",
"### YOUR CODE HERE ### \n",
"data = pd.read_csv('marketing_sales_data.csv')\n",
"\n",
"# Display the first five rows.\n",
"\n",
"data.head()\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "28dcb9ec",
"metadata": {
"id": "c179e85f-20df-4840-ad29-de35b928dff6"
},
"source": [
"The features in the data are:\n",
"* TV promotion budget (in Low, Medium, and High categories)\n",
"* Social media promotion budget (in millions of dollars)\n",
"* Radio promotion budget (in millions of dollars)\n",
"* Sales (in millions of dollars)\n",
"* Influencer size (in Mega, Macro, Nano, and Micro categories)"
]
},
{
"cell_type": "markdown",
"id": "33878d5f",
"metadata": {
"id": "6db7b19a-dd9b-490a-b389-0c433ed16754"
},
"source": [
"**Question:** Why is it useful to perform exploratory data analysis before constructing a linear regression model?\n",
"\n",
"Potential reasons include:\n",
"\n",
"* To understand which variables are present in the data\n",
"* To consider the distribution of features, such as minimum, mean, and maximum values\n",
"* To plot the relationship between the independent and dependent variables and visualize which features have a linear relationship\n",
"* To identify issues with the data, such as incorrect or missing values"
]
},
{
"cell_type": "markdown",
"id": "d73ca0dc",
"metadata": {
"id": "fd47ede7-63ff-4fe5-aeb0-b8f909e9ecbe"
},
"source": [
"## **Step 2: Data exploration** \n"
]
},
{
"cell_type": "markdown",
"id": "ed6404d8",
"metadata": {
"id": "b9669d71-a6b3-491b-b115-0c766625fc3d"
},
"source": [
"First, use a boxplot to determine how `Sales` varies based on the `TV` promotion budget category."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "eb3d3b16",
"metadata": {
"id": "518254a6-44d5-45bf-9b57-13ce3a4deab3"
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f1dbab77d10>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Create a boxplot with TV and Sales.\n",
"\n",
"sns.boxplot(data=data, x='TV', y='Sales')\n"
]
},
{
"cell_type": "markdown",
"id": "3e8568ce",
"metadata": {
"id": "dd7d4c26-24ae-43b6-a521-18ce36446216"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"There is a function in the `seaborn` library that creates a boxplot showing the distribution of a variable across multiple groups.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "72419e34",
"metadata": {
"id": "344a4a0a-1b9e-474a-979a-d55032c5bd75"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"Use the `boxplot()` function from `seaborn`.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "58620b78",
"metadata": {
"id": "0b172c8a-8c94-4f83-bd33-b89d634a5025",
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"Use `TV` as the `x` argument, `Sales` as the `y` argument, and `data` as the `data` argument.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "42a2aa56",
"metadata": {
"id": "a956add8-97b0-45b8-a008-ca1f7033c308"
},
"source": [
"**Question:** Is there variation in `Sales` based on the `TV` promotion budget?\n",
"\n",
"There is considerable variation in `Sales` across the `TV` groups. The significance of these differences can be tested with a one-way ANOVA."
]
},
{
"cell_type": "markdown",
"id": "7a5814bd",
"metadata": {
"id": "f3100abe-32db-4a56-b831-18eb0857b2d7"
},
"source": [
"Now, use a boxplot to determine how `Sales` varies based on the `Influencer` size category."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "17c75147",
"metadata": {
"id": "fafbc9e4-de0b-4892-a863-add240208344"
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f1db8ad6290>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Create a boxplot with Influencer and Sales.\n",
"\n",
"sns.boxplot(data=data, x='Influencer', y='Sales')\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "b13c5314",
"metadata": {
"id": "e415aa95-0650-47e0-9efd-2770b8dfcb3d"
},
"source": [
"**Question:** Is there variation in `Sales` based on the `Influencer` size?\n",
"\n",
"There is some variation in `Sales` across the `Influencer` groups, but it may not be significant."
]
},
{
"cell_type": "markdown",
"id": "6506de8d",
"metadata": {
"id": "0f4adbee-9d13-400a-99e6-6d4c482b8e17"
},
"source": [
"### Remove missing data\n",
"\n",
"You may recall from prior labs that this dataset contains rows with missing values. To correct this, drop these rows. Then, confirm the data contains no missing values."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cd3028e8",
"metadata": {
"id": "498e546b-e90e-4b84-a7d7-12b3bb514c1d"
},
"outputs": [
{
"data": {
"text/plain": [
"TV 0\n",
"Radio 0\n",
"Social Media 0\n",
"Influencer 0\n",
"Sales 0\n",
"dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop rows that contain missing data and update the DataFrame.\n",
"\n",
"data = data.dropna(axis=0)\n",
"\n",
"\n",
"# Confirm the data contains no missing values.\n",
"\n",
"data.isnull().sum()\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "1da1fc73",
"metadata": {
"id": "e37d7507-1f3d-4432-912e-ced7feff4ac6"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"There is a `pandas` function that removes missing values.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "295ea3ac",
"metadata": {
"id": "19cd01e1-9976-47f6-b25c-7b8ce2a05627"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"The `dropna()` method removes missing values from an object (e.g., a DataFrame).\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "54bd3fb5",
"metadata": {
"id": "87a46eec-9d3e-4657-bf91-6b3bd02089f0"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"Verify the data is updated properly after the rows containing missing data are dropped.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "d62cb320",
"metadata": {
"id": "888c90fa-d800-43e4-a692-9fdd576c9b9c"
},
"source": [
"## **Step 3: Model building** \n"
]
},
{
"cell_type": "markdown",
"id": "0ec6e0c6",
"metadata": {
"id": "9c906193-db62-4af0-83fa-dffafc347554"
},
"source": [
"Fit a linear regression model that predicts `Sales` using one of the independent categorical variables in `data`. Refer to your previous code for defining and fitting a linear regression model."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1385a4c4",
"metadata": {
"id": "76f99341-1ea0-4f1d-a2c7-54e56bd57e38"
},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>Sales</td> <th> R-squared: </th> <td> 0.874</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 0.874</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td> 1971.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Tue, 05 Sep 2023</td> <th> Prob (F-statistic):</th> <td>8.81e-256</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>23:38:58</td> <th> Log-Likelihood: </th> <td> -2778.9</td> \n",
"</tr>\n",
"<tr>\n",
" <th>No. Observations:</th> <td> 569</td> <th> AIC: </th> <td> 5564.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Residuals:</th> <td> 566</td> <th> BIC: </th> <td> 5577.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Model:</th> <td> 2</td> <th> </th> <td> </td> \n",
"</tr>\n",
"<tr>\n",
" <th>Covariance Type:</th> <td>nonrobust</td> <th> </th> <td> </td> \n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>Intercept</th> <td> 300.5296</td> <td> 2.417</td> <td> 124.360</td> <td> 0.000</td> <td> 295.783</td> <td> 305.276</td>\n",
"</tr>\n",
"<tr>\n",
" <th>C(TV)[T.Low]</th> <td> -208.8133</td> <td> 3.329</td> <td> -62.720</td> <td> 0.000</td> <td> -215.353</td> <td> -202.274</td>\n",
"</tr>\n",
"<tr>\n",
" <th>C(TV)[T.Medium]</th> <td> -101.5061</td> <td> 3.325</td> <td> -30.526</td> <td> 0.000</td> <td> -108.038</td> <td> -94.975</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <th>Omnibus:</th> <td>450.714</td> <th> Durbin-Watson: </th> <td> 2.002</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Prob(Omnibus):</th> <td> 0.000</td> <th> Jarque-Bera (JB): </th> <td> 35.763</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Skew:</th> <td>-0.044</td> <th> Prob(JB): </th> <td>1.71e-08</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Kurtosis:</th> <td> 1.775</td> <th> Cond. No. </th> <td> 3.86</td>\n",
"</tr>\n",
"</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Sales R-squared: 0.874\n",
"Model: OLS Adj. R-squared: 0.874\n",
"Method: Least Squares F-statistic: 1971.\n",
"Date: Tue, 05 Sep 2023 Prob (F-statistic): 8.81e-256\n",
"Time: 23:38:58 Log-Likelihood: -2778.9\n",
"No. Observations: 569 AIC: 5564.\n",
"Df Residuals: 566 BIC: 5577.\n",
"Df Model: 2 \n",
"Covariance Type: nonrobust \n",
"===================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-----------------------------------------------------------------------------------\n",
"Intercept 300.5296 2.417 124.360 0.000 295.783 305.276\n",
"C(TV)[T.Low] -208.8133 3.329 -62.720 0.000 -215.353 -202.274\n",
"C(TV)[T.Medium] -101.5061 3.325 -30.526 0.000 -108.038 -94.975\n",
"==============================================================================\n",
"Omnibus: 450.714 Durbin-Watson: 2.002\n",
"Prob(Omnibus): 0.000 Jarque-Bera (JB): 35.763\n",
"Skew: -0.044 Prob(JB): 1.71e-08\n",
"Kurtosis: 1.775 Cond. No. 3.86\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define the OLS formula.\n",
"\n",
"formula = 'Sales ~ C(TV)'\n",
"\n",
"\n",
"# Create an OLS model.\n",
"\n",
"OLS = ols(formula=formula, data=data)\n",
"\n",
"\n",
"# Fit the model.\n",
"\n",
"model = OLS.fit()\n",
"\n",
"\n",
"# Save the results summary.\n",
"\n",
"summary = model.summary()\n",
"\n",
"\n",
"# Display the model results.\n",
"\n",
"summary\n"
]
},
{
"cell_type": "markdown",
"id": "efd4b28b",
"metadata": {
"id": "109e32f5-8193-4961-8245-6b6c09acfe3a",
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"Refer to code you've written to fit linear regression models.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "8fe68670",
"metadata": {
"id": "49424e08-3472-44f1-a892-63ed80517510"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"Use the `ols()` function from `statsmodels.formula.api`, which creates a model from a formula and DataFrame, to create an OLS model.\n",
"\n",
"</details>\n"
]
},
{
"cell_type": "markdown",
"id": "080d61f6",
"metadata": {
"id": "0ee5dead-ed62-45d5-ab24-d671d8c3dde4",
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"Use `C()` around the variable name in the `ols` formula to indicate that a variable is categorical.\n",
" \n",
"Be sure the variable string names exactly match the column names in `data`.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "f6cdc0f2",
"metadata": {
"id": "1d889a8b-76f5-4f27-833f-a19af27ed8ca"
},
"source": [
"**Question:** Which categorical variable did you choose for the model? Why?\n",
"\n",
"* `TV` was selected as the preceding analysis showed a strong relationship between the `TV` promotion budget and the average `Sales`.\n",
"* `Influencer` was not selected because it did not show a strong relationship to `Sales` in the analysis."
]
},
{
"cell_type": "markdown",
"id": "3cda2c84",
"metadata": {
"id": "b4987bee-40ae-4513-95c3-1bec1acdbba9",
"tags": []
},
"source": [
"### Check model assumptions"
]
},
{
"cell_type": "markdown",
"id": "96291c1b",
"metadata": {
"id": "6854af88-7d67-4214-a7df-c6405b46bb47"
},
"source": [
"Now, check that the four linear regression assumptions are upheld for your model."
]
},
{
"cell_type": "markdown",
"id": "8103640e",
"metadata": {
"id": "72eeb1c7-2f17-44fe-ac3f-71d555c2d81e",
"tags": []
},
"source": [
"**Question:** Is the linearity assumption met?\n",
"\n",
"Because your model does not have any continuous independent variables, the linearity assumption is not required. "
]
},
{
"cell_type": "markdown",
"id": "d9aefb95",
"metadata": {
"id": "feeb314a-bbbe-4e9a-8561-2f8af0cd172e"
},
"source": [
"The independent observation assumption states that each observation in the dataset is independent. Because each marketing promotion (row) is independent of the others, the independence assumption is not violated."
]
},
{
"cell_type": "markdown",
"id": "493cf487",
"metadata": {
"id": "bcccf5c8-3325-4b1e-b491-f151bea5ab1c"
},
"source": [
"Next, verify that the normality assumption is upheld for the model."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "bb5d7f1d",
"metadata": {
"id": "cce8f99b-33e2-4723-9266-4f009e7a15dd"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Calculate the residuals.\n",
"\n",
"res = model.resid\n",
"\n",
"\n",
"# Create a histogram with the residuals. \n",
"\n",
"sns.histplot(res)\n",
"\n",
"\n",
"# Create a QQ plot of the residuals.\n",
"\n",
"sm.qqplot(res, line='s')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "786ce87d",
"metadata": {
"id": "39538404-e292-4564-b361-46353fc8e3f0"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"Access the residuals from the fit model object.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "647ecac6",
"metadata": {
"id": "689dabc4-ad48-4c9e-976e-b70520801385"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"Use `model.resid` to get the residuals from a fit model called `model`.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "e1a4342d",
"metadata": {
"id": "562f868e-45e4-464a-a47e-9f1ed735d6a4"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"For the histogram, pass the residuals as the first argument in the `seaborn` `histplot()` function.\n",
" \n",
"For the QQ-plot, pass the residuals as the first argument in the `statsmodels` `qqplot()` function.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "f834d442",
"metadata": {
"id": "e41f4dd7-5501-43b7-b684-58bad8ff61fb"
},
"source": [
"**Question:** Is the normality assumption met?\n",
"\n",
"There is reasonable concern that the normality assumption is not met when `TV` is used as the independent variable predicting `Sales`. The normal q-q forms an 'S' that deviates off the red diagonal line, which is not desired behavior. \n",
"\n",
"However, for the purpose of the lab, continue assuming the normality assumption is met."
]
},
{
"cell_type": "markdown",
"id": "9317db1d",
"metadata": {
"id": "be83ac10-d1d0-4b94-88de-5de424528547"
},
"source": [
"Now, verify the constant variance (homoscedasticity) assumption is met for this model."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "2ad318a0",
"metadata": {
"id": "efcd0325-b3a0-42d1-ad57-38f10800c35e"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Create a scatter plot with the fitted values from the model and the residuals.\n",
"\n",
"sns.scatterplot(model.fittedvalues, res).axhline(0, color='red')\n",
"\n",
"\n",
"# Add a line at y = 0 to visualize the variance of residuals above and below 0.\n",
"\n",
"\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"id": "f806fa3d",
"metadata": {
"id": "662f104d-0977-498f-8159-501063f3c3fc"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"Access the fitted values from the model object fit earlier.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "8b52c828",
"metadata": {
"id": "b0a3d26c-1e80-46e3-849a-d7f054cffb52",
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"Use `model.fittedvalues` to get the fitted values from the fit model called `model`.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "57c433df",
"metadata": {
"id": "5830cbfc-9204-42a6-b24f-3cf1334ff41e",
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"\n",
"Call the `scatterplot()` function from the `seaborn` library and pass in the fitted values and residuals.\n",
" \n",
"Add a line to a figure using the `axline()` function.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "32cadf66",
"metadata": {
"id": "8318f726-369c-446a-acad-85117e43459b"
},
"source": [
"**Question:** Is the constant variance (homoscedasticity) assumption met?\n",
"\n",
"The variance where there are fitted values is similarly distributed, validating that the constant variance assumption is met."
]
},
{
"cell_type": "markdown",
"id": "4b1ea7a0",
"metadata": {
"id": "84373d80-2129-4124-85fa-85871671004b"
},
"source": [
"## **Step 4: Results and evaluation** "
]
},
{
"cell_type": "markdown",
"id": "a3b6fb8c",
"metadata": {
"id": "30f5a3e8-a446-4a64-a0cb-4a512a367111"
},
"source": [
"First, display the OLS regression results."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "4f3d79ee",
"metadata": {
"id": "7d0bfc27-05f7-4cfa-9aa1-8e2110eabd69"
},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"simpletable\">\n",
"<caption>OLS Regression Results</caption>\n",
"<tr>\n",
" <th>Dep. Variable:</th> <td>Sales</td> <th> R-squared: </th> <td> 0.874</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 0.874</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td> 1971.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Date:</th> <td>Tue, 05 Sep 2023</td> <th> Prob (F-statistic):</th> <td>8.81e-256</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Time:</th> <td>23:38:58</td> <th> Log-Likelihood: </th> <td> -2778.9</td> \n",
"</tr>\n",
"<tr>\n",
" <th>No. Observations:</th> <td> 569</td> <th> AIC: </th> <td> 5564.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Residuals:</th> <td> 566</td> <th> BIC: </th> <td> 5577.</td> \n",
"</tr>\n",
"<tr>\n",
" <th>Df Model:</th> <td> 2</td> <th> </th> <td> </td> \n",
"</tr>\n",
"<tr>\n",
" <th>Covariance Type:</th> <td>nonrobust</td> <th> </th> <td> </td> \n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n",
"</tr>\n",
"<tr>\n",
" <th>Intercept</th> <td> 300.5296</td> <td> 2.417</td> <td> 124.360</td> <td> 0.000</td> <td> 295.783</td> <td> 305.276</td>\n",
"</tr>\n",
"<tr>\n",
" <th>C(TV)[T.Low]</th> <td> -208.8133</td> <td> 3.329</td> <td> -62.720</td> <td> 0.000</td> <td> -215.353</td> <td> -202.274</td>\n",
"</tr>\n",
"<tr>\n",
" <th>C(TV)[T.Medium]</th> <td> -101.5061</td> <td> 3.325</td> <td> -30.526</td> <td> 0.000</td> <td> -108.038</td> <td> -94.975</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
" <th>Omnibus:</th> <td>450.714</td> <th> Durbin-Watson: </th> <td> 2.002</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Prob(Omnibus):</th> <td> 0.000</td> <th> Jarque-Bera (JB): </th> <td> 35.763</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Skew:</th> <td>-0.044</td> <th> Prob(JB): </th> <td>1.71e-08</td>\n",
"</tr>\n",
"<tr>\n",
" <th>Kurtosis:</th> <td> 1.775</td> <th> Cond. No. </th> <td> 3.86</td>\n",
"</tr>\n",
"</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"<class 'statsmodels.iolib.summary.Summary'>\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Sales R-squared: 0.874\n",
"Model: OLS Adj. R-squared: 0.874\n",
"Method: Least Squares F-statistic: 1971.\n",
"Date: Tue, 05 Sep 2023 Prob (F-statistic): 8.81e-256\n",
"Time: 23:38:58 Log-Likelihood: -2778.9\n",
"No. Observations: 569 AIC: 5564.\n",
"Df Residuals: 566 BIC: 5577.\n",
"Df Model: 2 \n",
"Covariance Type: nonrobust \n",
"===================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-----------------------------------------------------------------------------------\n",
"Intercept 300.5296 2.417 124.360 0.000 295.783 305.276\n",
"C(TV)[T.Low] -208.8133 3.329 -62.720 0.000 -215.353 -202.274\n",
"C(TV)[T.Medium] -101.5061 3.325 -30.526 0.000 -108.038 -94.975\n",
"==============================================================================\n",
"Omnibus: 450.714 Durbin-Watson: 2.002\n",
"Prob(Omnibus): 0.000 Jarque-Bera (JB): 35.763\n",
"Skew: -0.044 Prob(JB): 1.71e-08\n",
"Kurtosis: 1.775 Cond. No. 3.86\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the model results summary.\n",
"\n",
"summary\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "bcd73e0e",
"metadata": {
"id": "b29062e1-5c7f-4a78-b782-a07236bbcc28"
},
"source": [
"**Question:** What is your interpretation of the model's R-squared?\n",
"\n",
"Using `TV` as the independent variable results in a linear regression model with $R^{2} = 0.874$. In other words, the model explains $87.4\\%$ of the variation in `Sales`. This makes the model an effective predictor of `Sales`. "
]
},
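{
"cell_type": "markdown",
"id": "r2-from-sums-of-squares",
"metadata": {},
"source": [
"As a check on the arithmetic, $R^{2}$ can be recovered from the sums of squares of the fitted model, assuming the usual split of total variation into a between-group part and a residual part. The values $SS_{TV} \\approx 4.052692 \\times 10^{6}$ and $SS_{Residual} \\approx 5.817589 \\times 10^{5}$ are taken from the one-way ANOVA table produced later in this notebook for the same model; the symbol names are introduced here only for illustration.\n",
"\n",
"$$R^{2} = \\frac{SS_{TV}}{SS_{TV} + SS_{Residual}} = \\frac{4.052692 \\times 10^{6}}{4.052692 \\times 10^{6} + 5.817589 \\times 10^{5}} \\approx 0.874$$"
]
},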
{
"cell_type": "markdown",
"id": "de075158",
"metadata": {
"id": "8ca575f3-cbba-4a0a-99d0-b8bf82ea49e4",
"tags": []
},
"source": [
"**Question:** What is your intepretation of the coefficient estimates? Are the coefficients statistically significant?\n",
"\n",
"The default `TV` category for the model is `High`, because there are coefficients for the other two `TV` categories, `Medium` and `Low`. According to the model, `Sales` with a `Medium` or `Low` `TV` category are lower on average than `Sales` with a `High` `TV` category. For example, the model predicts that a `Low` `TV` promotion would be 208.813 (in millions of dollars) lower in `Sales` on average than a `High` `TV` promotion.\n",
"\n",
"The p-value for all coefficients is $0.000$, meaning all coefficients are statistically significant at $p=0.05$. The 95% confidence intervals for each coefficient should be reported when presenting results to stakeholders. For instance, there is a $95\\%$ chance the interval $[-215.353,-202.274]$ contains the true parameter of the slope of $\\beta_{TVLow}$, which is the estimated difference in promotion sales when a `Low` `TV` promotion is chosen instead of a `High` `TV` promotion."
]
},
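{
"cell_type": "markdown",
"id": "group-means-from-coefs",
"metadata": {},
"source": [
"Because `High` is the reference category, the intercept is the estimated mean `Sales` for `High`, and each coefficient is an offset from that mean. A short worked example, using the rounded values from the summary table above (the $\\hat{\\mu}$ notation is introduced here for illustration and is not part of the model output):\n",
"\n",
"$$\\hat{\\mu}_{High} = 300.5296$$\n",
"\n",
"$$\\hat{\\mu}_{Medium} = 300.5296 - 101.5061 = 199.0235$$\n",
"\n",
"$$\\hat{\\mu}_{Low} = 300.5296 - 208.8133 = 91.7163$$\n",
"\n",
"The implied difference between `Medium` and `Low` is $199.0235 - 91.7163 = 107.3072$, which agrees with the `meandiff` that the Tukey's HSD test later in this notebook reports for that pair."
]
},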
{
"cell_type": "markdown",
"id": "58371783",
"metadata": {
"id": "b7e61507-0dd5-4d32-8045-ba06cc37fcd4"
},
"source": [
"**Question:** Do you think your model could be improved? Why or why not? How?\n",
"\n",
"Given how accurate `TV` was as a predictor, the model could be improved with a more granular view of the `TV` promotions, such as additional categories or the actual `TV` promotion budgets. Further, additional variables, such as the location of the marketing campaign or the time of year, may increase model accuracy. "
]
},
{
"cell_type": "markdown",
"id": "68d1a980",
"metadata": {
"id": "97b169ad-b113-46e3-996a-53f268adbc6d"
},
"source": [
"### Perform a one-way ANOVA test\n",
"\n",
"With the model fit, run a one-way ANOVA test to determine whether there is a statistically significant difference in `Sales` among groups. "
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "85426c3b",
"metadata": {
"id": "aadfa800-a74c-4819-abb8-cda13ce16d96"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>df</th>\n",
" <th>sum_sq</th>\n",
" <th>mean_sq</th>\n",
" <th>F</th>\n",
" <th>PR(&gt;F)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>C(TV)</th>\n",
" <td>2.0</td>\n",
" <td>4.052692e+06</td>\n",
" <td>2.026346e+06</td>\n",
" <td>1971.455737</td>\n",
" <td>8.805550e-256</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Residual</th>\n",
" <td>566.0</td>\n",
" <td>5.817589e+05</td>\n",
" <td>1.027843e+03</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" df sum_sq mean_sq F PR(>F)\n",
"C(TV) 2.0 4.052692e+06 2.026346e+06 1971.455737 8.805550e-256\n",
"Residual 566.0 5.817589e+05 1.027843e+03 NaN NaN"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create an one-way ANOVA table for the fit model.\n",
"\n",
"sm.stats.anova_lm(model, type=2)"
]
},
{
"cell_type": "markdown",
"id": "8747c9d1",
"metadata": {
"id": "3574a603-96c3-4876-80bd-9864a1e466d6"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"Review what you've learned about how to perform a one-way ANOVA test.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "c7672317",
"metadata": {
"id": "f1010b34-96b2-403a-8630-e83613ff40be"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"There is a function in `statsmodels.api` (i.e. `sm`) that peforms an ANOVA test for a fit linear model.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "7e77d7f7",
"metadata": {
"id": "b0ccc536-34c6-4bb7-a022-a2e4bec62397"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"Use the `anova_lm()` function from `sm.stats`. Specify the type of ANOVA test (for example, one-way or two-way), using the `typ` parameter.\n",
" \n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "ed14d827",
"metadata": {
"id": "735c20e2-bd53-4e36-81bd-38ae78a4d4a8"
},
"source": [
"**Question:** What are the null and alternative hypotheses for the ANOVA test?\n",
"\n",
"The null hypothesis is that there is no difference in `Sales` based on the `TV` promotion budget.\n",
"\n",
"The alternative hypothesis is that there is a difference in `Sales` based on the `TV` promotion budget."
]
},
{
"cell_type": "markdown",
"id": "50c45697",
"metadata": {
"id": "1f5807cb-aff6-4877-a70c-7dbffdb822e3"
},
"source": [
"**Question:** What is your conclusion from the one-way ANOVA test?\n",
"\n",
"The F-test statistic is 1971.46 and the p-value is $8.81 * 10^{-256}$ (i.e., very small). Because the p-value is less than 0.05, you would reject the null hypothesis that there is no difference in `Sales` based on the `TV` promotion budget."
]
},
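{
"cell_type": "markdown",
"id": "f-stat-from-mean-squares",
"metadata": {},
"source": [
"To see where the F-statistic comes from, it can be reproduced by hand from the ANOVA table above as the ratio of the between-group mean square to the residual mean square. This is a sketch of the standard one-way ANOVA calculation using the rounded table values; the subscripted symbols are labels introduced here for the `C(TV)` and `Residual` rows.\n",
"\n",
"$$F = \\frac{MS_{TV}}{MS_{Residual}} = \\frac{SS_{TV} / df_{TV}}{SS_{Residual} / df_{Residual}} = \\frac{4.052692 \\times 10^{6} / 2}{5.817589 \\times 10^{5} / 566} \\approx \\frac{2.026346 \\times 10^{6}}{1.027843 \\times 10^{3}} \\approx 1971.46$$\n",
"\n",
"The degrees of freedom follow from the data: $df_{TV} = 3 - 1 = 2$ for three `TV` groups, and $df_{Residual} = 569 - 3 = 566$ for 569 observations."
]
},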
{
"cell_type": "markdown",
"id": "097dc928",
"metadata": {
"id": "6e1d8561-3957-400b-89d1-4330ee923193"
},
"source": [
"**Question:** What did the ANOVA test tell you?\n",
"\n",
"The results of the one-way ANOVA test indicate that you can reject the null hypothesis in favor of the alternative hypothesis. There is a statistically significant difference in `Sales` among `TV` groups."
]
},
{
"cell_type": "markdown",
"id": "02de18bc",
"metadata": {
"id": "532a2ba1-8e9a-4c8f-b432-dfeea0e62fc4"
},
"source": [
"### Perform an ANOVA post hoc test\n",
"\n",
"If you have significant results from the one-way ANOVA test, you can apply ANOVA post hoc tests such as the Tukey’s HSD post hoc test. \n",
"\n",
"Run the Tukey’s HSD post hoc test to compare if there is a significant difference between each pair of categories for TV."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "209b97a3",
"metadata": {
"id": "cce84e77-6269-4295-b961-8350a4b4920c"
},
"outputs": [
{
"data": {
"text/html": [
"<table class=\"simpletable\">\n",
"<caption>Multiple Comparison of Means - Tukey HSD, FWER=0.05</caption>\n",
"<tr>\n",
" <th>group1</th> <th>group2</th> <th>meandiff</th> <th>p-adj</th> <th>lower</th> <th>upper</th> <th>reject</th>\n",
"</tr>\n",
"<tr>\n",
" <td>High</td> <td>Low</td> <td>-208.8133</td> <td>0.001</td> <td>-216.637</td> <td>-200.9896</td> <td>True</td> \n",
"</tr>\n",
"<tr>\n",
" <td>High</td> <td>Medium</td> <td>-101.5061</td> <td>0.001</td> <td>-109.3204</td> <td>-93.6918</td> <td>True</td> \n",
"</tr>\n",
"<tr>\n",
" <td>Low</td> <td>Medium</td> <td>107.3072</td> <td>0.001</td> <td>99.7063</td> <td>114.908</td> <td>True</td> \n",
"</tr>\n",
"</table>"
],
"text/plain": [
"<class 'statsmodels.iolib.table.SimpleTable'>"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Perform the Tukey's HSD post hoc test.\n",
"\n",
"tukey = pairwise_tukeyhsd(endog=data['Sales'], groups = data['TV'])\n",
"tukey.summary()"
]
},
{
"cell_type": "markdown",
"id": "6695a34a",
"metadata": {
"id": "119b0c6f-b7c7-47a7-80cb-ed94a07fc61c"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 1</strong></h4></summary>\n",
"\n",
"Review what you've learned about how to perform a Tukey's HSD post hoc test.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "6fa196e5",
"metadata": {
"id": "3adb039f-15d0-4f36-848b-3b469cd4d65d"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 2</strong></h4></summary>\n",
"\n",
"Use the `pairwise_tukeyhsd()` function from `statsmodels.stats.multicomp`.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "a215acf3",
"metadata": {
"id": "5169a823-fd23-41bc-9766-1b3fd4bff1dc"
},
"source": [
"<details>\n",
"<summary><h4><strong>Hint 3</strong></h4></summary>\n",
"\n",
"The `endog` argument in `pairwise_tukeyhsd` indicates which variable is being compared across groups (i.e., `Sales`). The `groups` argument in `pairwise_tukeyhsd` tells the function which variable holds the group you’re interested in reviewing.\n",
"\n",
"</details>"
]
},
{
"cell_type": "markdown",
"id": "536bbd51",
"metadata": {
"id": "529c0b8a-8ffb-445f-b045-521646408c16"
},
"source": [
"**Question:** What is your interpretation of the Tukey HSD test?\n",
"\n",
"The first row, which compares the `High` and `Low` `TV` groups, indicates that you can reject the null hypothesis that there is no significant difference between the `Sales` of these two groups.\n",
"\n",
"You can also reject the null hypotheses for the two other pairwise comparisons that compare `High` to `Medium` and `Low` to `Medium`."
]
},
{
"cell_type": "markdown",
"id": "15773022",
"metadata": {
"id": "f1bd994c-52ca-49ac-ba00-51bc36d07842"
},
"source": [
"**Question:** What did the post hoc tell you?\n",
"\n",
"A post hoc test was conducted to determine which `TV` groups are different and how many are different from each other. This provides more detail than the one-way ANOVA results, which can at most determine that at least one group is different. Further, using the Tukey HSD controls for the increasing probability of incorrectly rejecting a null hypothesis from peforming multiple tests. \n",
"\n",
"The results were that `Sales` is not the same between any pair of `TV` groups. "
]
},
{
"cell_type": "markdown",
"id": "6e635d7e",
"metadata": {
"id": "agx1bDPU9cd4"
},
"source": [
"## **Considerations**\n",
"\n",
"**What are some key takeaways that you learned during this lab?**\n",
"\n",
"* Box-plots are a helpful tool for visualizing the distribution of a variable across groups.\n",
"* One-way ANOVA can be used to determine if there are significant differences among the means of three or more groups.\n",
"* ANOVA post hoc tests provide a more detailed view of the pairwise differences between groups.\n",
"\n",
"**What summary would you provide to stakeholders? Consider the statistical significance of key relationships and differences in distribution.**\n",
"\n",
"High TV promotion budgets result in significantly more sales than both medium and low TV promotion budgets. Medium TV promotion budgets result in significantly more sales than low TV promotion budgets.\n",
"\n",
"\n",
"Specifically, following are estimates for the difference between the mean sales resulting from different pairs of TV promotions, as determined by the Tukey's HSD test:\n",
"\n",
"* Estimated difference between the mean sales resulting from High and Low TV promotions: \\\\$208.81 million (with 95% confidence that the exact value for this difference is between 200.99 and 216.64 million dollars). \n",
"* Estimated difference between the mean sales resulting from High and Medium TV promotions: \\\\$101.51 million (with 95% confidence that the exact value for this difference is between 93.69 and 109.32 million dollars).\n",
"* difference between the mean sales resulting from Medium and Low TV promotions: \\\\$107.31 million (with 95\\% confidence that the exact value for this difference is between 99.71 and 114.91 million dollars).\n",
"\n",
"The linear regression model estimating `Sales` from `TV` had an R-squared of $0.871, making it a fairly accurate estimator. The model showed a statistically significant relationship between the `TV` promotion budget and `Sales`. \n",
"\n",
"The results of the one-way ANOVA test indicate that the null hypothesis that there is no difference in Sales based on the TV promotion budget can be rejected. Through the ANOVA post hoc test, a significant difference between all pairs of TV promotions was found.\n",
"\n",
"The difference in the distribution of sales across TV promotions was determined significant by both a one-way ANOVA test and a Tukey’s HSD test. \n"
]
},
{
"cell_type": "markdown",
"id": "20d4c9d1",
"metadata": {
"id": "88b01fcc-e016-4cd5-aedc-a71e51276fe2"
},
"source": [
"#### **Reference**\n",
"[Saragih, H.S. *Dummy Marketing and Sales Data*](https://www.kaggle.com/datasets/harrimansaragih/dummy-advertising-and-sales-data)"
]
},
{
"cell_type": "markdown",
"id": "7f6aef8a",
"metadata": {},
"source": [
"**Congratulations!** You've completed this lab. However, you may not notice a green check mark next to this item on Coursera's platform. Please continue your progress regardless of the check mark. Just click on the \"save\" icon at the top of this notebook to ensure your work has been logged."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}