@bendominguez0111
Created May 16, 2020 15:11
Aaron Jones.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"colab": {
"name": "Aaron Jones.ipynb",
"provenance": [],
"include_colab_link": true
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/fantasydatapros/03636364560d1b15df73a06ae2929f88/aaron-jones.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7_V8X1BICfFK",
"colab_type": "text"
},
"source": [
"# Statistical Deep-Dive into Aaron Jones' 2019 Fantasy Football season"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0Y1KSGiiCfFO",
"colab_type": "text"
},
"source": [
"In this notebook, we are going to take a quick look at Aaron Jones' 2019 fantasy football season.\n",
"\n",
"Aaron Jones finished as the RB2 on the season, and if you had him in your lineup (as I did), you know it was quite the rollercoaster. Let's run some stats and try to analyze his performance.\n",
"\n",
"In this notebook, we'll examine three questions:\n",
"\n",
"1. How did Aaron Jones compare to other top-tier RBs?\n",
"\n",
"We know Jones finished #2 on the season in total points, but total points don't tell the whole story. We also want to look at how consistent Jones was throughout the season and compare that to his top-tier counterparts.\n",
"\n",
"2. Did Jamaal Williams' involvement in the running game actually affect Aaron Jones' fantasy performance?\n",
"\n",
"This was probably the most frustrating part of having Jones in my lineup this season: did Williams really affect how many fantasy points Jones scored week to week? I think the consensus among fantasy managers is yes, but when you look at the statistics, the answer is a bit more nuanced. We'll look at Williams' usage numbers, find their correlation with Jones' output, and use a p-value to examine the relationship's statistical significance.\n",
"\n",
"3. Based on his usage, did Aaron Jones overperform in 2019, and is he due for a regression in touchdowns?\n",
"\n",
"Another hot topic this year was how effective Aaron Jones was with his touches. Of course, being too effective with your touches might mean you got a bit lucky at times and are due for a regression the next season.\n",
"\n",
"So how do we tell if Jones overperformed? We'll use play-by-play data for 2009 to 2018 to build probability distributions for the likelihood of scoring a touchdown when a team is X yards away from the end zone. We'll do this for both receiving and rushing touchdowns and then generate an expected TD value for Aaron Jones based on his 2019 play-by-play data. If Jones' actual TDs are greater than the expected TDs we calculate, he may be due for a regression."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GQNIuaIPCfFT",
"colab_type": "text"
},
"source": [
"![Picture of Aaron Jones](https://www.wearegreenbay.com/wp-content/uploads/sites/70/2019/11/aaron-jones-mvp.jpg?w=2560&h=1440&crop=1 \"Title\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C2rNbZ2hCfFY",
"colab_type": "text"
},
"source": [
"To start off, let's import some libraries."
]
},
{
"cell_type": "code",
"metadata": {
"id": "NZkkxNywCfFd",
"colab_type": "code",
"colab": {}
},
"source": [
"import time, os\n",
"import pandas as pd\n",
"from scipy.ndimage import gaussian_filter1d\n",
"import numpy as np\n",
"from matplotlib import pyplot as plt"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "tap_aKX8CfF0",
"colab_type": "text"
},
"source": [
"[You can find weekly and yearly data here.](https://www.fantasyfootballdatapros.com/csv_files)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zL89wIP7CfF4",
"colab_type": "code",
"colab": {}
},
"source": [
"WEEKLY_BASE_URL = 'data/weekly/2019/week{}.csv'\n",
"YEARLY_BASE_URL = 'data/yearly/2019.csv'"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "MnDKhJb9CfGH",
"colab_type": "text"
},
"source": [
"Let's read in the CSV for each week of the 2019 NFL regular season (weeks 1 through 17), add a `Week` column to keep track of which week each row came from, and then concatenate all of the weekly DataFrames into one big DataFrame containing the 2019 data."
]
},
{
"cell_type": "code",
"metadata": {
"id": "HVgbhwYKCfGJ",
"colab_type": "code",
"colab": {}
},
"source": [
"def generate_df():\n",
"    # Read each weekly CSV (weeks 1-17) and tag its rows with the week number\n",
"    weekly_dfs = []\n",
"    for week in range(1, 18):\n",
"        weekly_df = pd.read_csv(WEEKLY_BASE_URL.format(week))\n",
"        weekly_df['Week'] = week\n",
"        weekly_dfs.append(weekly_df)\n",
"    # Concatenate once at the end instead of growing a DataFrame inside the loop\n",
"    return pd.concat(weekly_dfs, ignore_index=True)\n",
"\n",
"df = generate_df()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "gxAfcZUjCfGW",
"colab_type": "text"
},
"source": [
"Let's get our top 5 PPR running backs for the 2019 season and confirm Aaron Jones is #2."
]
},
{
"cell_type": "code",
"metadata": {
"id": "QP2TJvScCfGb",
"colab_type": "code",
"colab": {},
"outputId": "15011ea8-3869-41d0-aa76-9a88ef02267e"
},
"source": [
"df.loc[df['Pos'] == 'RB'].groupby('Player')['FantasyPoints'].sum().sort_values(ascending=False).head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Player\n",
"Christian McCaffrey 442.0\n",
"Aaron Jones 298.3\n",
"Austin Ekeler 288.4\n",
"Ezekiel Elliott 287.4\n",
"Dalvin Cook 276.6\n",
"Name: FantasyPoints, dtype: float64"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vSE7IMKbCfGt",
"colab_type": "text"
},
"source": [
"Here, we `groupby` position, grab the `FantasyPoints` column, and then use the DataFrame method `describe` to get back some descriptive statistics for running backs and wide receivers in 2019. Each row in our data is one player-week, so these are per-game numbers across every RB and WR in the data. The most meaningful numbers here are the mean and the standard deviation (std). The standard deviation is about the same for both positions (roughly 8.5 points), but mean weekly output was higher for WRs (about 11.9) than for RBs (about 8.3) in 2019. This is most likely because these numbers use PPR scoring, which rewards receptions."
]
},
{
"cell_type": "code",
"metadata": {
"id": "mRRGVChQCfGw",
"colab_type": "code",
"colab": {},
"outputId": "3aa9506f-6a2d-4af1-a8c1-b034b7eeb8cf"
},
"source": [
"df2019 = df.groupby('Pos')['FantasyPoints'].describe().reset_index()\n",
"# Grab WRs too, for comparison\n",
"df2019[df2019['Pos'].isin(['RB', 'WR'])]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pos</th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>RB</td>\n",
" <td>1145.0</td>\n",
" <td>8.254271</td>\n",
" <td>8.544875</td>\n",
" <td>-2.0</td>\n",
" <td>1.5</td>\n",
" <td>5.6</td>\n",
" <td>12.7</td>\n",
" <td>49.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>WR</td>\n",
" <td>1241.0</td>\n",
" <td>11.889637</td>\n",
" <td>8.510828</td>\n",
" <td>-2.0</td>\n",
" <td>6.3</td>\n",
" <td>10.8</td>\n",
" <td>16.5</td>\n",
" <td>53.7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pos count mean std min 25% 50% 75% max\n",
"26 RB 1145.0 8.254271 8.544875 -2.0 1.5 5.6 12.7 49.2\n",
"30 WR 1241.0 11.889637 8.510828 -2.0 6.3 10.8 16.5 53.7"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hLEOIrybCfG-",
"colab_type": "text"
},
"source": [
"## Question 1: How did Aaron Jones do compared to other top-tier RBs?\n",
"\n",
"Let's take our DataFrame, group by player to pull out Aaron Jones' weekly rows, and describe his data."
]
},
{
"cell_type": "code",
"metadata": {
"id": "lH1Kxj5SCfHA",
"colab_type": "code",
"colab": {},
"outputId": "cff47bbe-176e-4db6-e51a-1a991a692ec2"
},
"source": [
"aj = df.groupby('Player').get_group('Aaron Jones')\n",
"\n",
"# Columns we'd like to keep\n",
"columns = ['Week', 'Tgt', 'Rec', 'ReceivingYds', 'ReceivingTD', 'RushingAtt', 'RushingYds', 'RushingTD', 'FumblesLost', 'FantasyPoints']\n",
"\n",
"# Filter out unnecessary columns\n",
"aj = aj[columns]\n",
"aj.describe()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Week</th>\n",
" <th>Tgt</th>\n",
" <th>Rec</th>\n",
" <th>ReceivingYds</th>\n",
" <th>ReceivingTD</th>\n",
" <th>RushingAtt</th>\n",
" <th>RushingYds</th>\n",
" <th>RushingTD</th>\n",
" <th>FumblesLost</th>\n",
" <th>FantasyPoints</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>16.000000</td>\n",
" <td>16.0000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>8.875000</td>\n",
" <td>2.8750</td>\n",
" <td>2.250000</td>\n",
" <td>27.437500</td>\n",
" <td>0.187500</td>\n",
" <td>14.750000</td>\n",
" <td>67.750000</td>\n",
" <td>1.000000</td>\n",
" <td>0.125000</td>\n",
" <td>18.643750</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>5.188127</td>\n",
" <td>3.4809</td>\n",
" <td>2.886751</td>\n",
" <td>42.963502</td>\n",
" <td>0.543906</td>\n",
" <td>5.053052</td>\n",
" <td>43.646306</td>\n",
" <td>1.264911</td>\n",
" <td>0.341565</td>\n",
" <td>14.214733</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.0000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>8.000000</td>\n",
" <td>18.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.800000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.750000</td>\n",
" <td>0.0000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>11.750000</td>\n",
" <td>36.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.875000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>8.500000</td>\n",
" <td>0.0000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>13.000000</td>\n",
" <td>50.500000</td>\n",
" <td>0.500000</td>\n",
" <td>0.000000</td>\n",
" <td>17.450000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>13.250000</td>\n",
" <td>6.2500</td>\n",
" <td>4.500000</td>\n",
" <td>38.500000</td>\n",
" <td>0.000000</td>\n",
" <td>16.750000</td>\n",
" <td>101.750000</td>\n",
" <td>2.000000</td>\n",
" <td>0.000000</td>\n",
" <td>25.875000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>17.000000</td>\n",
" <td>8.0000</td>\n",
" <td>7.000000</td>\n",
" <td>159.000000</td>\n",
" <td>2.000000</td>\n",
" <td>25.000000</td>\n",
" <td>154.000000</td>\n",
" <td>4.000000</td>\n",
" <td>1.000000</td>\n",
" <td>49.200000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Week Tgt Rec ReceivingYds ReceivingTD RushingAtt \\\n",
"count 16.000000 16.0000 16.000000 16.000000 16.000000 16.000000 \n",
"mean 8.875000 2.8750 2.250000 27.437500 0.187500 14.750000 \n",
"std 5.188127 3.4809 2.886751 42.963502 0.543906 5.053052 \n",
"min 1.000000 0.0000 0.000000 0.000000 0.000000 8.000000 \n",
"25% 4.750000 0.0000 0.000000 0.000000 0.000000 11.750000 \n",
"50% 8.500000 0.0000 0.000000 0.000000 0.000000 13.000000 \n",
"75% 13.250000 6.2500 4.500000 38.500000 0.000000 16.750000 \n",
"max 17.000000 8.0000 7.000000 159.000000 2.000000 25.000000 \n",
"\n",
" RushingYds RushingTD FumblesLost FantasyPoints \n",
"count 16.000000 16.000000 16.000000 16.000000 \n",
"mean 67.750000 1.000000 0.125000 18.643750 \n",
"std 43.646306 1.264911 0.341565 14.214733 \n",
"min 18.000000 0.000000 0.000000 1.800000 \n",
"25% 36.000000 0.000000 0.000000 3.875000 \n",
"50% 50.500000 0.500000 0.000000 17.450000 \n",
"75% 101.750000 2.000000 0.000000 25.875000 \n",
"max 154.000000 4.000000 1.000000 49.200000 "
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GJ_ADqrpCfHJ",
"colab_type": "text"
},
"source": [
"These are just standard descriptive statistics, but the one we should focus on here is the standard deviation of `FantasyPoints`. It's about 14.21, which is quite high relative to his mean of about 18.64 points per game. One thing I want to emphasize throughout this analysis is that total fantasy output alone is not enough to go on when making fantasy football decisions. We also want to consider how consistently a player got us points each week. Let's look at Christian McCaffrey and Derrick Henry, two other top RBs this season, and see how their standard deviations compare to Jones'."
]
},
{
"cell_type": "code",
"metadata": {
"id": "w68UU0DNCfHK",
"colab_type": "code",
"colab": {},
"outputId": "7cc33665-7435-4162-94bb-0020ae066f48"
},
"source": [
"mcc = df.groupby('Player').get_group('Christian McCaffrey')\n",
"mcc = mcc[columns]\n",
"mcc.describe()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Week</th>\n",
" <th>Tgt</th>\n",
" <th>Rec</th>\n",
" <th>ReceivingYds</th>\n",
" <th>ReceivingTD</th>\n",
" <th>RushingAtt</th>\n",
" <th>RushingYds</th>\n",
" <th>RushingTD</th>\n",
" <th>FumblesLost</th>\n",
" <th>FantasyPoints</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.00000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.000000</td>\n",
" <td>16.0</td>\n",
" <td>16.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>9.125000</td>\n",
" <td>8.000000</td>\n",
" <td>6.687500</td>\n",
" <td>58.93750</td>\n",
" <td>0.125000</td>\n",
" <td>17.937500</td>\n",
" <td>86.687500</td>\n",
" <td>0.937500</td>\n",
" <td>0.0</td>\n",
" <td>27.625000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>5.188127</td>\n",
" <td>4.966555</td>\n",
" <td>4.407853</td>\n",
" <td>38.67552</td>\n",
" <td>0.341565</td>\n",
" <td>5.157115</td>\n",
" <td>46.963417</td>\n",
" <td>0.771902</td>\n",
" <td>0.0</td>\n",
" <td>11.287131</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.00000</td>\n",
" <td>0.000000</td>\n",
" <td>9.000000</td>\n",
" <td>26.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>3.700000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.750000</td>\n",
" <td>4.750000</td>\n",
" <td>3.750000</td>\n",
" <td>34.50000</td>\n",
" <td>0.000000</td>\n",
" <td>14.000000</td>\n",
" <td>50.750000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>24.075000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>9.500000</td>\n",
" <td>9.500000</td>\n",
" <td>7.000000</td>\n",
" <td>65.00000</td>\n",
" <td>0.000000</td>\n",
" <td>19.000000</td>\n",
" <td>78.500000</td>\n",
" <td>1.000000</td>\n",
" <td>0.0</td>\n",
" <td>27.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>13.250000</td>\n",
" <td>11.250000</td>\n",
" <td>10.000000</td>\n",
" <td>83.00000</td>\n",
" <td>0.000000</td>\n",
" <td>22.000000</td>\n",
" <td>119.750000</td>\n",
" <td>1.250000</td>\n",
" <td>0.0</td>\n",
" <td>34.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>17.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>121.00000</td>\n",
" <td>1.000000</td>\n",
" <td>27.000000</td>\n",
" <td>176.000000</td>\n",
" <td>2.000000</td>\n",
" <td>0.0</td>\n",
" <td>47.700000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Week Tgt Rec ReceivingYds ReceivingTD RushingAtt \\\n",
"count 16.000000 16.000000 16.000000 16.00000 16.000000 16.000000 \n",
"mean 9.125000 8.000000 6.687500 58.93750 0.125000 17.937500 \n",
"std 5.188127 4.966555 4.407853 38.67552 0.341565 5.157115 \n",
"min 1.000000 0.000000 0.000000 0.00000 0.000000 9.000000 \n",
"25% 4.750000 4.750000 3.750000 34.50000 0.000000 14.000000 \n",
"50% 9.500000 9.500000 7.000000 65.00000 0.000000 19.000000 \n",
"75% 13.250000 11.250000 10.000000 83.00000 0.000000 22.000000 \n",
"max 17.000000 15.000000 15.000000 121.00000 1.000000 27.000000 \n",
"\n",
" RushingYds RushingTD FumblesLost FantasyPoints \n",
"count 16.000000 16.000000 16.0 16.000000 \n",
"mean 86.687500 0.937500 0.0 27.625000 \n",
"std 46.963417 0.771902 0.0 11.287131 \n",
"min 26.000000 0.000000 0.0 3.700000 \n",
"25% 50.750000 0.000000 0.0 24.075000 \n",
"50% 78.500000 1.000000 0.0 27.200000 \n",
"75% 119.750000 1.250000 0.0 34.000000 \n",
"max 176.000000 2.000000 0.0 47.700000 "
]
},
"metadata": {
"tags": []
},
"execution_count": 42
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ustRizTfCfHW",
"colab_type": "text"
},
"source": [
"MCC had a lower standard deviation (about 11.29 vs. 14.21) and beat Aaron Jones by about 9 points in mean fantasy points per game. MCC was a beast this season; we all know this (trust me, I had to play the MCC owner like 3 times this season). Let's look at Henry next, who makes the choice between him and Jones less clear cut."
]
},
{
"cell_type": "code",
"metadata": {
"id": "vWq11bTaCfHY",
"colab_type": "code",
"colab": {},
"outputId": "2ccd983d-f466-4144-885a-fd0856c70b14"
},
"source": [
"henry = df.groupby('Player').get_group('Derrick Henry')\n",
"henry = henry[columns]\n",
"henry.describe()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Week</th>\n",
" <th>Tgt</th>\n",
" <th>Rec</th>\n",
" <th>ReceivingYds</th>\n",
" <th>ReceivingTD</th>\n",
" <th>RushingAtt</th>\n",
" <th>RushingYds</th>\n",
" <th>RushingTD</th>\n",
" <th>FumblesLost</th>\n",
" <th>FantasyPoints</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" <td>15.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>8.400000</td>\n",
" <td>0.333333</td>\n",
" <td>0.266667</td>\n",
" <td>7.400000</td>\n",
" <td>0.133333</td>\n",
" <td>20.200000</td>\n",
" <td>102.666667</td>\n",
" <td>1.066667</td>\n",
" <td>0.200000</td>\n",
" <td>18.073333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>4.997142</td>\n",
" <td>0.899735</td>\n",
" <td>0.798809</td>\n",
" <td>20.873086</td>\n",
" <td>0.351866</td>\n",
" <td>5.157519</td>\n",
" <td>51.771016</td>\n",
" <td>0.883715</td>\n",
" <td>0.414039</td>\n",
" <td>10.335754</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>13.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2.800000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.500000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>16.500000</td>\n",
" <td>76.500000</td>\n",
" <td>0.500000</td>\n",
" <td>0.000000</td>\n",
" <td>10.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>8.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>19.000000</td>\n",
" <td>86.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>15.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>12.500000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>22.500000</td>\n",
" <td>126.000000</td>\n",
" <td>1.500000</td>\n",
" <td>0.000000</td>\n",
" <td>25.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>17.000000</td>\n",
" <td>3.000000</td>\n",
" <td>3.000000</td>\n",
" <td>75.000000</td>\n",
" <td>1.000000</td>\n",
" <td>32.000000</td>\n",
" <td>211.000000</td>\n",
" <td>3.000000</td>\n",
" <td>1.000000</td>\n",
" <td>39.100000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Week Tgt Rec ReceivingYds ReceivingTD RushingAtt \\\n",
"count 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 \n",
"mean 8.400000 0.333333 0.266667 7.400000 0.133333 20.200000 \n",
"std 4.997142 0.899735 0.798809 20.873086 0.351866 5.157519 \n",
"min 1.000000 0.000000 0.000000 0.000000 0.000000 13.000000 \n",
"25% 4.500000 0.000000 0.000000 0.000000 0.000000 16.500000 \n",
"50% 8.000000 0.000000 0.000000 0.000000 0.000000 19.000000 \n",
"75% 12.500000 0.000000 0.000000 0.000000 0.000000 22.500000 \n",
"max 17.000000 3.000000 3.000000 75.000000 1.000000 32.000000 \n",
"\n",
" RushingYds RushingTD FumblesLost FantasyPoints \n",
"count 15.000000 15.000000 15.000000 15.000000 \n",
"mean 102.666667 1.066667 0.200000 18.073333 \n",
"std 51.771016 0.883715 0.414039 10.335754 \n",
"min 28.000000 0.000000 0.000000 2.800000 \n",
"25% 76.500000 0.500000 0.000000 10.200000 \n",
"50% 86.000000 1.000000 0.000000 15.000000 \n",
"75% 126.000000 1.500000 0.000000 25.400000 \n",
"max 211.000000 3.000000 1.000000 39.100000 "
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vEUEZKbiCfHm",
"colab_type": "text"
},
"source": [
"So, as expected, Henry scored slightly fewer fantasy points per game than Jones (about 18.07 vs. 18.64), but his standard deviation is lower by about 4 (10.34 vs. 14.21). That's big. It means Henry was only a bit less productive than Jones in the games he played this season, but he was more consistent. If you look at the ratio of the standard deviation to the mean (known as the coefficient of variation), Jones has a much higher number (about 0.76 vs. about 0.57 for Henry), which made his season more volatile. In general, I would say volatility is bad in fantasy football: you want players whose output you can predict and who consistently get you points week in and week out."
]
},
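{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The coefficient of variation (CV) is simply the standard deviation divided by the mean, which makes consistency comparable between players who score at different levels. Below is a minimal sketch of computing it per player with `groupby(...).agg(...)`. It runs on a small made-up DataFrame; `toy_weeks` and the `coefficient_of_variation` helper are hypothetical names, not part of the original analysis. The same helper could be applied to the full weekly `df`, for example with `min_games=8` (an arbitrary cutoff) so that players with only a handful of games don't dominate the ranking."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import pandas as pd\n",
"\n",
"def coefficient_of_variation(weekly, min_games=1):\n",
"    # Per-player count, mean and std of weekly FantasyPoints, plus CV = std / mean.\n",
"    # `weekly` needs one row per player-week with 'Player' and 'FantasyPoints' columns.\n",
"    summary = weekly.groupby('Player')['FantasyPoints'].agg(['count', 'mean', 'std'])\n",
"    # Drop players with too few games for a meaningful spread, then add the CV column\n",
"    summary = summary[summary['count'] >= min_games].assign(cv=lambda s: s['std'] / s['mean'])\n",
"    # Most consistent (lowest CV) first\n",
"    return summary.sort_values('cv')\n",
"\n",
"# Made-up weekly scores for two hypothetical players (illustration only)\n",
"toy_weeks = pd.DataFrame({\n",
"    'Player': ['Steady'] * 4 + ['Boom/Bust'] * 4,\n",
"    'FantasyPoints': [15.0, 17.0, 16.0, 16.0, 2.0, 30.0, 4.0, 28.0],\n",
"})\n",
"print(coefficient_of_variation(toy_weeks))"
],
"execution_count": null,
"outputs": []
},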
{
"cell_type": "markdown",
"metadata": {
"id": "PpUJj8yLCfHo",
"colab_type": "text"
},
"source": [
"These are fine examples, but let's see how Aaron Jones fared among all players in terms of the week-to-week standard deviation of his fantasy points."
]
},
{
"cell_type": "code",
"metadata": {
"id": "LxkGR9HACfHq",
"colab_type": "code",
"colab": {},
"outputId": "75d4180c-a504-4b31-8095-13958851a576"
},
"source": [
"df.groupby('Player')[['FantasyPoints']].std().sort_values(by='FantasyPoints', ascending=False).head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>FantasyPoints</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Player</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Will Fuller</th>\n",
" <td>16.574162</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Aaron Jones</th>\n",
" <td>14.214733</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mike Evans</th>\n",
" <td>13.876619</td>\n",
" </tr>\n",
" <tr>\n",
" <th>John Ross</th>\n",
" <td>13.086431</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Boston Scott</th>\n",
" <td>12.784303</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" FantasyPoints\n",
"Player \n",
"Will Fuller 16.574162\n",
"Aaron Jones 14.214733\n",
"Mike Evans 13.876619\n",
"John Ross 13.086431\n",
"Boston Scott 12.784303"
]
},
"metadata": {
"tags": []
},
"execution_count": 51
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7IlLK3r_CfH1",
"colab_type": "text"
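},
"source": [
"Aaron Jones finished second among all players in the standard deviation of weekly fantasy points (no surprise that Will Fuller is number one on this list). Keep in mind that this is the raw standard deviation, which isn't scaled by the mean, so high-scoring players tend to rank higher on it."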
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5JjdBmAlCfH2",
"colab_type": "text"
},
"source": [
"## Question 2: How did Jamaal Williams' Usage Affect Aaron Jones' Output on the Season?\n",
"\n",
"Now let's look at the correlation between Jamaal Williams' usage throughout the season and Aaron Jones' fantasy football performance. First, we grab Williams' stats from the data. Then we join the two tables on the `Week` column. Williams is missing some weeks, and we only want to include the weeks in which both Williams and Jones played, so we use an inner join."
]
},
{
"cell_type": "code",
"metadata": {
"id": "gr8AJUBKCfH4",
"colab_type": "code",
"colab": {}
},
"source": [
"# Grab Jamaal Williams' weekly rows from our data\n",
"jamaal = df.groupby('Player').get_group('Jamaal Williams')\n",
"jamaal = jamaal[columns]\n",
"# Inner join on Week keeps only the weeks in which both players appear\n",
"df = jamaal.set_index('Week').join(aj.set_index('Week'), how='inner', lsuffix='_JamaalWilliams', rsuffix='_AaronJones')\n",
"df.reset_index(inplace=True)"
],
"execution_count": 0,
"outputs": []
},
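{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"To see why the join type matters here, below is a minimal sketch on two made-up frames (`toy_williams` and `toy_jones` are hypothetical). Joining on `Week` with `how='left'` keeps every week from the left table and fills in `NaN` where the right table has no matching week, while `how='inner'` keeps only the weeks that appear in both tables. Any `NaN` rows would need to be dropped before computing a correlation on the joined data."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import pandas as pd\n",
"\n",
"# Made-up weekly numbers: Williams appears in weeks 1-3, Jones in weeks 2-4\n",
"toy_williams = pd.DataFrame({'Week': [1, 2, 3], 'Usage': [10, 8, 12]}).set_index('Week')\n",
"toy_jones = pd.DataFrame({'Week': [2, 3, 4], 'FantasyPoints': [20.5, 7.0, 31.2]}).set_index('Week')\n",
"\n",
"toy_left = toy_williams.join(toy_jones, how='left')    # weeks 1-3; week 1 has NaN FantasyPoints\n",
"toy_inner = toy_williams.join(toy_jones, how='inner')  # weeks 2-3 only, where both appear\n",
"\n",
"print(toy_left)\n",
"print(toy_inner)"
],
"execution_count": null,
"outputs": []
},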
{
"cell_type": "markdown",
"metadata": {
"id": "KkDKceSiCfIE",
"colab_type": "text"
},
"source": [
"Let's define usage as rushing attempts + targets."
]
},
{
"cell_type": "code",
"metadata": {
"id": "jncyUrJ9CfIG",
"colab_type": "code",
"colab": {}
},
"source": [
"df['Usage_JamaalWilliams'] = df['RushingAtt_JamaalWilliams'] + df['Tgt_JamaalWilliams']"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ruW-9KeZCfIP",
"colab_type": "text"
},
"source": [
"From the `scipy` library, let's import a function called `pearsonr`, which gives us back both the correlation coefficient for two variables and a p-value. In statistics, we start with a null hypothesis: the default assumption that we look for evidence against. Our null hypothesis here is that **there is no linear relationship between Jamaal Williams' usage and Aaron Jones' production**. The p-value is the probability of seeing a correlation at least as extreme as the one we observed (in either direction) if the null hypothesis were true. Before running the test, we pick a significance level (alpha), which is the probability of a Type I error (rejecting a null hypothesis that is actually true) that we are willing to accept. If our p-value is less than alpha, we can reject the null hypothesis and say there is evidence of a relationship between Jamaal Williams' usage and Aaron Jones' production."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Y_hKGj55CfIT",
"colab_type": "code",
"colab": {},
"outputId": "3b022037-dca0-4ca8-8116-74ca39da83c0"
},
"source": [
"from scipy.stats import pearsonr\n",
"\n",
"alpha = 0.05\n",
"\n",
"p_r = pearsonr(df['FantasyPoints_AaronJones'], df['Usage_JamaalWilliams'])\n",
"print('''\n",
"Out: The relationship between Aaron Jones Fantasy Football output and Jamaal Williams \n",
"has a correlation of {} and a p-value of {} \\n'''.format(p_r[0], p_r[1]))\n",
"\n",
"if alpha > p_r[1]:\n",
"    print('Reject the null hypothesis.')\n",
"else:\n",
"    print('Do not reject the null hypothesis.')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"\n",
"Out: The relationship between Aaron Jones Fantasy Football output and Jamaal Williams \n",
"has a correlation of -0.5626766793635043 and a p-value of 0.0568338493039637 \n",
"\n",
"Do not reject the null hypothesis.\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QUNeSmONCfIi",
"colab_type": "text"
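},
"source": [
"At the 0.05 level, we cannot reject the null hypothesis that there is no linear relationship between Williams' usage and Jones' production. That said, the p-value (about 0.057) is only just above our cutoff, and the correlation itself (about -0.56) is moderately negative. With only one season's worth of games, failing to reject doesn't prove there is no relationship; it means this sample isn't strong enough evidence of one."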
]
},
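{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"For the curious, the p-value that `pearsonr` reports can be reproduced by hand: under the null hypothesis (and assuming roughly normally distributed data), `t = r * sqrt(n - 2) / sqrt(1 - r**2)` follows a t distribution with `n - 2` degrees of freedom, and the two-sided p-value is twice its upper-tail probability. The sketch below checks this on random made-up data (`toy_x`, `toy_y` and `p_value_from_r` are illustrative names) and then shows how the same correlation of about -0.56 we observed yields smaller p-values as the number of games grows, which is why a small sample makes it hard to reach significance."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"def p_value_from_r(r, n):\n",
"    # Two-sided p-value for H0: no linear correlation, using a t test with n - 2 degrees of freedom\n",
"    t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)\n",
"    return 2 * stats.t.sf(abs(t_stat), df=n - 2)\n",
"\n",
"# Made-up data with a negative relationship (illustration only)\n",
"toy_rng = np.random.default_rng(0)\n",
"toy_x = toy_rng.normal(size=12)\n",
"toy_y = -0.5 * toy_x + toy_rng.normal(size=12)\n",
"\n",
"toy_r, toy_p = stats.pearsonr(toy_x, toy_y)\n",
"print('scipy:', toy_p, 'by hand:', p_value_from_r(toy_r, len(toy_x)))\n",
"\n",
"# Same correlation, different (illustrative) sample sizes\n",
"for sample_size in (12, 30, 100):\n",
"    print(sample_size, p_value_from_r(-0.56, sample_size))"
],
"execution_count": null,
"outputs": []
},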
{
"cell_type": "markdown",
"metadata": {
"id": "ol5pFSTKCfIk",
"colab_type": "text"
},
"source": [
"## Question 3: Is Jones due for a regression in TDs?\n",
"\n",
"As mentioned in the intro, we are going to analyze whether or not Aaron Jones overperformed in 2019 based on the *quality* of his usage. We do this by building a probability distribution that tells us the probability of scoring a receiving or rushing touchdown when a team is X yards away from the end zone. We then apply this model to Aaron Jones' actual 2019 play-by-play data to come up with an expected TD number. If Jones' expected TDs exceed his actual TDs, he underperformed and is probably due for a positive regression in TDs in 2020. If his actual TDs exceed his expected TDs, however, he is probably due for a negative regression."
]
},
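{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"The expected-TD calculation itself boils down to a sum: if `p(d)` is the estimated probability that a play run from `d` yards out ends in a touchdown, a player's expected TDs are the sum of `p(d)` over all of his plays, which we can then compare with his actual TD count. Here is a minimal sketch with a made-up probability table and made-up plays; `toy_td_prob`, `toy_plays` and `expected_tds` are hypothetical and are not the model this notebook builds."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code"
},
"source": [
"import pandas as pd\n",
"\n",
"def expected_tds(plays, td_prob):\n",
"    # plays: one row per play, with a 'yards_to_goal' column\n",
"    # td_prob: Series mapping yards to goal -> estimated TD probability\n",
"    # Distances missing from td_prob count as 0 here; a real table would cover 1-99\n",
"    return plays['yards_to_goal'].map(td_prob).fillna(0.0).sum()\n",
"\n",
"# Made-up TD probabilities by distance (illustration only)\n",
"toy_td_prob = pd.Series({1: 0.50, 2: 0.40, 5: 0.20, 20: 0.03, 60: 0.01})\n",
"\n",
"# Made-up plays for one player, and whether each one actually scored\n",
"toy_plays = pd.DataFrame({\n",
"    'yards_to_goal': [1, 1, 2, 5, 20, 60, 60],\n",
"    'touchdown': [1, 1, 1, 0, 0, 0, 1],\n",
"})\n",
"\n",
"toy_expected = expected_tds(toy_plays, toy_td_prob)\n",
"toy_actual = toy_plays['touchdown'].sum()\n",
"# Actual TDs well above expected TDs would point toward a negative regression\n",
"print(f'expected TDs: {toy_expected:.2f}, actual TDs: {toy_actual}')"
],
"execution_count": null,
"outputs": []
},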
{
"cell_type": "markdown",
"metadata": {
"id": "zgSI6ZT4CfIn",
"colab_type": "text"
},
"source": [
"First, let's generate a fresh DataFrame using the function we defined above (we overwrote `df` when joining Williams' and Jones' data) and confirm that Aaron Jones is indeed at the top of the list in rushing TDs, where he tied Derrick Henry with 16."
]
},
{
"cell_type": "code",
"metadata": {
"id": "9Klv-JHlCfIq",
"colab_type": "code",
"colab": {},
"outputId": "464c6692-0ea5-4998-d846-bf79d260fff6"
},
"source": [
"df = generate_df()\n",
"\n",
"df.loc[df['Pos'] == 'RB'].groupby('Player')['RushingTD'].sum().sort_values(ascending=False).head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Player\n",
"Aaron Jones 16\n",
"Derrick Henry 16\n",
"Christian McCaffrey 15\n",
"Dalvin Cook 13\n",
"Todd Gurley 12\n",
"Name: RushingTD, dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "N9S-GLpBCfI1",
"colab_type": "text"
},
"source": [
"Next, let's load in the play-by-play data for 2009 to 2018 that we got from Kaggle. [Here's the link. Beware: it's a large file, at about 700 MB.](https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016/data#NFL%20Play%20by%20Play%202009-2018%20(v5).csv)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6FymFQP3CfI8",
"colab_type": "code",
"colab": {},
"outputId": "9bce1dc3-3776-4e8b-b163-d6a93cf5d3b2"
},
"source": [
"#timing it just for fun\n",
"start = time.time() \n",
"data = pd.read_csv('data/playbyplay2009_2018.csv')\n",
"end = time.time()\n",
"print(f'{end - start} seconds to load playbyplay2009_2018 data')\n",
"#checking the size just cause\n",
"MB = os.stat('data/playbyplay2009_2018.csv').st_size / 10**6\n",
"print('Filesize:', MB, 'MB')\n",
"print('Number of columns:', data.shape[1])\n",
"print('Number of rows:', data.shape[0])"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"/home/ben/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3063: DtypeWarning: Columns (42,166,167,168,169,174,175,178,179,182,183,188,189,190,191,194,195,203,204,205,218,219,220,231,232,233,238,240,241,249) have mixed types.Specify dtype option on import or set low_memory=False.\n",
" interactivity=interactivity, compiler=compiler, result=result)\n"
],
"name": "stderr"
},
{
"output_type": "stream",
"text": [
"17.27703094482422 seconds to load playbyplay2009_2018 data\n",
"Filesize: 700.397316 MB\n",
"Number of columns: 255\n",
"Number of rows: 449371\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r_YJMvRiCfJI",
"colab_type": "text"
},
"source": [
"This function here is the function that generates our probability distribution. It takes in an `output_variable`, which is what we are calculating the probability of. Our output variables here is going to be Passing TDs and Rushing TDs. It also takes in a `filter_variable`, which is how we filter our play by play data. For passing TDs, for example, we only want those plays where a pass attempt occured. So we pass in `passing_attempt`. We pass in our `data` as our df we brought in just above, and we use a `smoothing_sigma` to smooth out our distribution a bit."
]
},
{
"cell_type": "code",
"metadata": {
"id": "IP-IdW9FCfJK",
"colab_type": "code",
"colab": {}
},
"source": [
"def generate_prob_based_off_ydline(output_variable, filter_variable, df=data, smoothing_sigma=2):\n",
" \n",
" distance_from_100_yd_line = df['yardline_100']\n",
" output_column = df[output_variable]\n",
" filter_column = df[filter_variable]\n",
" two_point_attempt = df['two_point_attempt']\n",
" \n",
" df_values = {\n",
" 'DistanceFromEndzone': distance_from_100_yd_line,\n",
" output_variable: output_column,\n",
" filter_variable: filter_column,\n",
" 'two_point_attempt': two_point_attempt\n",
" }\n",
" \n",
" df = pd.DataFrame(df_values)\n",
" \n",
" df = df[df[filter_variable] == 1]\n",
" #remove two point plays\n",
" df = df[df['two_point_attempt'] == 0]\n",
" df.drop([filter_variable, 'two_point_attempt'], axis=1, inplace=True)\n",
" \n",
" norm_df = df.groupby('DistanceFromEndzone')[output_variable].value_counts(normalize=True)\n",
" norm_df = pd.DataFrame({'p': norm_df.values.flatten()}, index=norm_df.index)\n",
" norm_df = norm_df[norm_df.index.get_level_values(output_variable) == 1].reset_index()\n",
" \n",
" #smooth out our probabilities\n",
" norm_df['p_smoothed'] = gaussian_filter1d(norm_df['p'], sigma=smoothing_sigma)\n",
" \n",
" norm_df.drop(output_variable, axis=1, inplace=True)\n",
" return norm_df"
],
"execution_count": 0,
"outputs": []
},
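The function above can also be sketched without pandas or SciPy. The sketch below is a minimal, hypothetical re-implementation; the names `td_rate_by_distance` and `gaussian_smooth` are made up for illustration. It works in two steps:

1. It computes the share of plays at each distance that ended in a TD.
2. It applies a Gaussian kernel whose radius and mirrored edges are modeled on the defaults of SciPy's `gaussian_filter1d` (`truncate=4.0`, `mode='reflect'`).

One deliberate difference from the notebook: yard lines with no touchdowns stay in the output as 0.0, whereas the function above drops them before smoothing.

```python
import math
from collections import defaultdict


def td_rate_by_distance(plays):
    """Empirical P(TD | yards from the end zone) from (distance, is_td) pairs."""
    attempts = defaultdict(int)
    touchdowns = defaultdict(int)
    for distance, is_td in plays:
        attempts[distance] += 1
        touchdowns[distance] += is_td
    # zero-TD distances are kept as 0.0 (the notebook's version drops them)
    return {d: touchdowns[d] / attempts[d] for d in sorted(attempts)}


def gaussian_smooth(values, sigma=2.0, truncate=4.0):
    """Gaussian smoothing with mirrored edges, modeled on scipy's gaussian_filter1d defaults."""
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]  # weights sum to 1, so a flat series is unchanged

    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = (i + k) % (2 * n)  # fold out-of-range indices back into the series
            if j >= n:
                j = 2 * n - 1 - j  # 'reflect' edges: d c b a | a b c d | d c b a
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed


# Toy example: made-up (distance, is_td) pairs, not real play-by-play data
plays = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (3, 0)]
rates = td_rate_by_distance(plays)
print(rates)
print(gaussian_smooth(list(rates.values()), sigma=1.0))
```

As in the notebook, the smoother treats consecutive entries as adjacent yard lines. Any gaps in the distances are therefore smoothed over as if they were neighbors.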
{
"cell_type": "code",
"metadata": {
"id": "Vh4gcqd_CfJS",
"colab_type": "code",
"colab": {}
},
"source": [
"#define our output and filter columns so we can reference them later\n",
"passing_columns = ['pass_touchdown', 'pass_attempt']\n",
"rushing_columns = ['rush_touchdown', 'rush_attempt']"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wLU87b7lCfJd",
"colab_type": "text"
},
"source": [
"We've already written a lot of the code, so creating the distribution is as easy as the two lines below."
]
},
{
"cell_type": "code",
"metadata": {
"id": "4L0uiz_cCfJe",
"colab_type": "code",
"colab": {},
"outputId": "7da49183-3b10-4af6-f387-0604fb0d4ff0"
},
"source": [
"passing_df = generate_prob_based_off_ydline(*passing_columns)\n",
"\n",
"passing_df.plot(x='DistanceFromEndzone', y='p_smoothed')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f777e058e90>"
]
},
"metadata": {
"tags": []
},
"execution_count": 16
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iHZnC9odCfJp",
"colab_type": "text"
},
"source": [
"The visualization is pretty self-explanatory. As we get further and further away from the goal line, the probability of a passing TD decreases. As you can see below, at our opponents 1 yard line, we have a 44% chance of scoring a passing touchdown. Run `passing_df.tail()` and you'll see that at our own one yard line, we have a very small chance of scoring a passing TD."
]
},
{
"cell_type": "code",
"metadata": {
"id": "TALwJB-3CfJr",
"colab_type": "code",
"colab": {},
"outputId": "9499bc65-626b-40bc-f755-2c6d727afacc"
},
"source": [
"passing_df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DistanceFromEndzone</th>\n",
" <th>p</th>\n",
" <th>p_smoothed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>0.497980</td>\n",
" <td>0.444381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>0.441658</td>\n",
" <td>0.429849</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.0</td>\n",
" <td>0.415272</td>\n",
" <td>0.404989</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.0</td>\n",
" <td>0.347145</td>\n",
" <td>0.375353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>0.360465</td>\n",
" <td>0.344881</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DistanceFromEndzone p p_smoothed\n",
"0 1.0 0.497980 0.444381\n",
"1 2.0 0.441658 0.429849\n",
"2 3.0 0.415272 0.404989\n",
"3 4.0 0.347145 0.375353\n",
"4 5.0 0.360465 0.344881"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6QodI93kCfJ3",
"colab_type": "text"
},
"source": [
"We repeat the process for rushing touchdowns. We are going to be using these `passing_df` and `rushing_df` vars later in our code. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "y59atu_rCfJ3",
"colab_type": "code",
"colab": {},
"outputId": "896756ae-c7e9-4a9c-fd02-2a5e94e93e5a"
},
"source": [
"rushing_df = generate_prob_based_off_ydline(*rushing_columns)\n",
"\n",
"rushing_df.plot(x='DistanceFromEndzone', y='p_smoothed')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f777d675110>"
]
},
"metadata": {
"tags": []
},
"execution_count": 18
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "AwAlwYq-CfKA",
"colab_type": "code",
"colab": {},
"outputId": "010a1393-1024-4559-8dc1-5689d11eb6a6"
},
"source": [
"rushing_df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DistanceFromEndzone</th>\n",
" <th>p</th>\n",
" <th>p_smoothed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>0.540170</td>\n",
" <td>0.416936</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>0.397287</td>\n",
" <td>0.387459</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.0</td>\n",
" <td>0.333700</td>\n",
" <td>0.338159</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.0</td>\n",
" <td>0.271493</td>\n",
" <td>0.282243</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>0.196447</td>\n",
" <td>0.230087</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" DistanceFromEndzone p p_smoothed\n",
"0 1.0 0.540170 0.416936\n",
"1 2.0 0.397287 0.387459\n",
"2 3.0 0.333700 0.338159\n",
"3 4.0 0.271493 0.282243\n",
"4 5.0 0.196447 0.230087"
]
},
"metadata": {
"tags": []
},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "InT-4LCCCfKK",
"colab_type": "text"
},
"source": [
"Now, we load in our 2019 playbyplay data. [You can find a link for that here](http://nflsavant.com/about.php) The problem with this data is that it comes from a different source (we don't have 2019 play by play data from Kaggle), and the data is kinda messy. There is no player name columns so we're going to have to do a fair bit of hacking to be able to take a play description column and turn it in to a player name column. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "6Ri39SsJCfKL",
"colab_type": "code",
"colab": {}
},
"source": [
"data2019 = pd.read_csv('data/playbyplay2019.csv')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VO1W4BiQCfKV",
"colab_type": "code",
"colab": {},
"outputId": "ec51efd3-d25a-46f2-85ad-2cc58cd61ccd"
},
"source": [
"#Let's see what columns we have to work with in this new data set\n",
"', '.join(data2019.columns)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'GameId, GameDate, Quarter, Minute, Second, OffenseTeam, DefenseTeam, Down, ToGo, YardLine, Unnamed: 10, SeriesFirstDown, Unnamed: 12, NextScore, Description, TeamWin, Unnamed: 16, Unnamed: 17, SeasonYear, Yards, Formation, PlayType, IsRush, IsPass, IsIncomplete, IsTouchdown, PassType, IsSack, IsChallenge, IsChallengeReversed, Challenger, IsMeasurement, IsInterception, IsFumble, IsPenalty, IsTwoPointConversion, IsTwoPointConversionSuccessful, RushDirection, YardLineFixed, YardLineDirection, IsPenaltyAccepted, PenaltyTeam, IsNoPlay, PenaltyType, PenaltyYards'"
]
},
"metadata": {
"tags": []
},
"execution_count": 21
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "vx6M3MnYCfKe",
"colab_type": "code",
"colab": {}
},
"source": [
"rushing_data2019 = data2019\n",
"\n",
"#splitting player description on hyphen\n",
"def split_on_hyphen(x):\n",
" try:\n",
" val = x.split('-')[1]\n",
" if val.isnumeric():\n",
" return x\n",
" return val\n",
" except IndexError:\n",
" pass\n",
" return x\n",
" \n",
"#splitting on whitespace\n",
"def split_on_whitespace(x):\n",
" try:\n",
" val = x.split()[0]\n",
" return val\n",
" except IndexError:\n",
" pass\n",
" return x\n",
"\n",
"#phrases we do not want in data. Irrelevant data\n",
"bad_phrases = [\n",
" 'NO PLAY', 'REVERSED', 'POINT CONVERSION ATTEMPT', \n",
" 'EXTRA POINT', 'FIELD GOAL', 'TIMEOUT', 'TWO-MINUTE WARNING', 'END QUARTER', 'PUNT',\n",
" 'KNEELS', 'KICKS', 'END GAME'\n",
"]\n",
"\n",
"#remove play descriptions with phrases from bad_phrases list\n",
"for phrase in bad_phrases:\n",
" rushing_data2019 = rushing_data2019[~rushing_data2019['Description'].str.contains(phrase)]\n",
"\n",
"rushing_data2019['Player'] = rushing_data2019['Description'].apply(split_on_hyphen)\n",
"rushing_data2019['Player'] = rushing_data2019['Player'].apply(split_on_whitespace)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "FzIvqQzRCfKm",
"colab_type": "code",
"colab": {}
},
"source": [
"#only include a play if it is a running play and not a two point conversion\n",
"rushing_data2019 = rushing_data2019[(rushing_data2019['IsRush'] == 1) & (rushing_data2019['IsTwoPointConversion'] == 0)]\n",
"rushing_data2019['DistanceFromEndzone'] = 100 - rushing_data2019['YardLine']\n",
"rushing_data2019 = rushing_data2019[['OffenseTeam','DistanceFromEndzone', 'Player', 'Description','IsTouchdown']]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "mN3mlJJyCfKt",
"colab_type": "code",
"colab": {},
"outputId": "217f76cd-460b-48a4-a500-f14db03dde6d"
},
"source": [
"#rename some columns\n",
"rushing_data2019.rename({\n",
" 'OffenseTeam': 'Tm',\n",
" 'IsTouchdown': 'RushingTD'\n",
"}, axis=1, inplace=True)\n",
"\n",
"\n",
"rushing_data2019.sort_values(by='DistanceFromEndzone').head(15)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Tm</th>\n",
" <th>DistanceFromEndzone</th>\n",
" <th>Player</th>\n",
" <th>Description</th>\n",
" <th>RushingTD</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>10238</th>\n",
" <td>NE</td>\n",
" <td>1</td>\n",
" <td>T.BRADY</td>\n",
" <td>(3:53) 12-T.BRADY UP THE MIDDLE FOR 1 YARD, TO...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5029</th>\n",
" <td>TB</td>\n",
" <td>1</td>\n",
" <td>D.OGUNBOWALE</td>\n",
" <td>(3:27) (NO HUDDLE) 44-D.OGUNBOWALE UP THE MIDD...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13183</th>\n",
" <td>DET</td>\n",
" <td>1</td>\n",
" <td>K.JOHNSON</td>\n",
" <td>(14:12) 33-K.JOHNSON UP THE MIDDLE FOR 1 YARD,...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13185</th>\n",
" <td>DET</td>\n",
" <td>1</td>\n",
" <td>K.JOHNSON</td>\n",
" <td>(15:00) 33-K.JOHNSON LEFT TACKLE TO PHI 1 FOR ...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15848</th>\n",
" <td>CAR</td>\n",
" <td>1</td>\n",
" <td>D.DALEY</td>\n",
" <td>(2:00) 65-D.DALEY REPORTED IN AS ELIGIBLE. 40...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13194</th>\n",
" <td>PHI</td>\n",
" <td>1</td>\n",
" <td>J.HOWARD</td>\n",
" <td>(6:19) 24-J.HOWARD RIGHT TACKLE FOR 1 YARD, TO...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4858</th>\n",
" <td>ATL</td>\n",
" <td>1</td>\n",
" <td>Q.OLLISON</td>\n",
" <td>(1:27) 30-Q.OLLISON RIGHT GUARD TO TB 1 FOR NO...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4857</th>\n",
" <td>ATL</td>\n",
" <td>1</td>\n",
" <td>Q.OLLISON</td>\n",
" <td>(:50) 30-Q.OLLISON RIGHT TACKLE FOR 1 YARD, TO...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4781</th>\n",
" <td>LA</td>\n",
" <td>1</td>\n",
" <td>T.GURLEY</td>\n",
" <td>(3:33) 30-T.GURLEY UP THE MIDDLE FOR 1 YARD, T...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4594</th>\n",
" <td>BAL</td>\n",
" <td>1</td>\n",
" <td>M.INGRAM</td>\n",
" <td>(4:46) 21-M.INGRAM II UP THE MIDDLE FOR 1 YARD...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13381</th>\n",
" <td>PHI</td>\n",
" <td>1</td>\n",
" <td>H.VAITAI</td>\n",
" <td>(3:16) 72-H.VAITAI REPORTED IN AS ELIGIBLE. 1...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>575</th>\n",
" <td>NO</td>\n",
" <td>1</td>\n",
" <td>L.MURRAY</td>\n",
" <td>(2:59) 28-L.MURRAY LEFT GUARD TO TEN 2 FOR -1 ...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13392</th>\n",
" <td>LA</td>\n",
" <td>1</td>\n",
" <td>J.GOFF</td>\n",
" <td>(8:39) 16-J.GOFF UP THE MIDDLE FOR 1 YARD, TOU...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4423</th>\n",
" <td>CAR</td>\n",
" <td>1</td>\n",
" <td>A.ARMAH</td>\n",
" <td>(10:55) 40-A.ARMAH RIGHT GUARD TO NO 1 FOR NO ...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5030</th>\n",
" <td>TB</td>\n",
" <td>1</td>\n",
" <td>D.OGUNBOWALE</td>\n",
" <td>(3:47) (NO HUDDLE, SHOTGUN) 44-D.OGUNBOWALE UP...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Tm DistanceFromEndzone Player \\\n",
"10238 NE 1 T.BRADY \n",
"5029 TB 1 D.OGUNBOWALE \n",
"13183 DET 1 K.JOHNSON \n",
"13185 DET 1 K.JOHNSON \n",
"15848 CAR 1 D.DALEY \n",
"13194 PHI 1 J.HOWARD \n",
"4858 ATL 1 Q.OLLISON \n",
"4857 ATL 1 Q.OLLISON \n",
"4781 LA 1 T.GURLEY \n",
"4594 BAL 1 M.INGRAM \n",
"13381 PHI 1 H.VAITAI \n",
"575 NO 1 L.MURRAY \n",
"13392 LA 1 J.GOFF \n",
"4423 CAR 1 A.ARMAH \n",
"5030 TB 1 D.OGUNBOWALE \n",
"\n",
" Description RushingTD \n",
"10238 (3:53) 12-T.BRADY UP THE MIDDLE FOR 1 YARD, TO... 1 \n",
"5029 (3:27) (NO HUDDLE) 44-D.OGUNBOWALE UP THE MIDD... 0 \n",
"13183 (14:12) 33-K.JOHNSON UP THE MIDDLE FOR 1 YARD,... 1 \n",
"13185 (15:00) 33-K.JOHNSON LEFT TACKLE TO PHI 1 FOR ... 0 \n",
"15848 (2:00) 65-D.DALEY REPORTED IN AS ELIGIBLE. 40... 1 \n",
"13194 (6:19) 24-J.HOWARD RIGHT TACKLE FOR 1 YARD, TO... 1 \n",
"4858 (1:27) 30-Q.OLLISON RIGHT GUARD TO TB 1 FOR NO... 0 \n",
"4857 (:50) 30-Q.OLLISON RIGHT TACKLE FOR 1 YARD, TO... 1 \n",
"4781 (3:33) 30-T.GURLEY UP THE MIDDLE FOR 1 YARD, T... 1 \n",
"4594 (4:46) 21-M.INGRAM II UP THE MIDDLE FOR 1 YARD... 1 \n",
"13381 (3:16) 72-H.VAITAI REPORTED IN AS ELIGIBLE. 1... 1 \n",
"575 (2:59) 28-L.MURRAY LEFT GUARD TO TEN 2 FOR -1 ... 0 \n",
"13392 (8:39) 16-J.GOFF UP THE MIDDLE FOR 1 YARD, TOU... 1 \n",
"4423 (10:55) 40-A.ARMAH RIGHT GUARD TO NO 1 FOR NO ... 0 \n",
"5030 (3:47) (NO HUDDLE, SHOTGUN) 44-D.OGUNBOWALE UP... 0 "
]
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
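The top-15 table above also shows a parsing quirk. Some plays start with an eligible-lineman announcement, such as `65-D.DALEY REPORTED IN AS ELIGIBLE.` or `72-H.VAITAI REPORTED IN AS ELIGIBLE.`. On those plays, splitting on the first hyphen credits the lineman with the carry.

A possible fix is sketched below. It assumes every player token in a description looks like `<jersey>-<F.LASTNAME>`. The idea is to scan all such tokens with a regular expression and skip any token followed by `REPORTED IN AS ELIGIBLE`. `extract_ball_carrier` is a hypothetical helper, and the sample descriptions are made up in the style shown above.

```python
import re

# <jersey number>-<F.LASTNAME> (name ends in a letter), optionally followed by the
# eligible-lineman announcement
PLAYER_TOKEN = re.compile(r"\b\d{1,2}-([A-Z]\.[A-Z'.]*[A-Z])( REPORTED IN AS ELIGIBLE)?")


def extract_ball_carrier(description):
    """Return the first player token that is not an eligible-lineman announcement."""
    for name, announcement in PLAYER_TOKEN.findall(description):
        if not announcement:
            return name
    return None


print(extract_ball_carrier("(3:53) 12-T.BRADY UP THE MIDDLE FOR 1 YARD, TOUCHDOWN."))
print(extract_ball_carrier(
    "(2:00) 65-D.DALEY REPORTED IN AS ELIGIBLE. 40-A.ARMAH UP THE MIDDLE FOR 1 YARD, TOUCHDOWN."
))
```

In the notebook this could be applied with `rushing_data2019['Description'].apply(extract_ball_carrier)` in place of the two split helpers. Descriptions in formats not seen here may still need special handling.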
{
"cell_type": "markdown",
"metadata": {
"id": "UBpyhs7LCfLA",
"colab_type": "text"
},
"source": [
"Now are data is pretty much formatted at this point. We've removed unneccessary columns and should have only rushing plays. We now want to merge our data with the `rushing_df` we created earlier to add in what the probability was of scoring a touchdown for each play based off the distribution we came up with previously."
]
},
{
"cell_type": "code",
"metadata": {
"id": "GE-xD5_wCfLB",
"colab_type": "code",
"colab": {}
},
"source": [
"rushing_df = pd.merge(rushing_data2019, \n",
" rushing_df, \n",
" how='inner', \n",
" left_on=['DistanceFromEndzone'], \n",
" right_on = ['DistanceFromEndzone']\n",
" )"
],
"execution_count": 0,
"outputs": []
},
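Conceptually, the merge above is a lookup: each 2019 carry picks up the historical TD probability for its distance from the end zone. The distribution only keeps yard lines where at least one TD was scored. Because the merge uses `how='inner'`, a carry from a distance with no row in `rushing_df` is dropped rather than kept with a probability of zero. That doesn't change any expected-TD sum, since such a carry would add 0 anyway, but it does change play counts.

Below is a minimal plain-Python sketch of both behaviors. It uses a hypothetical `attach_td_probability` helper and made-up numbers.

```python
def attach_td_probability(plays, p_by_distance, keep_unmatched=False):
    """Attach the historical TD probability for each play's distance from the end zone.

    keep_unmatched=False mimics pd.merge(..., how='inner'): plays at distances missing
    from p_by_distance are dropped. keep_unmatched=True mimics a left join followed by
    filling the missing probabilities with 0.
    """
    joined = []
    for play in plays:
        p = p_by_distance.get(play["distance"])
        if p is None:
            if not keep_unmatched:
                continue  # inner join: no matching distance, so the play disappears
            p = 0.0       # left join + fill: keep the play with zero TD probability
        joined.append({**play, "p": p})
    return joined


# Made-up probabilities and plays for illustration only
p_by_distance = {1: 0.54, 2: 0.40}
plays = [
    {"player": "A.JONES", "distance": 1},
    {"player": "A.JONES", "distance": 2},
    {"player": "A.JONES", "distance": 97},  # no historical entry for this distance
]
inner = attach_td_probability(plays, p_by_distance)
left = attach_td_probability(plays, p_by_distance, keep_unmatched=True)
print(len(inner), len(left))
```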
{
"cell_type": "code",
"metadata": {
"id": "M8F36yKlCfLH",
"colab_type": "code",
"colab": {},
"outputId": "f6b46c59-d132-480c-e184-e082dc27d58c"
},
"source": [
"#our final DataFrame before we calculate Expected Touchdowns\n",
"rushing_df.sort_values(by='DistanceFromEndzone', ascending=True).head(5)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Tm</th>\n",
" <th>DistanceFromEndzone</th>\n",
" <th>Player</th>\n",
" <th>Description</th>\n",
" <th>RushingTD</th>\n",
" <th>p</th>\n",
" <th>p_smoothed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1919</th>\n",
" <td>ATL</td>\n",
" <td>1</td>\n",
" <td>D.FREEMAN</td>\n",
" <td>(12:13) (SHOTGUN) 24-D.FREEMAN LEFT GUARD TO H...</td>\n",
" <td>0</td>\n",
" <td>0.54017</td>\n",
" <td>0.416936</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1924</th>\n",
" <td>NYG</td>\n",
" <td>1</td>\n",
" <td>W.GALLMAN</td>\n",
" <td>(12:45) 22-W.GALLMAN JR UP THE MIDDLE TO WAS 1...</td>\n",
" <td>0</td>\n",
" <td>0.54017</td>\n",
" <td>0.416936</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1925</th>\n",
" <td>JAX</td>\n",
" <td>1</td>\n",
" <td>L.FOURNETTE</td>\n",
" <td>(10:30) 27-L.FOURNETTE UP THE MIDDLE FOR 1 YAR...</td>\n",
" <td>1</td>\n",
" <td>0.54017</td>\n",
" <td>0.416936</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1926</th>\n",
" <td>KC</td>\n",
" <td>1</td>\n",
" <td>D.WILLIAMS</td>\n",
" <td>(12:12) (SHOTGUN) 31-D.WILLIAMS UP THE MIDDLE ...</td>\n",
" <td>1</td>\n",
" <td>0.54017</td>\n",
" <td>0.416936</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1927</th>\n",
" <td>CLE</td>\n",
" <td>1</td>\n",
" <td>J.MCCRAY</td>\n",
" <td>(2:17) 67-J.MCCRAY REPORTED IN AS ELIGIBLE. 2...</td>\n",
" <td>1</td>\n",
" <td>0.54017</td>\n",
" <td>0.416936</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Tm DistanceFromEndzone Player \\\n",
"1919 ATL 1 D.FREEMAN \n",
"1924 NYG 1 W.GALLMAN \n",
"1925 JAX 1 L.FOURNETTE \n",
"1926 KC 1 D.WILLIAMS \n",
"1927 CLE 1 J.MCCRAY \n",
"\n",
" Description RushingTD p \\\n",
"1919 (12:13) (SHOTGUN) 24-D.FREEMAN LEFT GUARD TO H... 0 0.54017 \n",
"1924 (12:45) 22-W.GALLMAN JR UP THE MIDDLE TO WAS 1... 0 0.54017 \n",
"1925 (10:30) 27-L.FOURNETTE UP THE MIDDLE FOR 1 YAR... 1 0.54017 \n",
"1926 (12:12) (SHOTGUN) 31-D.WILLIAMS UP THE MIDDLE ... 1 0.54017 \n",
"1927 (2:17) 67-J.MCCRAY REPORTED IN AS ELIGIBLE. 2... 1 0.54017 \n",
"\n",
" p_smoothed \n",
"1919 0.416936 \n",
"1924 0.416936 \n",
"1925 0.416936 \n",
"1926 0.416936 \n",
"1927 0.416936 "
]
},
"metadata": {
"tags": []
},
"execution_count": 26
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "9BLBo09DCfLN",
"colab_type": "code",
"colab": {}
},
"source": [
"#Group by player and team and sum the probabilities - that's our expected TD value\n",
"predicted = rushing_df.groupby(['Player','Tm'])[['p']].sum()\n",
"predicted = predicted.rename(columns={'p':'Expected Touchdowns'})\n",
"predicted.reset_index(inplace=True)"
],
"execution_count": 0,
"outputs": []
},
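The expected-TD figure is the sum of per-play probabilities. If a player's carries came from distances with TD probabilities p1, p2, ..., pn, his expected total is p1 + p2 + ... + pn, with each carry treated as its own chance to score. Note that the cell above sums the raw `p` column; summing `p_smoothed` instead would generally give a somewhat different figure.

Below is a plain-Python sketch of the same group-and-sum step. It uses made-up plays and a hypothetical `expected_touchdowns` helper.

```python
from collections import defaultdict


def expected_touchdowns(plays):
    """Sum per-play TD probabilities for each (player, team) pair."""
    totals = defaultdict(float)
    for play in plays:
        totals[(play["player"], play["team"])] += play["p"]
    return dict(totals)


# Made-up plays: two goal-line carries and one from further out for A.JONES
plays = [
    {"player": "A.JONES", "team": "GB", "p": 0.54},
    {"player": "A.JONES", "team": "GB", "p": 0.54},
    {"player": "A.JONES", "team": "GB", "p": 0.20},
    {"player": "J.WILLIAMS", "team": "GB", "p": 0.40},
]
expected = expected_touchdowns(plays)
print(expected)
```

Grouping on both player and team, as the notebook does, keeps two different players who share an abbreviated name apart, as long as they are on different teams.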
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "1_Pkh_HKCfLS",
"colab_type": "code",
"colab": {},
"outputId": "299b417e-4a13-47f4-99a8-d9c7c3b93d23"
},
"source": [
"aj_rushing_expect = predicted[predicted['Player'] == 'A.JONES']['Expected Touchdowns'].values[0]\n",
"aj_rushing_expect"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"7.216190853415877"
]
},
"metadata": {
"tags": []
},
"execution_count": 28
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "p4ab8ewdCfLa",
"colab_type": "code",
"colab": {},
"outputId": "45bb07c8-4db9-42f2-a735-1d5100db2aae"
},
"source": [
"#our model is conservative. Nick Chubb underperformed.\n",
"predicted.sort_values(by='Expected Touchdowns', ascending=False).head(10)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Player</th>\n",
" <th>Tm</th>\n",
" <th>Expected Touchdowns</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>264</th>\n",
" <td>N.CHUBB</td>\n",
" <td>CLE</td>\n",
" <td>9.912851</td>\n",
" </tr>\n",
" <tr>\n",
" <th>132</th>\n",
" <td>E.ELLIOTT</td>\n",
" <td>DAL</td>\n",
" <td>9.413015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>D.COOK</td>\n",
" <td>MIN</td>\n",
" <td>9.126598</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>C.MCCAFFREY</td>\n",
" <td>CAR</td>\n",
" <td>9.079737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>105</th>\n",
" <td>D.MONTGOMERY</td>\n",
" <td>CHI</td>\n",
" <td>7.830039</td>\n",
" </tr>\n",
" <tr>\n",
" <th>190</th>\n",
" <td>J.MIXON</td>\n",
" <td>CIN</td>\n",
" <td>7.626816</td>\n",
" </tr>\n",
" <tr>\n",
" <th>231</th>\n",
" <td>L.FOURNETTE</td>\n",
" <td>JAX</td>\n",
" <td>7.626273</td>\n",
" </tr>\n",
" <tr>\n",
" <th>330</th>\n",
" <td>T.GURLEY</td>\n",
" <td>LA</td>\n",
" <td>7.580702</td>\n",
" </tr>\n",
" <tr>\n",
" <th>246</th>\n",
" <td>M.INGRAM</td>\n",
" <td>BAL</td>\n",
" <td>7.423987</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>A.JONES</td>\n",
" <td>GB</td>\n",
" <td>7.216191</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Player Tm Expected Touchdowns\n",
"264 N.CHUBB CLE 9.912851\n",
"132 E.ELLIOTT DAL 9.413015\n",
"81 D.COOK MIN 9.126598\n",
"62 C.MCCAFFREY CAR 9.079737\n",
"105 D.MONTGOMERY CHI 7.830039\n",
"190 J.MIXON CIN 7.626816\n",
"231 L.FOURNETTE JAX 7.626273\n",
"330 T.GURLEY LA 7.580702\n",
"246 M.INGRAM BAL 7.423987\n",
"16 A.JONES GB 7.216191"
]
},
"metadata": {
"tags": []
},
"execution_count": 29
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6hz8AC0JCfLg",
"colab_type": "code",
"colab": {},
"outputId": "38814d7b-dacd-49ef-e372-8fa542989af4"
},
"source": [
"aj_rushing_actual = aj['RushingTD'].sum()\n",
"\n",
"diff = abs(aj_rushing_actual - aj_rushing_expect)\n",
"\n",
"if aj_rushing_expect > aj_rushing_actual:\n",
" print('Aaron Jones underperformed in Rushing TDs by {} TDs'.format(diff))\n",
"else:\n",
" print('Aaron Jones overperformed in Rushing TDs by {} TDs'.format(diff))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Aaron Jones overperformed in Rushing TDs by 8.783809146584122 TDs\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KqebvA4ZCfLm",
"colab_type": "text"
},
"source": [
"As you can see, Aaron Jones **overperformed in rushing TDs** for the 2019 season based off the probability distribution we calculated. Given Aaron Jones' quality and quantitiy of usage throughout the 2019 season, he would have been expected to score around 3 touchdowns on the season. If you owned Aaron Jones, this is not surprising. The fact that the gap is so large though should raise cause for concern and he may be due for a negative regression next season based off these numbers. Let's run (basically) the same analysis for receiving TDs."
]
},
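One rough way to size the rushing gap is to ask how often a back expected to score about 7.2 rushing TDs would score 16 or more. The sketch below assumes, purely for illustration, that the season TD count is Poisson-distributed with a mean equal to the model's expected value.

The model itself implies a sum of independent per-carry chances. That sum has a somewhat narrower spread than a Poisson with the same mean, so this approximation likely overstates the chance of such a season. `poisson_tail` is a hypothetical helper written with the standard library only.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the CDF up to k - 1."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


expected_rushing_tds = 7.216  # model output for A.JONES from the cells above (rounded)
actual_rushing_tds = 16
print(f"P(>= {actual_rushing_tds} TDs) = {poisson_tail(actual_rushing_tds, expected_rushing_tds):.4f}")
```

Under these assumptions, a 16-TD season comes out at roughly a 0.3% chance. That is consistent with expecting some regression. A simple model like this cannot, however, separate luck from factors it ignores.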
{
"cell_type": "markdown",
"metadata": {
"id": "40J_5fHUCfLn",
"colab_type": "text"
},
"source": [
"As you can see, formatting descriptions to names for receivers was a bit more tedious."
]
},
{
"cell_type": "code",
"metadata": {
"id": "lWzo_B2ECfLo",
"colab_type": "code",
"colab": {}
},
"source": [
"passing_data2019 = data2019\n",
"\n",
"def get_receiver(x):\n",
" try:\n",
" val = x.split(' TO ')[1]\n",
" return val\n",
" except IndexError:\n",
" pass\n",
" return x\n",
"\n",
"def remove_run_plays(x):\n",
" if len(x) < 2:\n",
" return np.nan\n",
" return x\n",
"\n",
"def remove_dot(x):\n",
" try:\n",
" if '.' in x:\n",
" if len(x.split('.')) > 2:\n",
" val = '.'.join(x.split('.')[:2])\n",
" return val\n",
" except TypeError:\n",
" pass\n",
" return x\n",
"\n",
"def remove_special(x, char):\n",
" try:\n",
" if char in x:\n",
" val = x.split(char)[0]\n",
" return val\n",
" except TypeError:\n",
" pass\n",
" return x\n",
"\n",
"filters = [get_receiver, \n",
" split_on_hyphen,\n",
" split_on_whitespace, \n",
" remove_run_plays,\n",
" remove_dot,\n",
" lambda x: remove_special(x, ')'),\n",
" lambda x: remove_special(x, ';'),\n",
" lambda x: remove_special(x, ',')]\n",
"\n",
"bad_phrases = bad_phrases + ['SACKED', 'INTERCEPTED']\n",
"\n",
"for phrase in bad_phrases:\n",
" passing_data2019 = passing_data2019[~passing_data2019['Description'].str.contains(phrase)]\n",
" \n",
"passing_data2019['Player'] = passing_data2019['Description']\n",
"\n",
"for filt_func in filters: \n",
" passing_data2019['Player'] = passing_data2019['Player'].apply(filt_func)"
],
"execution_count": 0,
"outputs": []
},
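The filter chain above peels the receiver's name off the text after ` TO `. A single regular expression could replace most of those steps, provided every pass description follows the pattern `... PASS ... TO <jersey>-<F.LASTNAME> ...`. That pattern is an assumption about the NFLsavant format and is not verified in this notebook. `extract_receiver` is a hypothetical helper, and the sample descriptions are made up.

```python
import re

# first "TO <jersey>-<F.LASTNAME>" after the word PASS; the name must end in a letter,
# so a trailing period ("17-D.ADAMS.") is not captured
RECEIVER_TOKEN = re.compile(r"\bPASS\b.*?\bTO \d{1,2}-([A-Z]\.[A-Z'.]*[A-Z])")


def extract_receiver(description):
    """Return the targeted receiver of a pass play, or None if no match is found."""
    match = RECEIVER_TOKEN.search(description)
    return match.group(1) if match else None


print(extract_receiver(
    "(10:12) (SHOTGUN) 12-A.RODGERS PASS SHORT RIGHT TO 33-A.JONES TO GB 45 FOR 6 YARDS (54-B.WAGNER)."
))
print(extract_receiver("(5:01) 12-A.RODGERS PASS INCOMPLETE DEEP LEFT TO 17-D.ADAMS."))
```

Run plays return `None` here, because they contain no `PASS`. The notebook instead relies on the `IsPass` filter in the next cell to drop run plays.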
{
"cell_type": "code",
"metadata": {
"id": "eJa-K71mCfLt",
"colab_type": "code",
"colab": {}
},
"source": [
"passing_data2019 = passing_data2019[(passing_data2019['IsPass'] == 1) & (passing_data2019['IsTwoPointConversion'] == 0)]\n",
"passing_data2019['DistanceFromEndzone'] = 100 - passing_data2019['YardLine']\n",
"passing_data2019 = passing_data2019[['OffenseTeam', 'Player', 'DistanceFromEndzone', 'Description','IsTouchdown', 'IsPass', 'IsTwoPointConversion']]\n",
"passing_data2019.rename({\n",
" 'OffenseTeam': 'Tm',\n",
" 'IsTouchdown': 'ReceivingTD'\n",
"}, axis=1, inplace=True)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "G2xsgWNTCfLx",
"colab_type": "code",
"colab": {}
},
"source": [
"passing_df = pd.merge(passing_data2019, \n",
" passing_df, \n",
" how='inner', \n",
" left_on=['DistanceFromEndzone'], \n",
" right_on = ['DistanceFromEndzone']\n",
" )"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "po2D-xHJCfL2",
"colab_type": "code",
"colab": {}
},
"source": [
"predicted = passing_df.groupby(['Player','Tm'])[['p']].sum()\n",
"\n",
"predicted = predicted.rename(columns={'p':'Expected Touchdowns'})\n",
"\n",
"predicted.reset_index(inplace=True)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "eWLJdVj1CfL6",
"colab_type": "code",
"colab": {},
"outputId": "a572c72e-4cfa-4054-cbe4-1dfb115bdc4e"
},
"source": [
"aj_receiving_expect = predicted[predicted['Player'] == 'A.JONES']['Expected Touchdowns'].values[0]\n",
"aj_receiving_expect"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"2.987485448911446"
]
},
"metadata": {
"tags": []
},
"execution_count": 39
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "m2VSUUY3CfL-",
"colab_type": "code",
"colab": {},
"outputId": "dd020f75-3705-469e-e97a-2a4070a9dd31"
},
"source": [
"aj_receiving_actual = aj['ReceivingTD'].sum()\n",
"\n",
"diff = abs(aj_receiving_actual - aj_receiving_expect)\n",
"\n",
"if aj_receiving_expect > aj_receiving_actual:\n",
" print('Aaron Jones underperformed in Receiving TDs by {} TDs'.format(diff))\n",
"else:\n",
" print('Aaron Jones overperformed in Receiving TDs by {} TDs'.format(diff))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Aaron Jones overperformed in Receiving TDs by 0.012514551088553816 TDs\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NLe8Pn9tCfMF",
"colab_type": "text"
},
"source": [
"Not as bad for receiving TDs, it seems as though Aaron Jones caught as many receiving TDs as expected. This doesn't make up for the gap in rushing TDs, however."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QM4QsDjkCfMF",
"colab_type": "text"
},
"source": [
"## Conclusions"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "98TkaR5mCfMG",
"colab_type": "text"
},
"source": [
"1. Aaron Jones had a stellar season and ranked second in RBs for PPR, but that doesn't tell the whole picture. Henry posted almost the same amount of Fantasy Points per game with a lower standard deviation. Points are the name of the game but don't tell the whole picture. You also have to look at how consistently a player is getting you points. Aaron Jones finished second on the season for ALL players in terms of standard deviation. Not a good look if you're looking for consistency.\n",
"2. We found that there was no statistically significant relationship between Jamaal Williams' usage and Aaron Jones' production. Our p-value was close though, and there did seem to be *some* correlation, so take that with a grain of salt.\n",
"3. Lastly, based off the probability distribution we made for both rushing and receiving TDs, it looks like Aaron Jones **really** overperformed in terms of rushing output, and did about just as expected for receiving. One caveat here is that our model was fairly conservative, but Jones still finished 10th in TDs in our model. Don't be surprised to see a negative regression in rushing TDs for Aaron Jones come the 2020 season."
]
}
]
}