903124/EPA_calculation.ipynb

## EPA_calculation.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to create an EPA rating yourself\n",
    "\n",
    "nfldb imports NFL data from NFL.com's JSON feed. With it, one can easily calculate an estimated points added (EPA) ranking, which gives a good estimate of the strength of each NFL team."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, [nfldb](https://github.com/BurntSushi/nfldb) must be installed on the computer. On Windows, follow the [Windows installation guidelines](https://github.com/BurntSushi/nfldb/wiki/Detailed-Windows-PostgreSQL-installation)."
   ]
  },
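  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step is to read NFL regular-season data. The query below loads the 2017 season; to use the 2015-17 seasons, pass `season_year=[2015, 2016, 2017]` instead."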
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import csv\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import nfldb\n",
    "import math\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "EPA_observed = np.zeros(100)\n",
    "EPA_play = np.zeros(100)\n",
    "\n",
    "\n",
    "\n",
    "db = nfldb.connect()\n",
    "q = nfldb.Query(db) \n",
    "games = q.game(season_year=[2017], season_type='Regular').as_games()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second, read the outcome of every NFL drive and, for each 1st-and-10 situation in it, add 7 points if the drive ends in a touchdown, 3 points for a field goal, and -2 points for a safety. Every other drive result adds 0 points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Drive results that contribute a sample to the estimate.\n",
    "counted_results = ['Field Goal', 'Punt', 'Touchdown', 'Missed FG', 'Safety',\n",
    "                   'Fumble, Safety', 'Interception', 'Fumble']\n",
    "# Points credited by drive result; every other counted result adds 0.\n",
    "points_by_result = {'Field Goal': 3, 'Touchdown': 7, 'Safety': -2, 'Fumble, Safety': -2}\n",
    "\n",
    "for game in games:\n",
    "    for drive in game.drives:\n",
    "        if drive.result not in counted_results:\n",
    "            continue\n",
    "\n",
    "        for play in drive.plays:\n",
    "            # Only first-down passes, rushes and sacks are used as samples.\n",
    "            is_scrimmage_play = (play.passing_att == 1) or (play.rushing_att == 1) or (play.passing_sk == 1)\n",
    "            if not (is_scrimmage_play and int(play.down) == 1):\n",
    "                continue\n",
    "\n",
    "            # play.yardline prints as e.g. 'OWN 25' or 'OPP 40'; convert it to\n",
    "            # yards from the offense's own goal line (anything else counts as 50).\n",
    "            yard_split_str = str(play.yardline).split()\n",
    "            pos_indicate = yard_split_str[0]\n",
    "            if pos_indicate == 'OWN':\n",
    "                yardlinefromstr = int(yard_split_str[1])\n",
    "            elif pos_indicate == 'OPP':\n",
    "                yardlinefromstr = 100 - int(yard_split_str[1])\n",
    "            else:\n",
    "                yardlinefromstr = 50\n",
    "\n",
    "            # Yard line y is stored at index y-1.\n",
    "            EPA_play[yardlinefromstr-1] += 1\n",
    "            EPA_observed[yardlinefromstr-1] += points_by_result.get(drive.result, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have estimated points for every yard line. To account for the points given up on each punt, we take the estimated points from the calculation above and deduct the value of the opponent's estimated starting field position. Interceptions and fumbles are handled the same way, using the spot where the drive ended. We repeat the calculation a few times so that the result converges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def yards_from_own_goal(field_position):\n",
    "    # Convert a field position such as 'OWN 25' or 'OPP 40' into yards from\n",
    "    # the offense's own goal line; anything else counts as midfield (50).\n",
    "    parts = str(field_position).split()\n",
    "    if parts[0] == 'OWN':\n",
    "        return int(parts[1])\n",
    "    elif parts[0] == 'OPP':\n",
    "        return 100 - int(parts[1])\n",
    "    return 50\n",
    "\n",
    "# Drive results that contribute a sample to the estimate.\n",
    "counted_results = ['Field Goal', 'Punt', 'Touchdown', 'Missed FG', 'Safety',\n",
    "                   'Fumble, Safety', 'Interception', 'Fumble']\n",
    "\n",
    "for i in range(4):\n",
    "    # Turn the accumulated points into an average per first-down play.\n",
    "    for j in range(99):\n",
    "        if EPA_play[j] > 0:\n",
    "            EPA_observed[j] = float(EPA_observed[j]) / float(EPA_play[j])\n",
    "\n",
    "    # Keep the previous estimate and start a fresh accumulation.\n",
    "    temp_EPA = EPA_observed\n",
    "    EPA_observed = np.zeros(100)\n",
    "    EPA_play = np.zeros(100)\n",
    "\n",
    "    for game in games:\n",
    "        for drive in game.drives:\n",
    "            if drive.result not in counted_results:\n",
    "                continue\n",
    "\n",
    "            for play in drive.plays:\n",
    "                # Only first-down passes, rushes and sacks are used as samples.\n",
    "                is_scrimmage_play = (play.passing_att == 1) or (play.rushing_att == 1) or (play.passing_sk == 1)\n",
    "                if not (is_scrimmage_play and int(play.down) == 1):\n",
    "                    continue\n",
    "\n",
    "                yardlinefromstr = yards_from_own_goal(play.yardline)\n",
    "                end_field_fromstr = yards_from_own_goal(drive.end_field)\n",
    "\n",
    "                # Yard line y is stored at index y-1.\n",
    "                EPA_play[yardlinefromstr-1] += 1\n",
    "\n",
    "                if drive.result == 'Field Goal':\n",
    "                    EPA_observed[yardlinefromstr-1] += 3\n",
    "                elif drive.result == 'Touchdown':\n",
    "                    EPA_observed[yardlinefromstr-1] += 7\n",
    "                elif drive.result in ('Safety', 'Fumble, Safety'):\n",
    "                    EPA_observed[yardlinefromstr-1] -= 2\n",
    "                elif drive.result in ('Interception', 'Fumble'):\n",
    "                    # The opponent takes over where the drive ended.\n",
    "                    EPA_observed[yardlinefromstr-1] -= temp_EPA[100 - end_field_fromstr - 1]\n",
    "                elif drive.result == 'Punt':\n",
    "                    # Deduct the value of the opponent's estimated field position after the punt.\n",
    "                    punt_end = int(-0.0116 * end_field_fromstr * end_field_fromstr + 1.5343 * end_field_fromstr + 37.91)\n",
    "                    EPA_observed[yardlinefromstr-1] -= temp_EPA[100 - punt_end - 1]\n",
    "\n",
    "# Average the sums from the last pass.\n",
    "for i in range(99):\n",
    "    if EPA_play[i] > 0:\n",
    "        EPA_observed[i] = float(EPA_observed[i]) / float(EPA_play[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After obtaining the observed estimated points values, we smooth the curve with a fifth-degree polynomial fit to get more accurate values, since the raw values contain noise from the random nature of the game."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x = np.linspace(1, 99, num=99)  # yard lines 1..99\n",
    "cof = np.polyfit(x, EPA_observed[0:99], 5)  # fifth-degree polynomial coefficients, highest power first\n",
    "EPA_observed_smooth = np.polyval(cof, x)  # evaluate the fitted polynomial at every yard line"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By calculating the change in estimated points over every drive (hence estimated points added), the EPA of each team can be obtained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "team_list = np.array(['ARI','ATL','BAL','BUF','CAR','CHI','CIN','CLE','DAL','DEN','DET','GB','HOU','IND','JAX','KC','LAC','LA','MIA','MIN','NE','NO'\n",
    "                 ,'NYG','NYJ','OAK','PHI','PIT','SEA','SF','TB','TEN','WAS'])\n",
    "# team_list = np.array(['ARI','ATL','BAL','BUF','CAR','CHI','CIN','CLE','DAL','DEN','DET','GB','HOU','IND','JAX','KC','LA','MIA','MIN','NE','NO'\n",
    "#                  ,'NYG','NYJ','OAK','PHI','PIT','SD','SEA','SF','TB','TEN','WAS'])\n",
    "            #2016 'STR' -> 'LA'\n",
    "            #2017 'SD' -> 'LAC'\n",
    "\n",
    "EPA_team = np.zeros(32)\n",
    "O_EPA = np.zeros(32)\n",
    "D_EPA = np.zeros(32)\n",
    "game_count = np.zeros(32)\n",
    "next_drive_start_yardline = 50 #initialize only\n",
    "\n",
    "\n",
    "db = nfldb.connect()\n",
    "q = nfldb.Query(db)\n",
    "games = q.game(season_year=2018, season_type='Regular').as_games()\n",
    "\n",
    "epa_result = np.empty(0)\n",
    "nfl_week = 0\n",
    "for game in games:\n",
    "    if(game.week > nfl_week and game.finished == 1):\n",
    "        nfl_week += 1\n",
    "    if game.finished == 1:\n",
    "        game_count[np.where(team_list==game.home_team)[0][0]] += 1\n",
    "        game_count[np.where(team_list==game.away_team)[0][0]] += 1\n",
    "    for drive in game.drives:\n",
    "        \n",
    "        yard_str = str(drive.start_field)     #tidy up the data\n",
    "        yard_split_str = yard_str.split()\n",
    "        pos_indicate = yard_split_str[0]\n",
    "\n",
    "        if str(game.home_team) == str(drive.pos_team):\n",
    "            opp_team = str(game.away_team)\n",
    "        else:\n",
    "            opp_team = str(game.home_team)\n",
    "\n",
    "        if pos_indicate == 'OWN':\n",
    "            yardlinefromstr = int(yard_split_str[1])\n",
    "        elif pos_indicate == 'OPP':\n",
    "            yardlinefromstr = 100 - int(yard_split_str[1])\n",
    "        else:\n",
    "            yardlinefromstr = 50\n",
    "\n",
    "\n",
    "        end_field_str = str(drive.end_field) \n",
    "        end_field_split_str = end_field_str.split()\n",
    "\n",
    "        end_field_pos_indicate = end_field_split_str[0]\n",
    "\n",
    "        if end_field_pos_indicate == 'OWN':\n",
    "            end_field_fromstr = int(end_field_split_str[1])\n",
    "        elif end_field_pos_indicate == 'OPP':\n",
    "            end_field_fromstr = 100 - int(end_field_split_str[1])\n",
    "        else:\n",
    "            end_field_fromstr = 50    \n",
    "        EP_start = EPA_observed_smooth[yardlinefromstr-1]\n",
    "\n",
    "        \n",
    "        if( (drive.result =='End of Game') or (drive.result == 'End of Half')):\n",
    "            EP_end =  EP_start\n",
    "\n",
    " \n",
    "        if( (drive.result =='Missed FG') or (drive.result =='Interception') or (drive.result =='Fumble') or (drive.result =='Downs') or (drive.result =='Blocked FG') or (drive.result =='Blocked Punt') or (drive.result =='Blocked FG, Downs') or (drive.result =='Blocked Punt, Downs')):\n",
    "            EP_end = -EPA_observed_smooth[100-end_field_fromstr-1]\n",
    "            \n",
    "\n",
    "        if(drive.result == 'Punt'):\n",
    "            EP_end = -EPA_observed_smooth[next_drive_start_yardline-1]\n",
    "\n",
    "            \n",
    "\n",
    "        if(drive.result == 'Touchdown'):\n",
    "            EP_end = 7\n",
    "\n",
    "            \n",
    "\n",
    "        if(drive.result == 'Field Goal'):   \n",
    "            EP_end = 3\n",
    "\n",
    "            \n",
    "            \n",
    "        if((drive.result == 'Fumble, Safety') or (drive.result == 'Safety')):\n",
    "            EP_end = -2 - EPA_observed_smooth[next_drive_start_yardline-1]\n",
    "\n",
    "        epa_result = np.append(epa_result,(EP_end-EP_start))\n",
    "\n",
    "        EPA_team[np.where(team_list==drive.pos_team)[0][0]] += ( EP_end - EP_start ) \n",
    "        \n",
    "        O_EPA[np.where(team_list==drive.pos_team)[0][0]] += ( EP_end - EP_start ) \n",
    "        \n",
    "        EPA_team[np.where(team_list==opp_team)[0][0]] -= ( EP_end - EP_start )    \n",
    "\n",
    "        D_EPA[np.where(team_list==opp_team)[0][0]] -= ( EP_end - EP_start ) \n",
    "\n",
    "        next_drive_start_yardline = yardlinefromstr\n",
    "    \n",
    "\n",
    "O_EPA_mean = np.mean(O_EPA)\n",
    "D_EPA_mean = np.mean(D_EPA)\n",
    "\n",
    "for i in range(32):\n",
    "    O_EPA[i]-= O_EPA_mean\n",
    "    D_EPA[i]-= D_EPA_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, by converting each team's EPA into a win probability with a logistic function, the estimated wins for each NFL team in the queried season (2018 in the cell above) can be obtained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "EPA_per_16_games = np.zeros(32)\n",
    "\n",
    "Estimated_wins = np.zeros(32)\n",
    "\n",
    "for i in range(32):\n",
    "    # Scale each team's total EPA to a 16-game season.\n",
    "    EPA_per_16_games[i] = EPA_team[i] / game_count[i] * 16\n",
    "    # Logistic function: win probability from EPA, times 16 games.\n",
    "    Estimated_wins[i] = 16 * (1 / (1 + np.exp(-0.007849 * EPA_per_16_games[i])))\n",
    "    \n",
    "std_EPA =  np.std(EPA_per_16_games)\n",
    "    \n",
    "for i in range(32):\n",
    "    EPA_per_16_games[i] = EPA_per_16_games[i] * 80 / std_EPA\n",
    "    \n",
    "output_df = pd.DataFrame({'Team name': team_list, 'EPA per 16 games': EPA_per_16_games, 'Estimated EPA wins': Estimated_wins, \"Offensive EPA\": O_EPA, \"Defensive EPA\": D_EPA})  \n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(O_EPA, D_EPA)\n",
    "plt.xlabel('Offensive EPA')\n",
    "plt.ylabel('Defensive EPA')\n",
    "plt.title('NFL EPA week %d' % nfl_week)\n",
    "\n",
    "for i, txt in enumerate(team_list):\n",
    "    ax.annotate(txt, (O_EPA[i], D_EPA[i]))\n",
    "\n",
    "output_df = output_df.sort_values('EPA per 16 games')\n",
    "output_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Optional) Plot team logos in place of the text labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.offsetbox import AnnotationBbox, OffsetImage\n",
    "import csv\n",
    "import urllib2\n",
    "import cStringIO\n",
    "from PIL import Image\n",
    "\n",
    "\n",
    "fig = plt.gcf()\n",
    "fig.clf()\n",
    "ax = plt.subplot(111)\n",
    "\n",
    "\n",
    "url = 'https://raw.githubusercontent.com/statsbylopez/BlogPosts/master/nfl_teamlogos.csv'\n",
    "response = urllib2.urlopen(url)\n",
    "cr = csv.reader(response)\n",
    "\n",
    "for i,row in enumerate(cr):\n",
    "    if(i != 0):  # skip the CSV header row\n",
    "\n",
    "        # Download the logo image (URL in the third column) into memory.\n",
    "        image_file = cStringIO.StringIO(urllib2.urlopen(row[2]).read())\n",
    "\n",
    "        img = Image.open(image_file)\n",
    "\n",
    "        imagebox = OffsetImage(img, zoom=1)\n",
    "        xy = [O_EPA[i-1],D_EPA[i-1]]              \n",
    "\n",
    "\n",
    "        ab = AnnotationBbox(imagebox, xy,\n",
    "            xybox=(-0,0),\n",
    "            xycoords='data',\n",
    "            boxcoords=\"offset points\",\n",
    "            frameon=False)                                  \n",
    "        ax.add_artist(ab)\n",
    "\n",
    "\n",
    "ax.grid(True)\n",
    "\n",
    "plt.xlim([-200,200])\n",
    "plt.ylim([-200,200])\n",
    "plt.xlabel('Offensive EPA')\n",
    "plt.ylabel('Defensive EPA')\n",
    "plt.title('NFL EPA week %d' % nfl_week)\n",
    "plt.savefig('EPA_plot.png',dpi=400)\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}