franloza/ATP.ipynb

## ATP.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preprocessing dataset: ATP Tennis Rankings, Results, and Stats\n",
    "## Source: https://github.com/JeffSackmann/tennis_atp\n",
    "### Predictive modeling. Master in Big Data Analysis. 2018/2019\n",
    "### Authors: Francisco J. Lozano, Antonio Miranda, Diego Suárez"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tourney_id                2016-M020\n",
       "tourney_name               Brisbane\n",
       "surface                        Hard\n",
       "draw_size                        32\n",
       "tourney_level                     A\n",
       "tourney_date            2.01601e+07\n",
       "match_num                       300\n",
       "winner_id                    105683\n",
       "winner_seed                       4\n",
       "winner_entry                    NaN\n",
       "winner_name            Milos Raonic\n",
       "winner_hand                       R\n",
       "winner_ht                       196\n",
       "winner_ioc                      CAN\n",
       "winner_age                  25.0212\n",
       "winner_rank                      14\n",
       "winner_rank_points             2170\n",
       "loser_id                     103819\n",
       "loser_seed                        1\n",
       "loser_entry                     NaN\n",
       "loser_name            Roger Federer\n",
       "loser_hand                        R\n",
       "loser_ht                        185\n",
       "loser_ioc                       SUI\n",
       "loser_age                   34.4066\n",
       "loser_rank                        3\n",
       "loser_rank_points              8265\n",
       "score                       6-4 6-4\n",
       "best_of                           3\n",
       "round                             F\n",
       "minutes                          87\n",
       "w_ace                             6\n",
       "w_df                              6\n",
       "w_svpt                           60\n",
       "w_1stIn                          34\n",
       "w_1stWon                         28\n",
       "w_2ndWon                         14\n",
       "w_SvGms                          10\n",
       "w_bpSaved                         1\n",
       "w_bpFaced                         1\n",
       "l_ace                             7\n",
       "l_df                              3\n",
       "l_svpt                           61\n",
       "l_1stIn                          34\n",
       "l_1stWon                         25\n",
       "l_2ndWon                         14\n",
       "l_SvGms                          10\n",
       "l_bpSaved                         3\n",
       "l_bpFaced                         5\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import warnings\n",
    "from tqdm import tqdm\n",
    "warnings.filterwarnings('ignore')\n",
    "data = pd.concat([pd.read_csv(\"data/atp_matches_2016.csv\"),pd.read_csv(\"data/atp_matches_2017.csv\")])\n",
    "data.iloc[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- tourney_id. A character id that uniquely identifies each tournament\n",
    "- tourney_name. A character tournament name\n",
    "- surface. A character description of the court surface (Carpet, Clay, Grass, or Hard)\n",
    "- draw_size. A numeric value indicating the draw size\n",
    "- tourney_level. A character description of the tournament level (A, C, D, F, G, M)\n",
    "- match_num. A numeric indicating the order of matches\n",
    "- winner_id. A numeric id identifying the player who won the match\n",
    "- winner_seed. A numeric value for the winner's seeding\n",
    "- winner_entry. A character value indicating the winner's entry type (WC = Wild card, Q = Qualifier, LL = Lucky loser, or PR = Protected ranking)\n",
    "- winner_name. A character of the winner's name\n",
    "- winner_hand. A character value indicated the handedness of the winner\n",
    "- winner_ht. A numeric value of the winner's height in cm\n",
    "- winner_ioc. A character of the winner's country of origin\n",
    "- winner_age. A numeric of the winner's age at the time of the match\n",
    "- winner_rank. A numeric of the winner's rank at the time of the match\n",
    "- winner_rank_points. A numeric of the winner's 52-week ranking points at the time of the match\n",
    "- loser_id. A numeric id identifying the player who won the match\n",
    "- loser_seed. A numeric value for the loser's seeding\n",
    "- loser_entry. A character value indicating the loser's entry type (WC = Wild card, Q = Qualifier, LL = Lucky loser, or PR = Protected ranking)\n",
    "- loser_name. A character of the loser's name\n",
    "- loser_hand. A character value indicated the handedness of the loser\n",
    "- loser_ht. A numeric value of the loser's height in cm\n",
    "- loser_ioc. A character of the loser's country of origin\n",
    "- loser_age. A numeric of the loser's age at the time of the match\n",
    "- loser_rank. A numeric of the loser's rank at the time of the match\n",
    "- loser_rank_points. A numeric of the loser's 52-week ranking points at the time of the match\n",
    "- score. A character of the match score\n",
    "- best_of. A numeric value indicating the match format (3 or 5)\n",
    "- round. A character indicating the round of the match\n",
    "- minutes. A numeric value for the duration of the match in minutes\n",
    "- w_ace. A numeric value for the winner's number of aces\n",
    "- w_df. A numeric value for the winner's number of double faults\n",
    "- w_svpt. A numeric value for the winner's number of service points\n",
    "- w_1stIn. A numeric value for the winner's number of first serves in\n",
    "- w_1stWon. A numeric value for the winner's number of first service points won\n",
    "- w_2ndWon. A numeric value for the winner's number of second service points won\n",
    "- w_SvGms. A numeric value for the winner's number of service games\n",
    "- w_bpSaved. A numeric value for the winner's number of breakpoints saves\n",
    "- w_bpFaced. A numeric value for the winner's number of breakpoints faced\n",
    "- l_ace. A numeric value for the loser's number of aces\n",
    "- l_df. A numeric value for the loser's number of double faults\n",
    "- l_svpt. A numeric value for the loser's number of service points\n",
    "- l_1stIn. A numeric value for the loser's number of first serves in\n",
    "- l_1stWon. A numeric value for the loser's number of first service points won\n",
    "- l_2ndWon. A numeric value for the loser's number of second service points won\n",
    "- l_SvGms. A numeric value for the loser's number of service games\n",
    "- l_bpSaved. A numeric value for the loser's number of breakpoints saves\n",
    "- l_bpFaced. A numeric value for the loser's number of breakpoints faced"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Number of rows: 5890'"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"Number of rows: \" + str(data.shape[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tourney_id              63\n",
       "tourney_name            63\n",
       "surface                 63\n",
       "draw_size               63\n",
       "tourney_level           63\n",
       "tourney_date            63\n",
       "match_num               63\n",
       "winner_id               63\n",
       "winner_seed           3309\n",
       "winner_entry          5224\n",
       "winner_name             63\n",
       "winner_hand             67\n",
       "winner_ht             1422\n",
       "winner_ioc              63\n",
       "winner_age              71\n",
       "winner_rank             96\n",
       "winner_rank_points      96\n",
       "loser_id                63\n",
       "loser_seed            4476\n",
       "loser_entry           4802\n",
       "loser_name              63\n",
       "loser_hand              80\n",
       "loser_ht              1912\n",
       "loser_ioc               63\n",
       "loser_age               78\n",
       "loser_rank             147\n",
       "loser_rank_points      147\n",
       "score                   63\n",
       "best_of                 63\n",
       "round                   63\n",
       "minutes                135\n",
       "w_ace                  120\n",
       "w_df                   120\n",
       "w_svpt                 120\n",
       "w_1stIn                120\n",
       "w_1stWon               120\n",
       "w_2ndWon               120\n",
       "w_SvGms                120\n",
       "w_bpSaved              120\n",
       "w_bpFaced              120\n",
       "l_ace                  120\n",
       "l_df                   120\n",
       "l_svpt                 120\n",
       "l_1stIn                120\n",
       "l_1stWon               120\n",
       "l_2ndWon               120\n",
       "l_SvGms                120\n",
       "l_bpSaved              120\n",
       "l_bpFaced              120\n",
       "dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_data = data.drop([\"winner_seed\", \"winner_entry\", \"winner_ht\",\n",
    "                      \"loser_seed\", \"loser_entry\", \"loser_ht\"], axis=1).dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Number of rows: 5606'"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"Number of rows: \" + str(new_data.shape[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Firstly, we need to get rid of information of winner and losers from columns as set it as a new column (label). To do it, we are going to set as player 1, the one as higher ranking, and the second player as the one with lower ranking"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'tourney_id, tourney_name, surface, drap1_size, tourney_level, tourney_date, match_num, p1_id, p1_name, p1_hand, p1_ioc, p1_age, p1_rank, p1_rank_points, p2_id, p2_name, p2_hand, p2_ioc, p2_age, p2_rank, p2_rank_points, score, best_of, round, minutes, p1_ace, p1_df, p1_svpt, p1_1stIn, p1_1stWon, p1_2ndWon, p1_SvGms, p1_bpSaved, p1_bpFaced, p2_ace, p2_df, p2_svpt, p2_1stIn, p2_1stWon, p2_2ndWon, p2_SvGms, p2_bpSaved, p2_bpFaced, p1_win'"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_data[\"p1_win\"] = True\n",
    "new_columns = [col.replace(\"winner_\",\"p1_\").replace(\"w_\",\"p1_\").replace(\"loser_\",\"p2_\").replace(\"l_\",\"p2_\")\n",
    "               for col in new_data.columns]\n",
    "new_data.columns = new_columns\n",
    "\", \". join(new_columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "p1_stats_columns = [\"p1_id\", \"p1_name\", \"p1_hand\", \"p1_ioc\", \"p1_age\", \"p1_rank\", \"p1_rank_points\",\n",
    "                    \"p1_ace\", \"p1_df\", \"p1_svpt\", \"p1_1stIn\", \"p1_1stWon\", \"p1_2ndWon\", \"p1_SvGms\",\n",
    "                    \"p1_bpSaved\", \"p1_bpFaced\"]\n",
    "p2_stats_columns = [\"p2_id\", \"p2_name\", \"p2_hand\", \"p2_ioc\", \"p2_age\", \"p2_rank\", \"p2_rank_points\",\n",
    "                    \"p2_ace\", \"p2_df\", \"p2_svpt\", \"p2_1stIn\", \"p2_1stWon\", \"p2_2ndWon\", \"p2_SvGms\",\n",
    "                    \"p2_bpSaved\", \"p2_bpFaced\"]\n",
    "\n",
    "for idx, match in new_data.iterrows():\n",
    "    if match[\"p1_rank\"] > match[\"p2_rank\"]:\n",
    "        #Swap player\n",
    "        new_data.loc[idx, \"p1_win\"] = False\n",
    "        p1_stats = new_data.loc[idx, p1_stats_columns]\n",
    "        new_data.loc[idx, p1_stats_columns] = new_data.loc[idx, p2_stats_columns].values\n",
    "        new_data.loc[idx, p2_stats_columns] = p1_stats.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tourney_id</th>\n",
       "      <th>tourney_name</th>\n",
       "      <th>surface</th>\n",
       "      <th>drap1_size</th>\n",
       "      <th>tourney_level</th>\n",
       "      <th>tourney_date</th>\n",
       "      <th>match_num</th>\n",
       "      <th>p1_id</th>\n",
       "      <th>p1_name</th>\n",
       "      <th>p1_hand</th>\n",
       "      <th>...</th>\n",
       "      <th>p2_ace</th>\n",
       "      <th>p2_df</th>\n",
       "      <th>p2_svpt</th>\n",
       "      <th>p2_1stIn</th>\n",
       "      <th>p2_1stWon</th>\n",
       "      <th>p2_2ndWon</th>\n",
       "      <th>p2_SvGms</th>\n",
       "      <th>p2_bpSaved</th>\n",
       "      <th>p2_bpFaced</th>\n",
       "      <th>p1_win</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2016-M020</td>\n",
       "      <td>Brisbane</td>\n",
       "      <td>Hard</td>\n",
       "      <td>32.0</td>\n",
       "      <td>A</td>\n",
       "      <td>20160104.0</td>\n",
       "      <td>300.0</td>\n",
       "      <td>105683.0</td>\n",
       "      <td>Milos Raonic</td>\n",
       "      <td>R</td>\n",
       "      <td>...</td>\n",
       "      <td>7.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>61.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-M020</td>\n",
       "      <td>Brisbane</td>\n",
       "      <td>Hard</td>\n",
       "      <td>32.0</td>\n",
       "      <td>A</td>\n",
       "      <td>20170102.0</td>\n",
       "      <td>300.0</td>\n",
       "      <td>105777.0</td>\n",
       "      <td>Grigor Dimitrov</td>\n",
       "      <td>R</td>\n",
       "      <td>...</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>49.0</td>\n",
       "      <td>36.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2 rows × 44 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  tourney_id tourney_name surface  drap1_size tourney_level  tourney_date  \\\n",
       "0  2016-M020     Brisbane    Hard        32.0             A    20160104.0   \n",
       "0  2017-M020     Brisbane    Hard        32.0             A    20170102.0   \n",
       "\n",
       "   match_num     p1_id          p1_name p1_hand   ...   p2_ace  p2_df  \\\n",
       "0      300.0  105683.0     Milos Raonic       R   ...      7.0    3.0   \n",
       "0      300.0  105777.0  Grigor Dimitrov       R   ...      4.0    0.0   \n",
       "\n",
       "   p2_svpt  p2_1stIn  p2_1stWon p2_2ndWon p2_SvGms p2_bpSaved  p2_bpFaced  \\\n",
       "0     61.0      34.0       25.0      14.0     10.0        3.0         5.0   \n",
       "0     69.0      49.0       36.0       9.0     12.0        2.0         5.0   \n",
       "\n",
       "   p1_win  \n",
       "0   False  \n",
       "0   False  \n",
       "\n",
       "[2 rows x 44 columns]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_data.loc[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We add a new label for creating a regression problem: Difference in points"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_data.loc[:, \"diff_points\"] =\\\n",
    "    abs((new_data[\"p1_1stWon\"] + new_data[\"p1_2ndWon\"]) - (new_data[\"p2_1stWon\"] + new_data[\"p1_2ndWon\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we sort the dataset by date yo have our stats dataset ready to explore. We will get 5-matches and 20-matches\n",
    "rolling statistics for each player to construct the final dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_data = new_data.sort_values(by=[\"tourney_date\", \"match_num\"], ascending=True).reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tourney_name</th>\n",
       "      <th>tourney_date</th>\n",
       "      <th>match_num</th>\n",
       "      <th>p1_name</th>\n",
       "      <th>p2_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Doha</td>\n",
       "      <td>20160104.0</td>\n",
       "      <td>270.0</td>\n",
       "      <td>Rafael Nadal</td>\n",
       "      <td>Pablo Carreno Busta</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Brisbane</td>\n",
       "      <td>20160104.0</td>\n",
       "      <td>271.0</td>\n",
       "      <td>Denis Istomin</td>\n",
       "      <td>Mikhail Kukushkin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Chennai</td>\n",
       "      <td>20160104.0</td>\n",
       "      <td>271.0</td>\n",
       "      <td>Ramkumar Ramanathan</td>\n",
       "      <td>Daniel Gimeno Traver</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Doha</td>\n",
       "      <td>20160104.0</td>\n",
       "      <td>271.0</td>\n",
       "      <td>Aslan Karatsev</td>\n",
       "      <td>Robin Haase</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Brisbane</td>\n",
       "      <td>20160104.0</td>\n",
       "      <td>272.0</td>\n",
       "      <td>Dusan Lajovic</td>\n",
       "      <td>Radek Stepanek</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tourney_name  tourney_date  match_num              p1_name  \\\n",
       "0         Doha    20160104.0      270.0         Rafael Nadal   \n",
       "1     Brisbane    20160104.0      271.0        Denis Istomin   \n",
       "2      Chennai    20160104.0      271.0  Ramkumar Ramanathan   \n",
       "3         Doha    20160104.0      271.0       Aslan Karatsev   \n",
       "4     Brisbane    20160104.0      272.0        Dusan Lajovic   \n",
       "\n",
       "                p2_name  \n",
       "0   Pablo Carreno Busta  \n",
       "1     Mikhail Kukushkin  \n",
       "2  Daniel Gimeno Traver  \n",
       "3           Robin Haase  \n",
       "4        Radek Stepanek  "
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_data[[\"tourney_name\", \"tourney_date\", \"match_num\", \"p1_name\", \"p2_name\"]].head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'tourney_id, tourney_name, surface, drap1_size, tourney_level, tourney_date, match_num, p1_id, p1_name, p1_hand, p1_ioc, p1_age, p1_rank, p1_rank_points, p2_id, p2_name, p2_hand, p2_ioc, p2_age, p2_rank, p2_rank_points, score, best_of, round, minutes, p1_ace, p1_df, p1_svpt, p1_1stIn, p1_1stWon, p1_2ndWon, p1_SvGms, p1_bpSaved, p1_bpFaced, p2_ace, p2_df, p2_svpt, p2_1stIn, p2_1stWon, p2_2ndWon, p2_SvGms, p2_bpSaved, p2_bpFaced, p1_win, diff_points'"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\", \".join(new_data.columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A priori stats\n",
    "a_priori_columns = ['tourney_id', 'tourney_name', 'surface', 'drap1_size', 'tourney_level', 'tourney_date', 'match_num',\n",
    "                    'p1_id', 'p1_name', 'p1_hand', 'p1_ioc', 'p1_age', 'p1_rank', 'p1_rank_points',\n",
    "                    'p2_id', 'p2_name', 'p2_hand', 'p2_ioc', 'p2_age', 'p2_rank', 'p2_rank_points']\n",
    "final_dataset = new_data[a_priori_columns].copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_players_stats(new_data, match_id, window_sizes):\n",
    "    stats = {}\n",
    "    match = new_data.loc[match_id]\n",
    "    \n",
    "    # Get windows for player 1 and player 2 (unbounded yet)\n",
    "    mask_p1 = (new_data.index < match_id) & ((new_data.p1_id == match.p1_id) | (new_data.p2_id == match.p1_id))\n",
    "    p1_window = new_data[mask_p1]\n",
    "    p1_window.loc[:, \"p2_win\"] = ~p1_window.loc[:, \"p1_win\"].values\n",
    "    mask_p2 = (new_data.index < match_id) & ((new_data.p1_id == match.p2_id) | (new_data.p2_id == match.p2_id))\n",
    "    p2_window = new_data[mask_p2]\n",
    "    p2_window.loc[:, \"p2_win\"] = ~p2_window.loc[:, \"p1_win\"].values\n",
    "    \n",
    "    # Set stats for windows\n",
    "    stats_columns = [\"id\", \"name\", \"hand\", \"ioc\", \"age\", \"rank\", \"rank_points\",\n",
    "                     \"ace\", \"df\", \"svpt\", \"1stIn\", \"1stWon\", \"2ndWon\", \"SvGms\",\n",
    "                     \"bpSaved\", \"bpFaced\",\"win\"]\n",
    "    p1_window_stats = pd.DataFrame(index=p1_window.index, columns=stats_columns)\n",
    "    p1_window_stats.loc[p1_window[match.p1_id == p1_window.p1_id].index,:]= \\\n",
    "        p1_window.loc[match.p1_id == p1_window.p1_id, map(lambda x: \"p1_\"+x, stats_columns)].values\n",
    "    p1_window_stats.loc[p1_window[match.p1_id == p1_window.p2_id].index,:]= \\\n",
    "        p1_window.loc[match.p1_id == p1_window.p2_id, map(lambda x: \"p2_\"+x, stats_columns)].values\n",
    "    \n",
    "    p2_window_stats = pd.DataFrame(index=p2_window.index, columns=stats_columns)\n",
    "    p2_window_stats.loc[p2_window[match.p2_id == p2_window.p1_id].index,:]= \\\n",
    "        p2_window.loc[match.p2_id == p2_window.p1_id, map(lambda x: \"p1_\"+x, stats_columns)].values\n",
    "    p2_window_stats.loc[p2_window[match.p2_id == p2_window.p2_id].index,:]= \\\n",
    "        p2_window.loc[match.p2_id == p2_window.p2_id, map(lambda x: \"p2_\"+x, stats_columns)].values\n",
    "    \n",
    "    for window_size in window_sizes:\n",
    "        # Stats for player 1\n",
    "        p1_last_matches = p1_window_stats.tail(window_size)\n",
    "        if p1_last_matches.empty:\n",
    "            stats[\"p1_win_prob_{}w\".format(window_size)] = np.nan\n",
    "            stats[\"p1_ace_prob_{}w\".format(window_size)] = np.nan\n",
    "            stats[\"p1_df_prob_{}w\".format(window_size)] = np.nan\n",
    "            stats[\"p1_svptWon_prob_{}w\".format(window_size)] = np.nan\n",
    "            #stats[\"p1_bpSaved_prob_{}w\".format(window_size)] = np.nan\n",
    "        else:\n",
    "            # Get Percetage of matches won in last windows_size matches\n",
    "            stats[\"p1_win_prob_{}w\".format(window_size)] = p1_last_matches.win.sum() / p1_last_matches.shape[0]\n",
    "            # Get percentage of aces/point served in last windows_size matches (aces / svpt)\n",
    "            stats[\"p1_ace_prob_{}w\".format(window_size)] = p1_last_matches.ace.sum() / p1_last_matches.svpt.sum()\n",
    "            # Get percentage of double faults/point served in last windows size matches (df / svpt)\n",
    "            stats[\"p1_df_prob_{}w\".format(window_size)] = p1_last_matches.df.sum() / p1_last_matches.svpt.sum()\n",
    "            # Get percentage of points won/point served in last windows size matches ((1stWon + 2ndWon) / svpt)\n",
    "            stats[\"p1_svptWon_prob_{}w\".format(window_size)] = \\\n",
    "                (p1_last_matches[\"1stWon\"] + p1_last_matches[\"2ndWon\"]).sum() / p1_last_matches.svpt.sum()\n",
    "            # Get percentage of breakpoint saved (bpSaved / bpFaced)\n",
    "            #stats[\"p1_bpSaved_prob_{}w\".format(window_size)] = p1_last_matches.bpSaved.sum() / p1_last_matches.bpFaced.sum()\n",
    "\n",
    "        # Stats for player 2\n",
    "        p2_last_matches = p2_window_stats.tail(window_size)\n",
    "        if p2_last_matches.empty:\n",
    "            stats[\"p2_win_prob_{}w\".format(window_size)] = np.nan\n",
    "            stats[\"p2_ace_prob_{}w\".format(window_size)] = np.nan\n",
    "            stats[\"p2_df_prob_{}w\".format(window_size)] = np.nan\n",
    "            stats[\"p2_svptWon_prob_{}w\".format(window_size)] = np.nan\n",
    "            #stats[\"p2_bpSaved_prob_{}w\".format(window_size)] = np.nan\n",
    "        else:\n",
    "            # Get Percetage of matches won in last windows_size matches\n",
    "            stats[\"p2_win_prob_{}w\".format(window_size)] = p2_last_matches.win.sum() / p2_last_matches.shape[0]\n",
    "            # Get percentage of aces/point served in last windows_size matches (aces / svpt)\n",
    "            stats[\"p2_ace_prob_{}w\".format(window_size)] = p2_last_matches.ace.sum() / p2_last_matches.svpt.sum()\n",
    "            # Get percentage of double faults/point served in last windows size matches (df / svpt)\n",
    "            stats[\"p2_df_prob_{}w\".format(window_size)] = p2_last_matches.df.sum() / p2_last_matches.svpt.sum()\n",
    "            # Get percentage of points won/point served in last windows size matches ((1stWon + 2ndWon) / svpt)\n",
    "            stats[\"p2_svptWon_prob_{}w\".format(window_size)] = \\\n",
    "                (p2_last_matches[\"1stWon\"] + p2_last_matches[\"2ndWon\"]).sum() / p2_last_matches.svpt.sum()\n",
    "            # Get percentage of breakpoint saved (bpSaved / bpFaced)\n",
    "            #stats[\"p2_bpSaved_prob_{}w\".format(window_size)] = p2_last_matches.bpSaved.sum() / p2_last_matches.bpFaced.sum()\n",
    "\n",
    "        # Get Percentage of matches won in surface in last windows_size matches (played in that surface)\n",
    "        p1_surface_matches = p1_window_stats.loc[new_data.loc[p1_window_stats.index, \"surface\"] == match.surface].tail(window_size)\n",
    "        if p1_surface_matches.empty:\n",
    "            stats[\"p1_surface_win_prob_{}w\".format(window_size)] = pd.np.nan\n",
    "        else:\n",
    "            stats[\"p1_surface_win_prob_{}w\".format(window_size)] = \\\n",
    "                p1_surface_matches.win.sum() / p1_surface_matches.shape[0]\n",
    "        p2_surface_matches = p2_window_stats.loc[new_data.loc[p2_window_stats.index, \"surface\"] == match.surface].tail(window_size)\n",
    "        if p2_surface_matches.empty:\n",
    "            stats[\"p2_surface_win_prob_{}w\".format(window_size)] = pd.np.nan\n",
    "        else:\n",
    "            stats[\"p2_surface_win_prob_{}w\".format(window_size)] = \\\n",
    "                p2_surface_matches.win.sum() / p2_surface_matches.shape[0]\n",
    "\n",
    "    return pd.Series(stats)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "p1_win_prob_20w            0.850000\n",
       "p1_ace_prob_20w            0.126904\n",
       "p1_df_prob_20w             0.038434\n",
       "p1_svptWon_prob_20w        0.689630\n",
       "p2_win_prob_20w            0.700000\n",
       "p2_ace_prob_20w            0.066336\n",
       "p2_df_prob_20w             0.024179\n",
       "p2_svptWon_prob_20w        0.642901\n",
       "p1_surface_win_prob_20w    0.850000\n",
       "p2_surface_win_prob_20w    0.700000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "match_id = 2760-1\n",
    "get_players_stats(new_data, match_id, [20])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "5606it [3:16:46,  1.48it/s] \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>p1_win_prob_20w</th>\n",
       "      <th>p1_ace_prob_20w</th>\n",
       "      <th>p1_df_prob_20w</th>\n",
       "      <th>p1_svptWon_prob_20w</th>\n",
       "      <th>p2_win_prob_20w</th>\n",
       "      <th>p2_ace_prob_20w</th>\n",
       "      <th>p2_df_prob_20w</th>\n",
       "      <th>p2_svptWon_prob_20w</th>\n",
       "      <th>p1_surface_win_prob_20w</th>\n",
       "      <th>p2_surface_win_prob_20w</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5576</th>\n",
       "      <td>0.500000</td>\n",
       "      <td>0.123810</td>\n",
       "      <td>0.041905</td>\n",
       "      <td>0.654603</td>\n",
       "      <td>0.800000</td>\n",
       "      <td>0.106942</td>\n",
       "      <td>0.052533</td>\n",
       "      <td>0.653533</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5577</th>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.032805</td>\n",
       "      <td>0.016402</td>\n",
       "      <td>0.610169</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.082393</td>\n",
       "      <td>0.030474</td>\n",
       "      <td>0.645598</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5578</th>\n",
       "      <td>0.125000</td>\n",
       "      <td>0.082852</td>\n",
       "      <td>0.030829</td>\n",
       "      <td>0.595376</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.168580</td>\n",
       "      <td>0.045921</td>\n",
       "      <td>0.672508</td>\n",
       "      <td>0.200000</td>\n",
       "      <td>0.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5579</th>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.061159</td>\n",
       "      <td>0.025751</td>\n",
       "      <td>0.634657</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.052599</td>\n",
       "      <td>0.048215</td>\n",
       "      <td>0.631183</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5580</th>\n",
       "      <td>0.350000</td>\n",
       "      <td>0.050960</td>\n",
       "      <td>0.050960</td>\n",
       "      <td>0.614163</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.060876</td>\n",
       "      <td>0.024624</td>\n",
       "      <td>0.715458</td>\n",
       "      <td>0.300000</td>\n",
       "      <td>0.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5581</th>\n",
       "      <td>0.600000</td>\n",
       "      <td>0.106911</td>\n",
       "      <td>0.050756</td>\n",
       "      <td>0.667387</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.089120</td>\n",
       "      <td>0.053241</td>\n",
       "      <td>0.646412</td>\n",
       "      <td>0.600000</td>\n",
       "      <td>0.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5582</th>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.064821</td>\n",
       "      <td>0.041536</td>\n",
       "      <td>0.660793</td>\n",
       "      <td>0.600000</td>\n",
       "      <td>0.079812</td>\n",
       "      <td>0.039645</td>\n",
       "      <td>0.661450</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5583</th>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.151185</td>\n",
       "      <td>0.039718</td>\n",
       "      <td>0.678411</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.038462</td>\n",
       "      <td>0.031805</td>\n",
       "      <td>0.661243</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5584</th>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.092008</td>\n",
       "      <td>0.056414</td>\n",
       "      <td>0.642713</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.095176</td>\n",
       "      <td>0.029987</td>\n",
       "      <td>0.679270</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5585</th>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.238825</td>\n",
       "      <td>0.033206</td>\n",
       "      <td>0.712005</td>\n",
       "      <td>0.800000</td>\n",
       "      <td>0.116556</td>\n",
       "      <td>0.050331</td>\n",
       "      <td>0.691391</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5586</th>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.106184</td>\n",
       "      <td>0.021237</td>\n",
       "      <td>0.675203</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.121136</td>\n",
       "      <td>0.044164</td>\n",
       "      <td>0.656782</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5587</th>\n",
       "      <td>0.111111</td>\n",
       "      <td>0.077586</td>\n",
       "      <td>0.029310</td>\n",
       "      <td>0.603448</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.081574</td>\n",
       "      <td>0.031945</td>\n",
       "      <td>0.641187</td>\n",
       "      <td>0.166667</td>\n",
       "      <td>0.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5588</th>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.057026</td>\n",
       "      <td>0.024440</td>\n",
       "      <td>0.709437</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.052300</td>\n",
       "      <td>0.047889</td>\n",
       "      <td>0.632640</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5589</th>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.067288</td>\n",
       "      <td>0.039973</td>\n",
       "      <td>0.661559</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.088585</td>\n",
       "      <td>0.055291</td>\n",
       "      <td>0.639715</td>\n",
       "      <td>0.400000</td>\n",
       "      <td>0.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5590</th>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.144371</td>\n",
       "      <td>0.043046</td>\n",
       "      <td>0.665563</td>\n",
       "      <td>0.450000</td>\n",
       "      <td>0.097120</td>\n",
       "      <td>0.056932</td>\n",
       "      <td>0.649699</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5591</th>\n",
       "      <td>0.650000</td>\n",
       "      <td>0.232242</td>\n",
       "      <td>0.033972</td>\n",
       "      <td>0.708462</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.108073</td>\n",
       "      <td>0.020182</td>\n",
       "      <td>0.679688</td>\n",
       "      <td>0.650000</td>\n",
       "      <td>0.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5592</th>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.085645</td>\n",
       "      <td>0.056092</td>\n",
       "      <td>0.636912</td>\n",
       "      <td>0.500000</td>\n",
       "      <td>0.099791</td>\n",
       "      <td>0.055129</td>\n",
       "      <td>0.640614</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5593</th>\n",
       "      <td>0.200000</td>\n",
       "      <td>0.080916</td>\n",
       "      <td>0.027481</td>\n",
       "      <td>0.609160</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.225707</td>\n",
       "      <td>0.033825</td>\n",
       "      <td>0.716482</td>\n",
       "      <td>0.285714</td>\n",
       "      <td>0.70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5594</th>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.086562</td>\n",
       "      <td>0.054479</td>\n",
       "      <td>0.638015</td>\n",
       "      <td>0.181818</td>\n",
       "      <td>0.084306</td>\n",
       "      <td>0.025940</td>\n",
       "      <td>0.626459</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5595</th>\n",
       "      <td>0.800000</td>\n",
       "      <td>0.104332</td>\n",
       "      <td>0.051861</td>\n",
       "      <td>0.643075</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.089382</td>\n",
       "      <td>0.054589</td>\n",
       "      <td>0.646071</td>\n",
       "      <td>0.800000</td>\n",
       "      <td>0.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5596</th>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.147901</td>\n",
       "      <td>0.041972</td>\n",
       "      <td>0.668221</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.125860</td>\n",
       "      <td>0.027510</td>\n",
       "      <td>0.712517</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5597</th>\n",
       "      <td>0.600000</td>\n",
       "      <td>0.077766</td>\n",
       "      <td>0.039144</td>\n",
       "      <td>0.653967</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.088599</td>\n",
       "      <td>0.032967</td>\n",
       "      <td>0.670330</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5598</th>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.031008</td>\n",
       "      <td>0.019678</td>\n",
       "      <td>0.606440</td>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.117147</td>\n",
       "      <td>0.050393</td>\n",
       "      <td>0.700262</td>\n",
       "      <td>0.700000</td>\n",
       "      <td>0.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5599</th>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.129913</td>\n",
       "      <td>0.027315</td>\n",
       "      <td>0.712858</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.082517</td>\n",
       "      <td>0.035664</td>\n",
       "      <td>0.669930</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5600</th>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.114381</td>\n",
       "      <td>0.051505</td>\n",
       "      <td>0.698997</td>\n",
       "      <td>0.750000</td>\n",
       "      <td>0.088411</td>\n",
       "      <td>0.054958</td>\n",
       "      <td>0.637993</td>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5601</th>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.085592</td>\n",
       "      <td>0.034237</td>\n",
       "      <td>0.673324</td>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.108911</td>\n",
       "      <td>0.050825</td>\n",
       "      <td>0.695710</td>\n",
       "      <td>0.900000</td>\n",
       "      <td>0.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5602</th>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.085417</td>\n",
       "      <td>0.035417</td>\n",
       "      <td>0.665278</td>\n",
       "      <td>0.600000</td>\n",
       "      <td>0.106011</td>\n",
       "      <td>0.051366</td>\n",
       "      <td>0.665027</td>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5603</th>\n",
       "      <td>0.400000</td>\n",
       "      <td>0.042774</td>\n",
       "      <td>0.020739</td>\n",
       "      <td>0.604666</td>\n",
       "      <td>0.650000</td>\n",
       "      <td>0.131547</td>\n",
       "      <td>0.033496</td>\n",
       "      <td>0.690621</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5604</th>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.087695</td>\n",
       "      <td>0.033311</td>\n",
       "      <td>0.669613</td>\n",
       "      <td>0.650000</td>\n",
       "      <td>0.132064</td>\n",
       "      <td>0.033784</td>\n",
       "      <td>0.691032</td>\n",
       "      <td>0.850000</td>\n",
       "      <td>0.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5605</th>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.104794</td>\n",
       "      <td>0.050725</td>\n",
       "      <td>0.655518</td>\n",
       "      <td>0.350000</td>\n",
       "      <td>0.041746</td>\n",
       "      <td>0.021505</td>\n",
       "      <td>0.597090</td>\n",
       "      <td>0.550000</td>\n",
       "      <td>0.55</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5606 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      p1_win_prob_20w  p1_ace_prob_20w  p1_df_prob_20w  p1_svptWon_prob_20w  \\\n",
       "0                 NaN              NaN             NaN                  NaN   \n",
       "1                 NaN              NaN             NaN                  NaN   \n",
       "2                 NaN              NaN             NaN                  NaN   \n",
       "3                 NaN              NaN             NaN                  NaN   \n",
       "4                 NaN              NaN             NaN                  NaN   \n",
       "5                 NaN              NaN             NaN                  NaN   \n",
       "6                 NaN              NaN             NaN                  NaN   \n",
       "7                 NaN              NaN             NaN                  NaN   \n",
       "8                 NaN              NaN             NaN                  NaN   \n",
       "9                 NaN              NaN             NaN                  NaN   \n",
       "10                NaN              NaN             NaN                  NaN   \n",
       "11                NaN              NaN             NaN                  NaN   \n",
       "12                NaN              NaN             NaN                  NaN   \n",
       "13                NaN              NaN             NaN                  NaN   \n",
       "14                NaN              NaN             NaN                  NaN   \n",
       "15                NaN              NaN             NaN                  NaN   \n",
       "16                NaN              NaN             NaN                  NaN   \n",
       "17                NaN              NaN             NaN                  NaN   \n",
       "18                NaN              NaN             NaN                  NaN   \n",
       "19                NaN              NaN             NaN                  NaN   \n",
       "20                NaN              NaN             NaN                  NaN   \n",
       "21                NaN              NaN             NaN                  NaN   \n",
       "22                NaN              NaN             NaN                  NaN   \n",
       "23                NaN              NaN             NaN                  NaN   \n",
       "24                NaN              NaN             NaN                  NaN   \n",
       "25                NaN              NaN             NaN                  NaN   \n",
       "26                NaN              NaN             NaN                  NaN   \n",
       "27                NaN              NaN             NaN                  NaN   \n",
       "28                NaN              NaN             NaN                  NaN   \n",
       "29                NaN              NaN             NaN                  NaN   \n",
       "...               ...              ...             ...                  ...   \n",
       "5576         0.500000         0.123810        0.041905             0.654603   \n",
       "5577         0.700000         0.032805        0.016402             0.610169   \n",
       "5578         0.125000         0.082852        0.030829             0.595376   \n",
       "5579         0.550000         0.061159        0.025751             0.634657   \n",
       "5580         0.350000         0.050960        0.050960             0.614163   \n",
       "5581         0.600000         0.106911        0.050756             0.667387   \n",
       "5582         0.450000         0.064821        0.041536             0.660793   \n",
       "5583         0.850000         0.151185        0.039718             0.678411   \n",
       "5584         0.450000         0.092008        0.056414             0.642713   \n",
       "5585         0.700000         0.238825        0.033206             0.712005   \n",
       "5586         0.750000         0.106184        0.021237             0.675203   \n",
       "5587         0.111111         0.077586        0.029310             0.603448   \n",
       "5588         0.900000         0.057026        0.024440             0.709437   \n",
       "5589         0.450000         0.067288        0.039973             0.661559   \n",
       "5590         0.850000         0.144371        0.043046             0.665563   \n",
       "5591         0.650000         0.232242        0.033972             0.708462   \n",
       "5592         0.700000         0.085645        0.056092             0.636912   \n",
       "5593         0.200000         0.080916        0.027481             0.609160   \n",
       "5594         0.700000         0.086562        0.054479             0.638015   \n",
       "5595         0.800000         0.104332        0.051861             0.643075   \n",
       "5596         0.850000         0.147901        0.041972             0.668221   \n",
       "5597         0.600000         0.077766        0.039144             0.653967   \n",
       "5598         0.700000         0.031008        0.019678             0.606440   \n",
       "5599         0.900000         0.129913        0.027315             0.712858   \n",
       "5600         0.850000         0.114381        0.051505             0.698997   \n",
       "5601         0.900000         0.085592        0.034237             0.673324   \n",
       "5602         0.850000         0.085417        0.035417             0.665278   \n",
       "5603         0.400000         0.042774        0.020739             0.604666   \n",
       "5604         0.850000         0.087695        0.033311             0.669613   \n",
       "5605         0.550000         0.104794        0.050725             0.655518   \n",
       "\n",
       "      p2_win_prob_20w  p2_ace_prob_20w  p2_df_prob_20w  p2_svptWon_prob_20w  \\\n",
       "0                 NaN              NaN             NaN                  NaN   \n",
       "1                 NaN              NaN             NaN                  NaN   \n",
       "2                 NaN              NaN             NaN                  NaN   \n",
       "3                 NaN              NaN             NaN                  NaN   \n",
       "4                 NaN              NaN             NaN                  NaN   \n",
       "5                 NaN              NaN             NaN                  NaN   \n",
       "6                 NaN              NaN             NaN                  NaN   \n",
       "7                 NaN              NaN             NaN                  NaN   \n",
       "8                 NaN              NaN             NaN                  NaN   \n",
       "9                 NaN              NaN             NaN                  NaN   \n",
       "10                NaN              NaN             NaN                  NaN   \n",
       "11                NaN              NaN             NaN                  NaN   \n",
       "12                NaN              NaN             NaN                  NaN   \n",
       "13                NaN              NaN             NaN                  NaN   \n",
       "14                NaN              NaN             NaN                  NaN   \n",
       "15                NaN              NaN             NaN                  NaN   \n",
       "16                NaN              NaN             NaN                  NaN   \n",
       "17                NaN              NaN             NaN                  NaN   \n",
       "18                NaN              NaN             NaN                  NaN   \n",
       "19                NaN              NaN             NaN                  NaN   \n",
       "20                NaN              NaN             NaN                  NaN   \n",
       "21                NaN              NaN             NaN                  NaN   \n",
       "22                NaN              NaN             NaN                  NaN   \n",
       "23                NaN              NaN             NaN                  NaN   \n",
       "24                NaN              NaN             NaN                  NaN   \n",
       "25                NaN              NaN             NaN                  NaN   \n",
       "26                NaN              NaN             NaN                  NaN   \n",
       "27                NaN              NaN             NaN                  NaN   \n",
       "28                NaN              NaN             NaN                  NaN   \n",
       "29                NaN              NaN             NaN                  NaN   \n",
       "...               ...              ...             ...                  ...   \n",
       "5576         0.800000         0.106942        0.052533             0.653533   \n",
       "5577         0.450000         0.082393        0.030474             0.645598   \n",
       "5578         0.750000         0.168580        0.045921             0.672508   \n",
       "5579         0.550000         0.052599        0.048215             0.631183   \n",
       "5580         0.900000         0.060876        0.024624             0.715458   \n",
       "5581         0.700000         0.089120        0.053241             0.646412   \n",
       "5582         0.600000         0.079812        0.039645             0.661450   \n",
       "5583         0.750000         0.038462        0.031805             0.661243   \n",
       "5584         0.900000         0.095176        0.029987             0.679270   \n",
       "5585         0.800000         0.116556        0.050331             0.691391   \n",
       "5586         0.450000         0.121136        0.044164             0.656782   \n",
       "5587         0.450000         0.081574        0.031945             0.641187   \n",
       "5588         0.550000         0.052300        0.047889             0.632640   \n",
       "5589         0.700000         0.088585        0.055291             0.639715   \n",
       "5590         0.450000         0.097120        0.056932             0.649699   \n",
       "5591         0.750000         0.108073        0.020182             0.679688   \n",
       "5592         0.500000         0.099791        0.055129             0.640614   \n",
       "5593         0.700000         0.225707        0.033825             0.716482   \n",
       "5594         0.181818         0.084306        0.025940             0.626459   \n",
       "5595         0.750000         0.089382        0.054589             0.646071   \n",
       "5596         0.900000         0.125860        0.027510             0.712517   \n",
       "5597         0.900000         0.088599        0.032967             0.670330   \n",
       "5598         0.850000         0.117147        0.050393             0.700262   \n",
       "5599         0.900000         0.082517        0.035664             0.669930   \n",
       "5600         0.750000         0.088411        0.054958             0.637993   \n",
       "5601         0.850000         0.108911        0.050825             0.695710   \n",
       "5602         0.600000         0.106011        0.051366             0.665027   \n",
       "5603         0.650000         0.131547        0.033496             0.690621   \n",
       "5604         0.650000         0.132064        0.033784             0.691032   \n",
       "5605         0.350000         0.041746        0.021505             0.597090   \n",
       "\n",
       "      p1_surface_win_prob_20w  p2_surface_win_prob_20w  \n",
       "0                         NaN                      NaN  \n",
       "1                         NaN                      NaN  \n",
       "2                         NaN                      NaN  \n",
       "3                         NaN                      NaN  \n",
       "4                         NaN                      NaN  \n",
       "5                         NaN                      NaN  \n",
       "6                         NaN                      NaN  \n",
       "7                         NaN                      NaN  \n",
       "8                         NaN                      NaN  \n",
       "9                         NaN                      NaN  \n",
       "10                        NaN                      NaN  \n",
       "11                        NaN                      NaN  \n",
       "12                        NaN                      NaN  \n",
       "13                        NaN                      NaN  \n",
       "14                        NaN                      NaN  \n",
       "15                        NaN                      NaN  \n",
       "16                        NaN                      NaN  \n",
       "17                        NaN                      NaN  \n",
       "18                        NaN                      NaN  \n",
       "19                        NaN                      NaN  \n",
       "20                        NaN                      NaN  \n",
       "21                        NaN                      NaN  \n",
       "22                        NaN                      NaN  \n",
       "23                        NaN                      NaN  \n",
       "24                        NaN                      NaN  \n",
       "25                        NaN                      NaN  \n",
       "26                        NaN                      NaN  \n",
       "27                        NaN                      NaN  \n",
       "28                        NaN                      NaN  \n",
       "29                        NaN                      NaN  \n",
       "...                       ...                      ...  \n",
       "5576                 0.450000                     0.80  \n",
       "5577                 0.700000                     0.55  \n",
       "5578                 0.200000                     0.75  \n",
       "5579                 0.550000                     0.40  \n",
       "5580                 0.300000                     0.90  \n",
       "5581                 0.600000                     0.65  \n",
       "5582                 0.450000                     0.55  \n",
       "5583                 0.750000                     0.75  \n",
       "5584                 0.450000                     0.85  \n",
       "5585                 0.700000                     0.80  \n",
       "5586                 0.750000                     0.40  \n",
       "5587                 0.166667                     0.55  \n",
       "5588                 0.900000                     0.40  \n",
       "5589                 0.400000                     0.65  \n",
       "5590                 0.750000                     0.40  \n",
       "5591                 0.650000                     0.75  \n",
       "5592                 0.700000                     0.40  \n",
       "5593                 0.285714                     0.70  \n",
       "5594                 0.700000                     0.25  \n",
       "5595                 0.800000                     0.75  \n",
       "5596                 0.750000                     0.90  \n",
       "5597                 0.550000                     0.90  \n",
       "5598                 0.700000                     0.85  \n",
       "5599                 0.900000                     0.90  \n",
       "5600                 0.850000                     0.75  \n",
       "5601                 0.900000                     0.85  \n",
       "5602                 0.850000                     0.60  \n",
       "5603                 0.550000                     0.65  \n",
       "5604                 0.850000                     0.65  \n",
       "5605                 0.550000                     0.55  \n",
       "\n",
       "[5606 rows x 10 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
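    "# Rolling-window sizes (number of previous matches) used for the player statistics\n",
    "window_sizes = [20]\n",
    "new_stats = []\n",
    "# iterrows() has no length, so pass total explicitly to give tqdm a complete progress bar\n",
    "for idx, _ in tqdm(new_data.iterrows(), total=len(new_data)):\n",
    "    new_stats.append(get_players_stats(new_data, idx, window_sizes))\n",
    "pd.DataFrame(new_stats)"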
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "targets = new_data[[\"p1_win\", \"diff_points\"]].astype(int)\n",
    "dataset = pd.concat([final_dataset, pd.DataFrame(new_stats), targets], axis=1)\n",
    "dataset.to_csv(\"atp_matches_with_stats_2016_17.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Column descriptions\n",
    "- tourney_id. A character id that uniquely identifies each tournament\n",
    "- tourney_name. The tournament name (character)\n",
    "- surface. The court surface (Carpet, Clay, Grass, or Hard)\n",
    "- draw_size. A numeric value indicating the draw size\n",
    "- tourney_level. The tournament level (A, C, D, F, G, M)\n",
    "- tourney_date. A numeric value encoding the tournament's start date (YYYYMMDD)\n",
    "- match_num. A numeric value indicating the order of matches\n",
    "- p1_id. A numeric id identifying the higher-ranked player\n",
    "- p1_name. The higher-ranked player's name (character)\n",
    "- p1_hand. A character value indicating the handedness of the higher-ranked player\n",
    "- p1_ioc. The higher-ranked player's country of origin (character)\n",
    "- p1_age. The higher-ranked player's age at the time of the match (numeric)\n",
    "- p1_rank. The higher-ranked player's rank at the time of the match (numeric)\n",
    "- p1_rank_points. The higher-ranked player's 52-week ranking points at the time of the match (numeric)\n",
    "- p2_id. A numeric id identifying the lower-ranked player\n",
    "- p2_name. The lower-ranked player's name (character)\n",
    "- p2_hand. A character value indicating the handedness of the lower-ranked player\n",
    "- p2_ioc. The lower-ranked player's country of origin (character)\n",
    "- p2_age. The lower-ranked player's age at the time of the match (numeric)\n",
    "- p2_rank. The lower-ranked player's rank at the time of the match (numeric)\n",
    "- p2_rank_points. The lower-ranked player's 52-week ranking points at the time of the match (numeric)\n",
    "- p1_win_prob_20w. Fraction (0 to 1) of matches won by the higher-ranked player in their last 20 matches\n",
    "- p1_ace_prob_20w. Fraction of service points that were aces for the higher-ranked player in their last 20 matches\n",
    "- p1_df_prob_20w. Fraction of service points that were double faults for the higher-ranked player in their last 20 matches\n",
    "- p1_svptWon_prob_20w. Fraction of service points won by the higher-ranked player in their last 20 matches\n",
    "- p2_win_prob_20w. Fraction (0 to 1) of matches won by the lower-ranked player in their last 20 matches\n",
    "- p2_ace_prob_20w. Fraction of service points that were aces for the lower-ranked player in their last 20 matches\n",
    "- p2_df_prob_20w. Fraction of service points that were double faults for the lower-ranked player in their last 20 matches\n",
    "- p2_svptWon_prob_20w. Fraction of service points won by the lower-ranked player in their last 20 matches\n",
    "- p1_surface_win_prob_20w. Fraction of matches won by the higher-ranked player in their last 20 matches played on the same surface\n",
    "- p2_surface_win_prob_20w. Fraction of matches won by the lower-ranked player in their last 20 matches played on the same surface\n",
    "- p1_win. 1 if the higher-ranked player won the match, 0 otherwise\n",
    "- diff_points. Difference in service points won between the two players"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}