Pairwise ranking for beginners (CRASHES)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pairwise ranking for newbies\n",
"\n",
"Hi all!\n",
"\n",
"In this notebook, I will try to walk through pairwise ranking in a way that is understandable even for deep learning beginners. Only the basics of deep learning and a bit of maths are required to follow along."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
}
],
"source": [
"# list of libraries used in this notebook\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import keras"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Preparing our data\n",
"\n",
"This notebook will use the Kaggle challenge [PUBG Finish Placement Prediction](https://www.kaggle.com/c/pubg-finish-placement-prediction/) as an example. This will demonstrate the power of this method.\n",
"\n",
"In this challenge, we want to predict the ranking of the teams given some statistics for each player in the game. Team size may vary from 1 to 4. Let's assume we have our data available locally."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Know your data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# might take a while\n",
"df_train_raw = pd.read_csv(\"./train.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us explore our data a bit to understand it.\n",
"\n",
"We first list the features at our disposal."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"Id int64\n",
"groupId int64\n",
"matchId int64\n",
"assists int64\n",
"boosts int64\n",
"damageDealt float64\n",
"DBNOs int64\n",
"headshotKills int64\n",
"heals int64\n",
"killPlace int64\n",
"killPoints int64\n",
"kills int64\n",
"killStreaks int64\n",
"longestKill float64\n",
"maxPlace int64\n",
"numGroups int64\n",
"revives int64\n",
"rideDistance float64\n",
"roadKills int64\n",
"swimDistance float64\n",
"teamKills int64\n",
"vehicleDestroys int64\n",
"walkDistance float64\n",
"weaponsAcquired int64\n",
"winPoints int64\n",
"winPlacePerc float64\n",
"dtype: object"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train_raw.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then take a look at the first rows of our data."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>groupId</th>\n",
" <th>matchId</th>\n",
" <th>assists</th>\n",
" <th>boosts</th>\n",
" <th>damageDealt</th>\n",
" <th>DBNOs</th>\n",
" <th>headshotKills</th>\n",
" <th>heals</th>\n",
" <th>killPlace</th>\n",
" <th>killPoints</th>\n",
" <th>kills</th>\n",
" <th>killStreaks</th>\n",
" <th>longestKill</th>\n",
" <th>maxPlace</th>\n",
" <th>numGroups</th>\n",
" <th>revives</th>\n",
" <th>rideDistance</th>\n",
" <th>roadKills</th>\n",
" <th>swimDistance</th>\n",
" <th>teamKills</th>\n",
" <th>vehicleDestroys</th>\n",
" <th>walkDistance</th>\n",
" <th>weaponsAcquired</th>\n",
" <th>winPoints</th>\n",
" <th>winPlacePerc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>24</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>247.30</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>17</td>\n",
" <td>1050</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>65.320</td>\n",
" <td>29</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>591.3</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>782.40</td>\n",
" <td>4</td>\n",
" <td>1458</td>\n",
" <td>0.8571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>440875</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>37.65</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>45</td>\n",
" <td>1072</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>13.550</td>\n",
" <td>26</td>\n",
" <td>23</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>119.60</td>\n",
" <td>3</td>\n",
" <td>1511</td>\n",
" <td>0.0400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>878242</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>93.73</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>54</td>\n",
" <td>1404</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>28</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3248.00</td>\n",
" <td>5</td>\n",
" <td>1583</td>\n",
" <td>0.7407</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>1319841</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>95.88</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>86</td>\n",
" <td>1069</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>97</td>\n",
" <td>94</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>21.49</td>\n",
" <td>1</td>\n",
" <td>1489</td>\n",
" <td>0.1146</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>1757883</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.00</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>58</td>\n",
" <td>1034</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>47</td>\n",
" <td>41</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>640.80</td>\n",
" <td>4</td>\n",
" <td>1475</td>\n",
" <td>0.5217</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>2200824</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>128.10</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>25</td>\n",
" <td>1000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>27.300</td>\n",
" <td>96</td>\n",
" <td>96</td>\n",
" <td>0</td>\n",
" <td>2221.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1016.00</td>\n",
" <td>4</td>\n",
" <td>1500</td>\n",
" <td>0.9368</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6</td>\n",
" <td>2568717</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>130.30</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>28</td>\n",
" <td>1037</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5.954</td>\n",
" <td>44</td>\n",
" <td>40</td>\n",
" <td>0</td>\n",
" <td>721.7</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>280.10</td>\n",
" <td>3</td>\n",
" <td>1495</td>\n",
" <td>0.3721</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>7</td>\n",
" <td>2612473</td>\n",
" <td>7</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>661.80</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>1148</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>36.640</td>\n",
" <td>46</td>\n",
" <td>46</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2617.00</td>\n",
" <td>4</td>\n",
" <td>1479</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>8</td>\n",
" <td>2656377</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>94.72</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>50</td>\n",
" <td>1286</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>28</td>\n",
" <td>28</td>\n",
" <td>0</td>\n",
" <td>2963.0</td>\n",
" <td>0</td>\n",
" <td>28.9</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3139.00</td>\n",
" <td>5</td>\n",
" <td>1528</td>\n",
" <td>0.7037</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>9</td>\n",
" <td>2700597</td>\n",
" <td>9</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>137.60</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>81</td>\n",
" <td>1000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>25</td>\n",
" <td>23</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>238.70</td>\n",
" <td>3</td>\n",
" <td>1500</td>\n",
" <td>0.0417</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id groupId matchId assists boosts damageDealt DBNOs headshotKills \\\n",
"0 0 24 0 0 5 247.30 2 0 \n",
"1 1 440875 1 1 0 37.65 1 1 \n",
"2 2 878242 2 0 1 93.73 1 0 \n",
"3 3 1319841 3 0 0 95.88 0 0 \n",
"4 4 1757883 4 0 1 0.00 0 0 \n",
"5 5 2200824 5 0 2 128.10 0 0 \n",
"6 6 2568717 6 1 0 130.30 0 0 \n",
"7 7 2612473 7 1 1 661.80 2 3 \n",
"8 8 2656377 8 0 3 94.72 0 0 \n",
"9 9 2700597 9 0 0 137.60 0 0 \n",
"\n",
" heals killPlace killPoints kills killStreaks longestKill maxPlace \\\n",
"0 4 17 1050 2 1 65.320 29 \n",
"1 0 45 1072 1 1 13.550 26 \n",
"2 2 54 1404 0 0 0.000 28 \n",
"3 0 86 1069 0 0 0.000 97 \n",
"4 1 58 1034 0 0 0.000 47 \n",
"5 0 25 1000 1 1 27.300 96 \n",
"6 0 28 1037 1 1 5.954 44 \n",
"7 2 3 1148 5 2 36.640 46 \n",
"8 5 50 1286 0 0 0.000 28 \n",
"9 0 81 1000 0 0 0.000 25 \n",
"\n",
" numGroups revives rideDistance roadKills swimDistance teamKills \\\n",
"0 28 1 591.3 0 0.0 0 \n",
"1 23 0 0.0 0 0.0 0 \n",
"2 28 1 0.0 0 0.0 0 \n",
"3 94 0 0.0 0 0.0 0 \n",
"4 41 0 0.0 0 0.0 0 \n",
"5 96 0 2221.0 0 0.0 0 \n",
"6 40 0 721.7 0 0.0 0 \n",
"7 46 0 0.0 0 0.0 0 \n",
"8 28 0 2963.0 0 28.9 0 \n",
"9 23 0 0.0 0 0.0 0 \n",
"\n",
" vehicleDestroys walkDistance weaponsAcquired winPoints winPlacePerc \n",
"0 0 782.40 4 1458 0.8571 \n",
"1 0 119.60 3 1511 0.0400 \n",
"2 0 3248.00 5 1583 0.7407 \n",
"3 0 21.49 1 1489 0.1146 \n",
"4 0 640.80 4 1475 0.5217 \n",
"5 0 1016.00 4 1500 0.9368 \n",
"6 0 280.10 3 1495 0.3721 \n",
"7 0 2617.00 4 1479 1.0000 \n",
"8 0 3139.00 5 1528 0.7037 \n",
"9 0 238.70 3 1500 0.0417 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# too many columns for the default print setting\n",
"pd.set_option('display.max_columns', len(df_train_raw.columns))\n",
"df_train_raw.head(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *winPlacePerc* column is the target to predict in this challenge. Players of the same team share the same rank at the end of a game (see the sample below).\n",
"\n",
"*Id* is the unique identifier of a player in a game, *groupId* is the identifier of a team of players, *matchId* is the identifier of a match.\n",
"\n",
"*numGroups* and *maxPlace* are informational columns, so we drop them."
]
},
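{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the claim that teammates share the same target can be tested directly. The cell below is only a minimal sketch: `toy` is a small hypothetical frame (its values are made up, only the column names come from the dataset). The same `groupby` could be run on `df_train_raw` instead, at the cost of a longer run time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy data with made-up values, using the dataset's column names\n",
"toy = pd.DataFrame({\n",
"    'matchId': [0, 0, 0, 0],\n",
"    'groupId': [1, 1, 11, 11],\n",
"    'Id': [10, 11, 12, 13],\n",
"    'winPlacePerc': [1.0, 1.0, 0.9643, 0.9643],\n",
"})\n",
"\n",
"# number of distinct target values within each team of each match\n",
"targets_per_team = toy.groupby(['matchId', 'groupId'])['winPlacePerc'].nunique()\n",
"\n",
"# evaluates to True when every team has exactly one target value\n",
"print((targets_per_team == 1).all())"
]
},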
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>matchId</th>\n",
" <th>groupId</th>\n",
" <th>Id</th>\n",
" <th>winPlacePerc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1518199</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2168783</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1424005</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2034198</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1471111</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2101502</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3422448</th>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>4889112</td>\n",
" <td>0.9643</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3466655</th>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>4952215</td>\n",
" <td>0.9643</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" matchId groupId Id winPlacePerc\n",
"1518199 0 1 2168783 1.0000\n",
"1424005 0 1 2034198 1.0000\n",
"1471111 0 1 2101502 1.0000\n",
"3422448 0 11 4889112 0.9643\n",
"3466655 0 11 4952215 0.9643"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# define target, identifiers, features\n",
"target = 'winPlacePerc'\n",
"identifiers = ['matchId', 'groupId', 'Id']\n",
"# drop useless columns\n",
"df_train_raw.drop(inplace=True, columns=['numGroups', 'maxPlace'])\n",
"# visualize\n",
"df_train_raw[df_train_raw['matchId'] == 0][identifiers + [target]].sort_values(target, ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rest are the features."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>assists</th>\n",
" <th>boosts</th>\n",
" <th>damageDealt</th>\n",
" <th>DBNOs</th>\n",
" <th>headshotKills</th>\n",
" <th>heals</th>\n",
" <th>killPlace</th>\n",
" <th>killPoints</th>\n",
" <th>kills</th>\n",
" <th>killStreaks</th>\n",
" <th>longestKill</th>\n",
" <th>revives</th>\n",
" <th>rideDistance</th>\n",
" <th>roadKills</th>\n",
" <th>swimDistance</th>\n",
" <th>teamKills</th>\n",
" <th>vehicleDestroys</th>\n",
" <th>walkDistance</th>\n",
" <th>weaponsAcquired</th>\n",
" <th>winPoints</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.266254</td>\n",
" <td>0.963645</td>\n",
" <td>132.923307</td>\n",
" <td>0.690332</td>\n",
" <td>0.238026</td>\n",
" <td>1.190518</td>\n",
" <td>47.005113</td>\n",
" <td>1080.845029</td>\n",
" <td>0.933905</td>\n",
" <td>0.555109</td>\n",
" <td>19.864740</td>\n",
" <td>0.164935</td>\n",
" <td>423.523343</td>\n",
" <td>0.002479</td>\n",
" <td>4.162344</td>\n",
" <td>0.014123</td>\n",
" <td>0.005226</td>\n",
" <td>1054.653710</td>\n",
" <td>3.454376</td>\n",
" <td>1500.412623</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.634077</td>\n",
" <td>1.561460</td>\n",
" <td>169.747556</td>\n",
" <td>1.187364</td>\n",
" <td>0.609651</td>\n",
" <td>2.376457</td>\n",
" <td>27.289799</td>\n",
" <td>123.901803</td>\n",
" <td>1.564290</td>\n",
" <td>0.722168</td>\n",
" <td>45.908988</td>\n",
" <td>0.466346</td>\n",
" <td>1223.189786</td>\n",
" <td>0.055607</td>\n",
" <td>27.056278</td>\n",
" <td>0.134539</td>\n",
" <td>0.074971</td>\n",
" <td>1115.674853</td>\n",
" <td>2.402355</td>\n",
" <td>42.817219</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>161.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>350.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>23.000000</td>\n",
" <td>1000.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>133.600000</td>\n",
" <td>2.000000</td>\n",
" <td>1491.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>87.810000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>47.000000</td>\n",
" <td>1029.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>571.100000</td>\n",
" <td>3.000000</td>\n",
" <td>1500.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>188.500000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>70.000000</td>\n",
" <td>1126.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>16.230000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1808.000000</td>\n",
" <td>5.000000</td>\n",
" <td>1510.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>15.000000</td>\n",
" <td>18.000000</td>\n",
" <td>3479.000000</td>\n",
" <td>30.000000</td>\n",
" <td>23.000000</td>\n",
" <td>42.000000</td>\n",
" <td>100.000000</td>\n",
" <td>2029.000000</td>\n",
" <td>41.000000</td>\n",
" <td>10.000000</td>\n",
" <td>1030.000000</td>\n",
" <td>13.000000</td>\n",
" <td>24870.000000</td>\n",
" <td>5.000000</td>\n",
" <td>1959.000000</td>\n",
" <td>6.000000</td>\n",
" <td>3.000000</td>\n",
" <td>15440.000000</td>\n",
" <td>60.000000</td>\n",
" <td>1917.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" assists boosts damageDealt DBNOs \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.266254 0.963645 132.923307 0.690332 \n",
"std 0.634077 1.561460 169.747556 1.187364 \n",
"min 0.000000 0.000000 0.000000 0.000000 \n",
"25% 0.000000 0.000000 0.000000 0.000000 \n",
"50% 0.000000 0.000000 87.810000 0.000000 \n",
"75% 0.000000 1.000000 188.500000 1.000000 \n",
"max 15.000000 18.000000 3479.000000 30.000000 \n",
"\n",
" headshotKills heals killPlace killPoints \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.238026 1.190518 47.005113 1080.845029 \n",
"std 0.609651 2.376457 27.289799 123.901803 \n",
"min 0.000000 0.000000 1.000000 161.000000 \n",
"25% 0.000000 0.000000 23.000000 1000.000000 \n",
"50% 0.000000 0.000000 47.000000 1029.000000 \n",
"75% 0.000000 1.000000 70.000000 1126.000000 \n",
"max 23.000000 42.000000 100.000000 2029.000000 \n",
"\n",
" kills killStreaks longestKill revives \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.933905 0.555109 19.864740 0.164935 \n",
"std 1.564290 0.722168 45.908988 0.466346 \n",
"min 0.000000 0.000000 0.000000 0.000000 \n",
"25% 0.000000 0.000000 0.000000 0.000000 \n",
"50% 0.000000 0.000000 0.000000 0.000000 \n",
"75% 1.000000 1.000000 16.230000 0.000000 \n",
"max 41.000000 10.000000 1030.000000 13.000000 \n",
"\n",
" rideDistance roadKills swimDistance teamKills \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 423.523343 0.002479 4.162344 0.014123 \n",
"std 1223.189786 0.055607 27.056278 0.134539 \n",
"min 0.000000 0.000000 0.000000 0.000000 \n",
"25% 0.000000 0.000000 0.000000 0.000000 \n",
"50% 0.000000 0.000000 0.000000 0.000000 \n",
"75% 0.000000 0.000000 0.000000 0.000000 \n",
"max 24870.000000 5.000000 1959.000000 6.000000 \n",
"\n",
" vehicleDestroys walkDistance weaponsAcquired winPoints \n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.005226 1054.653710 3.454376 1500.412623 \n",
"std 0.074971 1115.674853 2.402355 42.817219 \n",
"min 0.000000 0.000000 0.000000 350.000000 \n",
"25% 0.000000 133.600000 2.000000 1491.000000 \n",
"50% 0.000000 571.100000 3.000000 1500.000000 \n",
"75% 0.000000 1808.000000 5.000000 1510.000000 \n",
"max 3.000000 15440.000000 60.000000 1917.000000 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# define features\n",
"features = [column for column in df_train_raw.columns if column != target and column not in identifiers]\n",
"# visualize\n",
"df_train_raw[features].sample(frac=0.1).describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step will be to normalize our data properly."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"class DataFrameNormalizer:\n",
"    # standardizes the feature columns (zero mean, unit variance) and leaves the other columns untouched\n",
"    def __init__(self, features):\n",
"        self.features = features\n",
"        self.mean = None\n",
"        self.std = None\n",
"\n",
"    def fit(self, df):\n",
"        self.mean = df[self.features].mean()\n",
"        self.std = df[self.features].std()\n",
"\n",
"    def transform(self, df):\n",
"        # normalize the features, then re-attach the remaining columns by index\n",
"        return ((df[self.features] - self.mean)/self.std).merge(right=df.drop(columns=self.features), left_index=True, right_index=True)\n",
"\n",
"    def fit_transform(self, df):\n",
"        self.fit(df)\n",
"        return self.transform(df)\n",
"\n",
"data_frame_normalizer = DataFrameNormalizer(features)\n",
"df_train_normed = data_frame_normalizer.fit_transform(df_train_raw)\n",
"del df_train_raw"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>assists</th>\n",
" <th>boosts</th>\n",
" <th>damageDealt</th>\n",
" <th>DBNOs</th>\n",
" <th>headshotKills</th>\n",
" <th>heals</th>\n",
" <th>killPlace</th>\n",
" <th>killPoints</th>\n",
" <th>kills</th>\n",
" <th>killStreaks</th>\n",
" <th>longestKill</th>\n",
" <th>revives</th>\n",
" <th>rideDistance</th>\n",
" <th>roadKills</th>\n",
" <th>swimDistance</th>\n",
" <th>teamKills</th>\n",
" <th>vehicleDestroys</th>\n",
" <th>walkDistance</th>\n",
" <th>weaponsAcquired</th>\n",
" <th>winPoints</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" <td>435734.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.000131</td>\n",
" <td>-0.001371</td>\n",
" <td>-0.000628</td>\n",
" <td>-0.001038</td>\n",
" <td>-0.002114</td>\n",
" <td>0.000450</td>\n",
" <td>0.001927</td>\n",
" <td>0.000754</td>\n",
" <td>-0.000779</td>\n",
" <td>-0.001344</td>\n",
" <td>-0.001027</td>\n",
" <td>-0.001157</td>\n",
" <td>0.001366</td>\n",
" <td>0.001895</td>\n",
" <td>0.000588</td>\n",
" <td>0.000787</td>\n",
" <td>0.001286</td>\n",
" <td>-0.002421</td>\n",
" <td>0.000943</td>\n",
" <td>0.000753</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.000215</td>\n",
" <td>0.999155</td>\n",
" <td>1.002474</td>\n",
" <td>1.003177</td>\n",
" <td>0.997219</td>\n",
" <td>1.003085</td>\n",
" <td>0.999808</td>\n",
" <td>1.000904</td>\n",
" <td>1.002760</td>\n",
" <td>1.000045</td>\n",
" <td>0.997591</td>\n",
" <td>0.995901</td>\n",
" <td>1.003048</td>\n",
" <td>0.956531</td>\n",
" <td>0.982742</td>\n",
" <td>0.999710</td>\n",
" <td>1.011088</td>\n",
" <td>0.997847</td>\n",
" <td>1.006543</td>\n",
" <td>0.990469</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-0.418835</td>\n",
" <td>-0.617493</td>\n",
" <td>-0.782042</td>\n",
" <td>-0.579217</td>\n",
" <td>-0.390931</td>\n",
" <td>-0.501680</td>\n",
" <td>-1.684531</td>\n",
" <td>-7.314555</td>\n",
" <td>-0.596667</td>\n",
" <td>-0.768006</td>\n",
" <td>-0.433425</td>\n",
" <td>-0.353027</td>\n",
" <td>-0.346614</td>\n",
" <td>-0.040304</td>\n",
" <td>-0.150027</td>\n",
" <td>-0.104460</td>\n",
" <td>-0.069307</td>\n",
" <td>-0.945346</td>\n",
" <td>-1.439272</td>\n",
" <td>-27.024448</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>-0.418835</td>\n",
" <td>-0.617493</td>\n",
" <td>-0.782042</td>\n",
" <td>-0.579217</td>\n",
" <td>-0.390931</td>\n",
" <td>-0.501680</td>\n",
" <td>-0.879488</td>\n",
" <td>-0.653919</td>\n",
" <td>-0.596667</td>\n",
" <td>-0.768006</td>\n",
" <td>-0.433425</td>\n",
" <td>-0.353027</td>\n",
" <td>-0.346614</td>\n",
" <td>-0.040304</td>\n",
" <td>-0.150027</td>\n",
" <td>-0.104460</td>\n",
" <td>-0.069307</td>\n",
" <td>-0.825646</td>\n",
" <td>-0.606671</td>\n",
" <td>-0.223436</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>-0.418835</td>\n",
" <td>-0.617493</td>\n",
" <td>-0.267755</td>\n",
" <td>-0.579217</td>\n",
" <td>-0.390931</td>\n",
" <td>-0.501680</td>\n",
" <td>-0.001259</td>\n",
" <td>-0.419504</td>\n",
" <td>-0.596667</td>\n",
" <td>-0.768006</td>\n",
" <td>-0.433425</td>\n",
" <td>-0.353027</td>\n",
" <td>-0.346614</td>\n",
" <td>-0.040304</td>\n",
" <td>-0.150027</td>\n",
" <td>-0.104460</td>\n",
" <td>-0.069307</td>\n",
" <td>-0.433933</td>\n",
" <td>-0.190370</td>\n",
" <td>-0.011849</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>-0.418835</td>\n",
" <td>0.023269</td>\n",
" <td>0.323617</td>\n",
" <td>0.260051</td>\n",
" <td>-0.390931</td>\n",
" <td>-0.079095</td>\n",
" <td>0.840377</td>\n",
" <td>0.364576</td>\n",
" <td>0.041824</td>\n",
" <td>0.616374</td>\n",
" <td>-0.082180</td>\n",
" <td>-0.353027</td>\n",
" <td>-0.346614</td>\n",
" <td>-0.040304</td>\n",
" <td>-0.150027</td>\n",
" <td>-0.104460</td>\n",
" <td>-0.069307</td>\n",
" <td>0.669172</td>\n",
" <td>0.642232</td>\n",
" <td>0.223248</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>24.809162</td>\n",
" <td>10.916217</td>\n",
" <td>22.943435</td>\n",
" <td>26.277366</td>\n",
" <td>38.933781</td>\n",
" <td>22.740489</td>\n",
" <td>1.938164</td>\n",
" <td>7.429376</td>\n",
" <td>25.581471</td>\n",
" <td>14.460171</td>\n",
" <td>20.780770</td>\n",
" <td>29.612699</td>\n",
" <td>23.612306</td>\n",
" <td>157.522425</td>\n",
" <td>70.905091</td>\n",
" <td>45.033226</td>\n",
" <td>40.332766</td>\n",
" <td>9.394021</td>\n",
" <td>26.452882</td>\n",
" <td>9.486054</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" assists boosts damageDealt DBNOs \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.000131 -0.001371 -0.000628 -0.001038 \n",
"std 1.000215 0.999155 1.002474 1.003177 \n",
"min -0.418835 -0.617493 -0.782042 -0.579217 \n",
"25% -0.418835 -0.617493 -0.782042 -0.579217 \n",
"50% -0.418835 -0.617493 -0.267755 -0.579217 \n",
"75% -0.418835 0.023269 0.323617 0.260051 \n",
"max 24.809162 10.916217 22.943435 26.277366 \n",
"\n",
" headshotKills heals killPlace killPoints \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean -0.002114 0.000450 0.001927 0.000754 \n",
"std 0.997219 1.003085 0.999808 1.000904 \n",
"min -0.390931 -0.501680 -1.684531 -7.314555 \n",
"25% -0.390931 -0.501680 -0.879488 -0.653919 \n",
"50% -0.390931 -0.501680 -0.001259 -0.419504 \n",
"75% -0.390931 -0.079095 0.840377 0.364576 \n",
"max 38.933781 22.740489 1.938164 7.429376 \n",
"\n",
" kills killStreaks longestKill revives \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean -0.000779 -0.001344 -0.001027 -0.001157 \n",
"std 1.002760 1.000045 0.997591 0.995901 \n",
"min -0.596667 -0.768006 -0.433425 -0.353027 \n",
"25% -0.596667 -0.768006 -0.433425 -0.353027 \n",
"50% -0.596667 -0.768006 -0.433425 -0.353027 \n",
"75% 0.041824 0.616374 -0.082180 -0.353027 \n",
"max 25.581471 14.460171 20.780770 29.612699 \n",
"\n",
" rideDistance roadKills swimDistance teamKills \\\n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.001366 0.001895 0.000588 0.000787 \n",
"std 1.003048 0.956531 0.982742 0.999710 \n",
"min -0.346614 -0.040304 -0.150027 -0.104460 \n",
"25% -0.346614 -0.040304 -0.150027 -0.104460 \n",
"50% -0.346614 -0.040304 -0.150027 -0.104460 \n",
"75% -0.346614 -0.040304 -0.150027 -0.104460 \n",
"max 23.612306 157.522425 70.905091 45.033226 \n",
"\n",
" vehicleDestroys walkDistance weaponsAcquired winPoints \n",
"count 435734.000000 435734.000000 435734.000000 435734.000000 \n",
"mean 0.001286 -0.002421 0.000943 0.000753 \n",
"std 1.011088 0.997847 1.006543 0.990469 \n",
"min -0.069307 -0.945346 -1.439272 -27.024448 \n",
"25% -0.069307 -0.825646 -0.606671 -0.223436 \n",
"50% -0.069307 -0.433933 -0.190370 -0.011849 \n",
"75% -0.069307 0.669172 0.642232 0.223248 \n",
"max 40.332766 9.394021 26.452882 9.486054 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train_normed[features].sample(frac=0.1).describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now visualize it just enough to understand what is going on."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"#df_train_normed_viz = df_train_normed[df_train_normed['matchId'] == 1]\n",
"#sns.pairplot(data=df_train_normed_viz[[column for column in df_train_normed_viz.columns if column not in identifiers]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can already see some of what is happening. For instance, *roadKills* carries only a small signal, while *walkDistance* is clearly an indicator that will weigh in the ranking (the sooner you die, the less distance you walk...)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now is the time to get our hands dirty. We want to predict the ranking of players, in the form of a float from 0 to 1. This ranking will be predicted within each game. To facilitate our work, we will aggregate our data by team, computing the means and standard deviations of the features of its players."
]
},
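{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the aggregation described above concrete, here is a minimal sketch on a small hypothetical frame. The names `toy`, `toy_features` and `team_stats` and all the values are made up for illustration, and this is not necessarily how the rest of the notebook implements the step. One assumption to note: a single-player team has an undefined sample standard deviation, which this sketch replaces with 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy data with made-up values, using a few of the dataset's column names\n",
"toy = pd.DataFrame({\n",
"    'matchId': [0, 0, 0],\n",
"    'groupId': [1, 1, 11],\n",
"    'kills': [2, 0, 1],\n",
"    'walkDistance': [100.0, 300.0, 50.0],\n",
"    'winPlacePerc': [1.0, 1.0, 0.5],\n",
"})\n",
"toy_features = ['kills', 'walkDistance']\n",
"team_keys = ['matchId', 'groupId']\n",
"\n",
"# one row per team: mean and standard deviation of each feature\n",
"team_stats = toy.groupby(team_keys)[toy_features].agg(['mean', 'std'])\n",
"# flatten the (feature, statistic) column pairs into names like 'kills_mean'\n",
"team_stats.columns = ['_'.join(col) for col in team_stats.columns]\n",
"# assumption: a one-player team gets a standard deviation of 0 instead of NaN\n",
"team_stats = team_stats.fillna(0.0)\n",
"# teammates share the target, so the first value per team is enough\n",
"team_stats['winPlacePerc'] = toy.groupby(team_keys)['winPlacePerc'].first()\n",
"print(team_stats)"
]
},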
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAEq9JREFUeJzt3X+s3fV93/Hnq5CkbbrWJhjEbGuXqFYbWikBXYG3TFUHHRioav4IE1W1eMyS/6FbOlXqzDYJNT8mIk0lidQiWeDWRFkIo+mwAgqzHKJqf0C4BEYAh9klDN/ZxbezoT+iJiV974/zcXIw9/qea997j30+z4d0dL7f9/dzzv18/LW/r/v9nO/5OlWFJKk/PzbuDkiSxsMAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXqwnF34HQuvvjimpqaGnc3JOm88swzz/xFVa1brN05HQBTU1PMzMyMuxuSdF5J8n9GaecUkCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTBoAkdeqc/iawJE2iqZ2P/nD51btvHls/RgqAJGuA+4BfBAr418DLwJeAKeBV4F9U1YkkAT4L3AR8F/hXVfXN9j7bgP/U3vaTVbVn2UYiSeew4YP+uWLUKaDPAl+tqp8HPggcAHYC+6tqE7C/rQPcCGxqjx3AvQBJLgLuAq4BrgbuSrJ2mcYhSVqiRQMgyU8DvwTcD1BV36+qN4CtwMnf4PcAt7TlrcADNfAksCbJZcANwL6qOl5VJ4B9wJZlHY0kaWSjnAG8H5gD/jDJs0nuS/Je4NKqOgrQni9p7dcDh4deP9tqC9UlSWMwSgBcCFwF3FtVVwJ/w4+me+aTeWp1mvrbX5zsSDKTZGZubm6E7kmSzsQoATALzFbVU239YQaB8Hqb2qE9Hxtqv3Ho9RuAI6epv01V7aqq6aqaXrdu0f/PQJJ0hha9Cqiq/jzJ4SQ/V1UvA9cBL7XHNuDu9vxIe8le4DeTPMjgA983q+pokseB/zz0we/1wJ3LOxxJOneci1f+DBv1ewD/BvhCkncDrwC3Mzh7eCjJduA14NbW9jEGl4AeYnAZ6O0AVXU8ySeAp1u7j1fV8WUZhSRpyUYKgKp6DpieZ9N187Qt4I4F3mc3sHspHZQkrQxvBSFJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnRopAJK8muRbSZ5LMtNqFyXZl+Rge17b6knyuSSHkjyf5Kqh99nW2h9Msm1lhiRJGsVSzgD+WVV9qKqm2/pOYH9VbQL2t3WAG4FN7bEDuBcGgQHcBVwDXA3cdTI0JEmr72ymgLYCe9ryHuCWofoDNfAksCbJZcANwL6qOl5VJ4B9wJaz+PmSpLMwagAU8D+SPJNkR6tdWlVHAdrzJa2+Hjg89NrZVluo/jZJdiSZSTIzNzc3+kgkSUty4YjtPlxVR5JcAuxL8u3TtM08tTpN/e2Fql3ALoDp6el3bJckLY+RzgCq6kh7Pgb8CYM5/Nfb1A7t+VhrPgtsHHr5BuDIaeqSpDFYNACSvDfJPzi5DFwPvADsBU5eybMNeKQt7wU+2q4G2gy82aaIHgeuT7K2ffh7fatJksZglCmgS4E/SXKy/X+tqq8meRp4KMl24DXg1tb+MeAm4BDwXeB2gKo6nuQTwNOt3cer6viyjUSStCSLBkBVvQJ8cJ76/wOum6dewB0LvNduYPfSuylJWm5+E1iSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE6Nei8gSdIIpnY+Ou4ujMwzAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOm
UASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSp0YOgCQXJHk2yVfa+uVJnkpyMMmXkry71d/T1g+17VND73Fnq7+c5IblHowkaXRLOQP4GHBgaP3TwD1VtQk4AWxv9e3Aiar6WeCe1o4kVwC3Ab8AbAH+IMkFZ9d9SdKZGikAkmwAbgbua+sBrgUebk32ALe05a1tnbb9utZ+K/BgVX2vqr4DHAKuXo5BSJKWbtQzgM8AvwP8fVt/H/BGVb3V1meB9W15PXAYoG1/s7X/YX2e10iSVtmiAZDkV4FjVfXMcHmeprXIttO9Zvjn7Ugyk2Rmbm5use5Jks7QKGcAHwZ+LcmrwIMMpn4+A6xJcmFrswE40pZngY0AbfvPAMeH6/O85oeqaldVTVfV9Lp165Y8IEnSaBYNgKq6s6o2VNUUgw9xv1ZVvwE8AXykNdsGPNKW97Z12vavVVW1+m3tKqHLgU3AN5ZtJJKkJblw8SYL+vfAg0k+CTwL3N/q9wOfT3KIwW/+twFU1YtJHgJeAt4C7qiqH5zFz5cknYUlBUBVfR34elt+hXmu4qmqvwVuXeD1nwI+tdROSpKWn98ElqROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ06m+8BSJKAqZ2PjrsLZ8QzAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKn/CKYJI3R8JfIXr375lX92Z4BSFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSpxYNgCQ/nuQbSf5XkheT/G6rX57kqSQHk3wpybtb/T1t/VDbPjX0Xne2+stJblipQUmSFjfKGcD3gGur6oPAh4AtSTYDnwbuqapNwAlge2u/HThRVT8L3NPakeQK4DbgF4AtwB8kuWA5ByNJGt2iAVADf91W39UeBVwLPNzqe4Bb2vLWtk7bfl2StPqDVfW9qvoOcAi4ellGIUlaspE+A0hyQZLngGPAPuDPgDeq6q3WZBZY35bXA4cB2vY3gfcN1+d5jSRplY0UAFX1g6r6ELCBwW/tH5ivWXvOAtsWqr9Nkh1JZpLMzM3NjdI9SdIZWNJVQFX1BvB1YDOwJsnJ20lvAI605VlgI0Db/jPA8eH6PK8Z/hm7qmq6qqbXrVu3lO5JkpZglKuA1iVZ05Z/AvgV4ADwBPCR1mwb8Ehb3tvWadu/VlXV6re1q4QuBzYB31iugUiSlmaU/xDmMmBPu2Lnx4CHquorSV4CHkzySeBZ4P7W/n7g80kOMfjN/zaAqnoxyUPAS8BbwB1V9YPlHY4kaVSLBkBVPQ9cOU/9Fea5iqeq/ha4dYH3+hTwqaV3U5K03PwmsCR1ygCQpE4ZAJLUKQNAkjplAEhSp0a5DFSSdIqpnY+OuwtnzTMASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVOLBkCSjUmeSHIgyYtJPtbqFyXZl+Rge17b6knyuSSHkjyf5Kqh99rW2h9Msm3lhiVJWswoZwBvAb9dVR8ANgN3JLkC2Ansr6pNwP62DnAjsKk9dgD3wiAwgLuAa4CrgbtOhoYkafUtGgBVdbSqvtmW/wo4AKwHtgJ7WrM9wC1teSvwQA08CaxJchlwA7Cvqo5X1QlgH7BlWUcjSRrZkj4DSDIFXAk8BVxaVUdhEBLAJa3ZeuDw0MtmW22h+qk/Y0eSmSQzc3NzS+meJGkJLhy1YZKfAv4Y+K2q+sskCzadp1anqb+9ULUL2AUwPT39ju2SNC5TOx8ddxeW1UhnAEnexeDg/4Wq+nIrv96mdmjPx1p9Ftg49PINwJHT1CVJYzDKVUAB7gcOVNXvDW3aC5y8kmcb8MhQ/aPtaqDNwJttiuhx4Poka9uHv9e3miRpDEaZAvow8C+Bby
V5rtX+A3A38FCS7cBrwK1t22PATcAh4LvA7QBVdTzJJ4CnW7uPV9XxZRmFJGnJFg2AqvqfzD9/D3DdPO0LuGOB99oN7F5KByVJK8NvAktSpwwASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1KmR7wUkSVpZw/caevXum1f853kGIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjrlZaCStIBJ+y8gT+UZgCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTfhFMkoZM+pe/hi16BpBkd5JjSV4Yql2UZF+Sg+15basnyeeSHEryfJKrhl6zrbU/mGTbygxHkjSqUaaA/gjYckptJ7C/qjYB+9s6wI3ApvbYAdwLg8AA7gKuAa4G7joZGpKk8Vg0AKrqT4Hjp5S3Anva8h7glqH6AzXwJLAmyWXADcC+qjpeVSeAfbwzVCRJq+hMPwS+tKqOArTnS1p9PXB4qN1sqy1UlySNyXJfBZR5anWa+jvfINmRZCbJzNzc3LJ2TpL0I2caAK+3qR3a87FWnwU2DrXbABw5Tf0dqmpXVU1X1fS6devOsHuSpMWcaQDsBU5eybMNeGSo/tF2NdBm4M02RfQ4cH2Ste3D3+tbTZI0Jot+DyDJF4FfBi5OMsvgap67gYeSbAdeA25tzR8DbgIOAd8FbgeoquNJPgE83dp9vKpO/WBZkrSKFg2Aqvr1BTZdN0/bAu5Y4H12A7uX1DtJ0orxVhCS1CkDQJI65b2AJHWvp/v/DPMMQJI6ZQBIUqcMAEnqlAEgSZ0yACSpU14FJKlLvV75M8wzAEnqlAEgSZ0yACSpUwaAJHXKD4ElnbVRPlB99e6bV6Enp+cHv2/nGYAkdcozAEmrYvi373PhbECeAUhStzwDkHRGzmY+fTXPBpz3X5gBIGlkHkwniwEgaaz8bGB8DABJp+Vv/ZPLAJD0DufjQf987PO4eRWQJHXKMwBJ58xvz34esLoMAKlT58pBfyELhcG53u/zyaoHQJItwGeBC4D7quru1e6DpPOLB/2VsaoBkOQC4PeBfw7MAk8n2VtVL61mP3T+cWrgzHnw1EJW+wzgauBQVb0CkORBYCtw3gfAUu+GOMrpbY8HulH+HHv/M1qIB3ot1WoHwHrg8ND6LHDNKvdh2Sz1H9xC7Zda14/4ZySdudUOgMxTq7c1SHYAO9rqXyd5eQnvfzHwF2fYt/NZj+PucczQ57h7HDP59FmN+x+N0mi1A2AW2Di0vgE4MtygqnYBu87kzZPMVNX0mXfv/NTjuHscM/Q57h7HDKsz7tX+ItjTwKYklyd5N3AbsHeV+yBJYpXPAKrqrSS/CTzO4DLQ3VX14mr2QZI0sOrfA6iqx4DHVujtz2jqaAL0OO4exwx9jrvHMcMqjDtVtXgrSdLE8WZwktSpiQmAJFuSvJzkUJKd4+7PSkiyMckTSQ4keTHJx1r9oiT7khxsz2vH3deVkOSCJM8m+UpbvzzJU23cX2oXFkyMJGuSPJzk222f/+Me9nWSf9f+fr+Q5ItJfnwS93WS3UmOJXlhqDbv/s3A59rx7fkkVy1HHyYiAIZuMXEjcAXw60muGG+vVsRbwG9X1QeAzcAdbZw7gf1VtQnY39Yn0ceAA0PrnwbuaeM+AWwfS69WzmeBr1bVzwMfZDD2id7XSdYD/xaYrqpfZHCxyG1M5r7+I2DLKbWF9u+NwKb22AHcuxwdmIgAYOgWE1X1feDkLSYmSlUdrapvtuW/YnBAWM9grHtasz3ALePp4cpJsgG4GbivrQe4Fni4NZmocSf5aeCXgPsBqur7VfUGHexrBhen/ESSC4GfBI4ygfu6qv4UOH5KeaH9uxV4oAaeBNYkuexs+zApATDfLSbWj6kvqyLJFHAl8BRwaVUdhUFIAJeMr2cr5jPA7wB/39bfB7xRVW+19Unb5+8H5oA/bNNe9yV5LxO+r6vq/wL/BXiNwYH/TeAZJntfD1to/67IMW
5SAmDRW0xMkiQ/Bfwx8FtV9Zfj7s9KS/KrwLGqema4PE/TSdrnFwJXAfdW1ZXA3zBh0z3zaXPeW4HLgX8IvJfB9MepJmlfj2JF/r5PSgAseouJSZHkXQwO/l+oqi+38usnTwfb87Fx9W+FfBj4tSSvMpjeu5bBGcGaNk0Ak7fPZ4HZqnqqrT/MIBAmfV//CvCdqpqrqr8Dvgz8EyZ7Xw9baP+uyDFuUgKgi1tMtHnv+4EDVfV7Q5v2Atva8jbgkdXu20qqqjurakNVTTHYt1+rqt8AngA+0ppN1Lir6s+Bw0l+rpWuY3Db9Ine1wymfjYn+cn29/3kuCd2X59iof27F/houxpoM/Dmyamis1JVE/EAbgL+N/BnwH8cd39WaIz/lMFp3/PAc+1xE4P58P3AwfZ80bj7uoJ/Br8MfKUtvx/4BnAI+G/Ae8bdv2Ue64eAmba//zuwtod9Dfwu8G3gBeDzwHsmcV8DX2TwOcffMfgNf/tC+5fBFNDvt+PbtxhcJXXWffCbwJLUqUmZApIkLZEBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSp/4/jCs3QIi+jfMAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.hist(df_train_normed.groupby(by='matchId').size().values, bins=100)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see below that most games are solo games, so it will not have much influence.\n",
"\n",
"Without further ado, we proceed to our last data preparation step."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mean_assists</th>\n",
" <th>mean_boosts</th>\n",
" <th>mean_damageDealt</th>\n",
" <th>mean_DBNOs</th>\n",
" <th>mean_headshotKills</th>\n",
" <th>mean_heals</th>\n",
" <th>mean_killPlace</th>\n",
" <th>mean_killPoints</th>\n",
" <th>mean_kills</th>\n",
" <th>mean_killStreaks</th>\n",
" <th>mean_longestKill</th>\n",
" <th>mean_revives</th>\n",
" <th>mean_rideDistance</th>\n",
" <th>...</th>\n",
" <th>var_longestKill</th>\n",
" <th>var_revives</th>\n",
" <th>var_rideDistance</th>\n",
" <th>var_roadKills</th>\n",
" <th>var_swimDistance</th>\n",
" <th>var_teamKills</th>\n",
" <th>var_vehicleDestroys</th>\n",
" <th>var_walkDistance</th>\n",
" <th>var_weaponsAcquired</th>\n",
" <th>var_winPoints</th>\n",
" <th>matchId</th>\n",
" <th>groupId</th>\n",
" <th>winPlacePerc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.106748</td>\n",
" <td>0.450443</td>\n",
" <td>1.520679</td>\n",
" <td>1.379075</td>\n",
" <td>0.155245</td>\n",
" <td>0.202628</td>\n",
" <td>-1.038057</td>\n",
" <td>1.124406</td>\n",
" <td>2.170128</td>\n",
" <td>1.539294</td>\n",
" <td>0.728741</td>\n",
" <td>0.360442</td>\n",
" <td>-0.346614</td>\n",
" <td>...</td>\n",
" <td>3.325322</td>\n",
" <td>1.527117</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.426467</td>\n",
" <td>0.057769</td>\n",
" <td>0.182945</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.106748</td>\n",
" <td>0.450443</td>\n",
" <td>1.520679</td>\n",
" <td>1.379075</td>\n",
" <td>0.155245</td>\n",
" <td>0.202628</td>\n",
" <td>-1.038057</td>\n",
" <td>1.124406</td>\n",
" <td>2.170128</td>\n",
" <td>1.539294</td>\n",
" <td>0.728741</td>\n",
" <td>0.360442</td>\n",
" <td>-0.346614</td>\n",
" <td>...</td>\n",
" <td>3.325322</td>\n",
" <td>1.527117</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.426467</td>\n",
" <td>0.057769</td>\n",
" <td>0.182945</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.106748</td>\n",
" <td>0.450443</td>\n",
" <td>1.520679</td>\n",
" <td>1.379075</td>\n",
" <td>0.155245</td>\n",
" <td>0.202628</td>\n",
" <td>-1.038057</td>\n",
" <td>1.124406</td>\n",
" <td>2.170128</td>\n",
" <td>1.539294</td>\n",
" <td>0.728741</td>\n",
" <td>0.360442</td>\n",
" <td>-0.346614</td>\n",
" <td>...</td>\n",
" <td>3.325322</td>\n",
" <td>1.527117</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.426467</td>\n",
" <td>0.057769</td>\n",
" <td>0.182945</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1.0000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.418835</td>\n",
" <td>-0.403906</td>\n",
" <td>-0.605768</td>\n",
" <td>-0.579217</td>\n",
" <td>-0.390931</td>\n",
" <td>-0.360818</td>\n",
" <td>0.181706</td>\n",
" <td>0.458881</td>\n",
" <td>-0.596667</td>\n",
" <td>-0.768006</td>\n",
" <td>-0.433425</td>\n",
" <td>-0.353027</td>\n",
" <td>-0.346614</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.113813</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.768278</td>\n",
" <td>0.750994</td>\n",
" <td>0.558415</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0.6786</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.418835</td>\n",
" <td>-0.403906</td>\n",
" <td>-0.605768</td>\n",
" <td>-0.579217</td>\n",
" <td>-0.390931</td>\n",
" <td>-0.360818</td>\n",
" <td>0.181706</td>\n",
" <td>0.458881</td>\n",
" <td>-0.596667</td>\n",
" <td>-0.768006</td>\n",
" <td>-0.433425</td>\n",
" <td>-0.353027</td>\n",
" <td>-0.346614</td>\n",
" <td>...</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.113813</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.768278</td>\n",
" <td>0.750994</td>\n",
" <td>0.558415</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0.6786</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 43 columns</p>\n",
"</div>"
],
"text/plain": [
" mean_assists mean_boosts mean_damageDealt mean_DBNOs \\\n",
"0 0.106748 0.450443 1.520679 1.379075 \n",
"1 0.106748 0.450443 1.520679 1.379075 \n",
"2 0.106748 0.450443 1.520679 1.379075 \n",
"3 -0.418835 -0.403906 -0.605768 -0.579217 \n",
"4 -0.418835 -0.403906 -0.605768 -0.579217 \n",
"\n",
" mean_headshotKills mean_heals mean_killPlace mean_killPoints \\\n",
"0 0.155245 0.202628 -1.038057 1.124406 \n",
"1 0.155245 0.202628 -1.038057 1.124406 \n",
"2 0.155245 0.202628 -1.038057 1.124406 \n",
"3 -0.390931 -0.360818 0.181706 0.458881 \n",
"4 -0.390931 -0.360818 0.181706 0.458881 \n",
"\n",
" mean_kills mean_killStreaks mean_longestKill mean_revives \\\n",
"0 2.170128 1.539294 0.728741 0.360442 \n",
"1 2.170128 1.539294 0.728741 0.360442 \n",
"2 2.170128 1.539294 0.728741 0.360442 \n",
"3 -0.596667 -0.768006 -0.433425 -0.353027 \n",
"4 -0.596667 -0.768006 -0.433425 -0.353027 \n",
"\n",
" mean_rideDistance ... var_longestKill var_revives \\\n",
"0 -0.346614 ... 3.325322 1.527117 \n",
"1 -0.346614 ... 3.325322 1.527117 \n",
"2 -0.346614 ... 3.325322 1.527117 \n",
"3 -0.346614 ... 0.000000 0.000000 \n",
"4 -0.346614 ... 0.000000 0.000000 \n",
"\n",
" var_rideDistance var_roadKills var_swimDistance var_teamKills \\\n",
"0 0.0 0.0 0.000000 0.0 \n",
"1 0.0 0.0 0.000000 0.0 \n",
"2 0.0 0.0 0.000000 0.0 \n",
"3 0.0 0.0 0.113813 0.0 \n",
"4 0.0 0.0 0.113813 0.0 \n",
"\n",
" var_vehicleDestroys var_walkDistance var_weaponsAcquired var_winPoints \\\n",
"0 0.0 0.426467 0.057769 0.182945 \n",
"1 0.0 0.426467 0.057769 0.182945 \n",
"2 0.0 0.426467 0.057769 0.182945 \n",
"3 0.0 0.768278 0.750994 0.558415 \n",
"4 0.0 0.768278 0.750994 0.558415 \n",
"\n",
" matchId groupId winPlacePerc \n",
"0 0 1 1.0000 \n",
"1 0 1 1.0000 \n",
"2 0 1 1.0000 \n",
"3 0 2 0.6786 \n",
"4 0 2 0.6786 \n",
"\n",
"[5 rows x 43 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train_team_mean = df_train_normed[features + ['groupId']].groupby('groupId').mean().add_prefix('mean_')\n",
"df_train_team_var = df_train_normed[features + ['groupId']].groupby('groupId').var().add_prefix('var_').fillna(0)\n",
"df_train = pd.merge(left=df_train_team_mean, left_index=True, right=df_train_team_var, right_index=True)\n",
"df_train = df_train.merge(left_index=True, right=df_train_normed[['matchId', 'groupId', target]], right_on='groupId')\n",
"df_train.reset_index(inplace=True, drop=True)\n",
"\n",
"features = [column for column in df_train.columns if any([old_column in column for old_column in features])]\n",
"del df_train_team_mean, df_train_team_var, df_train_normed\n",
"\n",
"df_train.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A bit of mathematic modelisation\n",
"Let us now take a bit of time to understand what we want to do with this data.\n",
"\n",
"Let a match $m \\in \\mathcal{M}$ and the subset of teams participating to this match $\\mathcal{G}_m$.\n",
"\n",
"Ranking the groups can be viewed as sorting them. Most sorting algorithm (*NO, NOT YOU BOGO SORT!*) would need a comparison function indicating for two given items $(a, b)$ whether $a < b$, $a = b$ or $a > b$. This is usually represented by $-1$, $0$ or $1$.\n",
"\n",
"In our case, we build a matrix $P^m$ so that $P^m_{ij} = \\mathbb{P}(g_i > g_j)$. This is the probability that the group $i$ is ranked higher than the group $j$ in match $m$.\n",
"\n",
"If we do so, we can then estimate those probabilities, *predicting* a matrix $\\hat{P^m}$. How do you ask? Thanks to a neural network of course! Howerver this is for later! :D\n",
"\n",
"Once we have said matrix, we can sort the groups. We will also see how a bit later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Back to our data\n",
"We will now define a [`keras.utils.Sequence`](https://keras.io/utils/#sequence) class in order to allow the `fit` method of the models to ingest our data in a \"pairwise ranking\" way.\n",
"\n",
"Basically, for every match, we need to provide all pairs $(x_i, x_j)$ and their propability $P^m_{ij}$ which is 1 if `winPlacePerc` of $i$ is greater than that of $j$, 0 if it is the opposite and 0.5 if $i=j$."
]
},
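{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked example (the values are made up for illustration and are not taken from the dataset), consider a match with three groups whose `winPlacePerc` are $1.0$, $0.5$ and $0.0$. Following the rule above, the target matrix would be\n",
"\n",
"$$P^m = \\begin{pmatrix} 0.5 & 1 & 1 \\\\ 0 & 0.5 & 1 \\\\ 0 & 0 & 0.5 \\end{pmatrix}$$\n",
"\n",
"Every entry satisfies $P^m_{ij} + P^m_{ji} = 1$, and a match with $n$ groups yields $n^2$ ordered pairs, here $3^2 = 9$."
]
},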
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"class PairwiseDataGenerator(keras.utils.Sequence):\n",
" def __init__(self, df, batch_match=False):\n",
" self.df = df\n",
" self.batch_match = batch_match\n",
" # compute the number of cases\n",
" self.matchs_id = np.unique(self.df['matchId'].sort_values().values)\n",
" self.matchs_teams_count = self.df.groupby('matchId').size().sort_index().values\n",
" self.matchs_pairs_count = [mtc**2 for mtc in self.matchs_teams_count]\n",
" self.pairs_count = sum(self.matchs_pairs_count) if not batch_match else len(self.matchs_id)\n",
" \n",
" def __len__(self):\n",
" return self.pairs_count if not self.batch_match else len(self.matchs_id)\n",
" \n",
" def _get_df_match(self, index):\n",
" accumulated_match_pair_count = 0\n",
" for match_id, match_pair_count in zip(self.matchs_id, self.matchs_pairs_count):\n",
" if accumulated_match_pair_count <= index and index < accumulated_match_pair_count + match_pair_count:\n",
" df_match = self.df[self.df['matchId'] == match_id]\n",
" pair_index = index - accumulated_match_pair_count\n",
" return df_match, pair_index\n",
" accumulated_match_pair_count += match_pair_count\n",
" raise Exception(\"Index out of bounds in PairSequence._get_df_match (wtf though?!)\")\n",
" \n",
" def _get_pair_data(self, index):\n",
" df_match, pair_index = self._get_df_match(index)\n",
" i = pair_index // len(df_match)\n",
" j = pair_index % len(df_match)\n",
" data_i, data_j = df_match.iloc[i], df_match.iloc[j]\n",
" x_i = data_i[features].values\n",
" x_j = data_j[features].values\n",
" y_ij = 0.5 if i == j else 1 if data_i[target] > data_j[target] else 0\n",
" return {'input_left': x_i, 'input_right': x_j}, y_ij\n",
" \n",
" def _get_batch_pair_data(self, index):\n",
" x, y_ij = self._get_pair_data(index)\n",
" x_i, x_j = x['input_left'], x['input_right']\n",
" return {'input_left': x_i[np.newaxis, ...], 'input_right': x_j[np.newaxis, ...]}, np.array([y_ij])\n",
" \n",
" def _get_batch_match_data(self, match_index):\n",
" match_id = self.matchs_id[match_index]\n",
" generator_start_index = sum(self.matchs_pairs_count[:match_index])\n",
" input_left, input_right, y = [], [], []\n",
" for pair_index in range(self.matchs_pairs_count[match_index]):\n",
" index = pair_index + generator_start_index\n",
" input_dict_i, y_i = self._get_pair_data(index)\n",
" input_left.append(input_dict_i['input_left'])\n",
" input_right.append(input_dict_i['input_right'])\n",
" y.append(y_i)\n",
" return {'input_left': np.array(input_left), 'input_right': np.array(input_right)}, np.array(y)\n",
" \n",
" def __getitem__(self, index):\n",
" if index < 0 or index > len(self):\n",
" raise Exception(\"Out of bounds exception: index must be comprised between 0 and {}\".format(len(self)))\n",
" if self.batch_match:\n",
" return self._get_batch_match_data(index)\n",
" else:\n",
" return self._get_batch_pair_data(index)"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [],
"source": [
"matchs_id = np.unique(df_train['matchId'].values)\n",
"matchs_id_train = matchs_id[:100]\n",
"matchs_id_valid = matchs_id[100:110]"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [],
"source": [
"data_generator_train = PairwiseDataGenerator(df_train[df_train['matchId'].isin(matchs_id_train)])\n",
"data_generator_valid = PairwiseDataGenerator(df_train[df_train['matchId'].isin(matchs_id_valid)])"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training samples: 852944\n",
"Validation samples: 60927\n"
]
}
],
"source": [
"print(\"Training samples:\", len(data_generator_train))\n",
"print(\"Validation samples:\", len(data_generator_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see we have quite a lot of comparisons for training!"
]
},
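{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feeling for these numbers: the generator counts every ordered pair, so a match with $n_m$ rows contributes $n_m^2$ samples and the total is $N = \\sum_m n_m^2$. As a rough back-of-the-envelope check, assuming the 100 training matches and 10 validation matches selected above:\n",
"\n",
"$$\\sqrt{\\frac{852944}{100}} \\approx 92.4 \\qquad \\sqrt{\\frac{60927}{10}} \\approx 78.1$$\n",
"\n",
"These are root-mean-square match sizes, so a typical match contributes on the order of 80 to 90 rows, and the number of comparisons grows quadratically with that size."
]
},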
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Baking a model\n",
"Once we have this giant list of comparisons, we can build the model we want to train !\n",
"We will here use [RankNet](https://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf), the easiest one.\n",
"\n",
"The idea of RankNet is to compute a score for both inputs using the same Neural Network, compute the difference between those scores and then pass it in a sigmoid. Cool thing about the sigmoid function is that $f(-x) + f(x) = 1$, so we will keep the condition $P^m_{ij} + P^m_{ji} = 1$ unbroken, which is important when playing around saying things are probabilities!\n",
"\n",
"Our implementation of the RankNet is a dervation of [this implementation on GitHub](https://github.com/airalcorn2/RankNet/blob/master/ranknet.py)."
]
},
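{
"cell_type": "markdown",
"metadata": {},
"source": [
"The symmetry property can be checked in a couple of lines. Here $f(x) = \\frac{1}{1 + e^{-x}}$ is the sigmoid and $s_i$ is the score the network gives to group $i$ (the symbol $s_i$ is notation introduced for this sketch only):\n",
"\n",
"$$f(-x) = \\frac{1}{1 + e^{x}} = \\frac{e^{-x}}{e^{-x} + 1} = 1 - f(x)$$\n",
"\n",
"$$\\hat{P}^m_{ij} = f(s_i - s_j), \\qquad \\hat{P}^m_{ji} = f(s_j - s_i) = 1 - \\hat{P}^m_{ij}$$\n",
"\n",
"A natural loss for such probabilities is the binary cross-entropy (the `binary_crossentropy` loss in Keras), which for one pair reads\n",
"\n",
"$$\\ell_{ij} = -P^m_{ij} \\log \\hat{P}^m_{ij} - (1 - P^m_{ij}) \\log (1 - \\hat{P}^m_{ij})$$\n",
"\n",
"For a target of $0.5$ this loss is smallest when $\\hat{P}^m_{ij} = 0.5$, that is when both groups receive the same score."
]
},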
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"INPUT_DIM = len(features)\n",
"\n",
"# Model\n",
"h_1 = keras.layers.Dense(256, activation=\"relu\")\n",
"h_2 = keras.layers.Dense(128, activation=\"relu\")\n",
"h_3 = keras.layers.Dense(32, activation=\"relu\")\n",
"s = keras.layers.Dense(1)\n",
"\n",
"# Left score\n",
"input_left = keras.layers.Input(shape=(INPUT_DIM,), dtype=\"float32\", name='input_left')\n",
"h_1_left = h_1(input_left)\n",
"h_2_left = h_2(h_1_left)\n",
"h_3_left = h_3(h_2_left)\n",
"score_left = s(h_3_left)\n",
"\n",
"# Right score\n",
"input_right = keras.layers.Input(shape=(INPUT_DIM,), dtype=\"float32\", name='input_right')\n",
"h_1_right = h_1(input_right)\n",
"h_2_right = h_2(h_1_right)\n",
"h_3_right = h_3(h_2_right)\n",
"score_right = s(h_3_right)\n",
"\n",
"# Subtract scores\n",
"diff = keras.layers.Subtract()([score_left, score_right])\n",
"\n",
"# Pass difference through sigmoid function\n",
"prob = keras.layers.Activation(\"sigmoid\")(diff)\n",
"\n",
"# Build model\n",
"model = keras.models.Model(inputs = [input_left, input_right], outputs=prob)\n",
"model.compile(optimizer=\"adadelta\", loss=\"binary_crossentropy\")"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"input_left (InputLayer) (None, 40) 0 \n",
"__________________________________________________________________________________________________\n",
"input_right (InputLayer) (None, 40) 0 \n",
"__________________________________________________________________________________________________\n",
"dense_25 (Dense) (None, 256) 10496 input_left[0][0] \n",
" input_right[0][0] \n",
"__________________________________________________________________________________________________\n",
"dense_26 (Dense) (None, 128) 32896 dense_25[0][0] \n",
" dense_25[1][0] \n",
"__________________________________________________________________________________________________\n",
"dense_27 (Dense) (None, 32) 4128 dense_26[0][0] \n",
" dense_26[1][0] \n",
"__________________________________________________________________________________________________\n",
"dense_28 (Dense) (None, 1) 33 dense_27[0][0] \n",
" dense_27[1][0] \n",
"__________________________________________________________________________________________________\n",
"subtract_7 (Subtract) (None, 1) 0 dense_28[0][0] \n",
" dense_28[1][0] \n",
"__________________________________________________________________________________________________\n",
"activation_7 (Activation) (None, 1) 0 subtract_7[0][0] \n",
"==================================================================================================\n",
"Total params: 47,553\n",
"Trainable params: 47,553\n",
"Non-trainable params: 0\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us train this network."
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10\n",
" 186/10000 [..............................] - ETA: 3:20 - loss: 0.1608"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-174-fccb50df84f2>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m model.fit_generator(generator=data_generator_train, steps_per_epoch=10000, epochs=10, shuffle=True, \n\u001b[0;32m----> 2\u001b[0;31m validation_data=data_generator_valid, validation_steps=100)\n\u001b[0m",
"\u001b[0;32m/usr/lib/python3.7/site-packages/keras/legacy/interfaces.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 89\u001b[0m warnings.warn('Update your `' + object_name + '` call to the ' +\n\u001b[1;32m 90\u001b[0m 'Keras 2 API: ' + signature, stacklevel=2)\n\u001b[0;32m---> 91\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 92\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_original_function\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.7/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36mfit_generator\u001b[0;34m(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)\u001b[0m\n\u001b[1;32m 1416\u001b[0m \u001b[0muse_multiprocessing\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0muse_multiprocessing\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1417\u001b[0m \u001b[0mshuffle\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mshuffle\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1418\u001b[0;31m initial_epoch=initial_epoch)\n\u001b[0m\u001b[1;32m 1419\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1420\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0minterfaces\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlegacy_generator_methods_support\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.7/site-packages/keras/engine/training_generator.py\u001b[0m in \u001b[0;36mfit_generator\u001b[0;34m(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)\u001b[0m\n\u001b[1;32m 215\u001b[0m outs = model.train_on_batch(x, y,\n\u001b[1;32m 216\u001b[0m \u001b[0msample_weight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msample_weight\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 217\u001b[0;31m class_weight=class_weight)\n\u001b[0m\u001b[1;32m 218\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 219\u001b[0m \u001b[0mouts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mto_list\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mouts\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.7/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36mtrain_on_batch\u001b[0;34m(self, x, y, sample_weight, class_weight)\u001b[0m\n\u001b[1;32m 1215\u001b[0m \u001b[0mins\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0msample_weights\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1216\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_train_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1217\u001b[0;31m \u001b[0moutputs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrain_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mins\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1218\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0munpack_singleton\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1219\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs)\u001b[0m\n\u001b[1;32m 2713\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_legacy_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2714\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2715\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2716\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2717\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpy_any\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mis_tensor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[0;32min\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py\u001b[0m in \u001b[0;36m_call\u001b[0;34m(self, inputs)\u001b[0m\n\u001b[1;32m 2673\u001b[0m \u001b[0mfetched\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_callable_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0marray_vals\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrun_metadata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_metadata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2674\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2675\u001b[0;31m \u001b[0mfetched\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_callable_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0marray_vals\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2676\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mfetched\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2677\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.7/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1397\u001b[0m ret = tf_session.TF_SessionRunCallable(\n\u001b[1;32m 1398\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_session\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_session\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_handle\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstatus\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1399\u001b[0;31m run_metadata_ptr)\n\u001b[0m\u001b[1;32m 1400\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mrun_metadata\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1401\u001b[0m \u001b[0mproto_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf_session\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTF_GetBuffer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrun_metadata_ptr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"model.fit_generator(generator=data_generator_train, steps_per_epoch=10000, epochs=10, shuffle=True, \n",
" validation_data=data_generator_valid, validation_steps=100)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Our way of building a batch per match is not fast enough\n",
"# data_generator_batchmatch_train = PairwiseDataGenerator(df_train[df_train['matchId'].isin(matchs_id_train)], batch_match=True)\n",
"# data_generator_batchmatch_valid = PairwiseDataGenerator(df_train[df_train['matchId'].isin(matchs_id_valid)], batch_match=True)\n",
"# model.fit_generator(generator=data_generator_batchmatch_train, steps_per_epoch=100, epochs=1, shuffle=True, validation_data=data_generator_batchmatch_valid, validation_steps=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Predicting the pairwise matrix\n",
"\n",
"Now that we have trained our neural network, let's check that it really fits what we want, which is the probability that a team $i$ has a better rank than another team $j$."
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [],
"source": [
"def get_match_truth_pairwise_matrix(pairwise_data_generator, match_id):\n",
" match_index = np.argwhere(pairwise_data_generator.matchs_id == match_id)[0, 0]\n",
" input_dict, y_true = pairwise_data_generator._get_batch_match_data(match_index)\n",
" n_groups = int(np.sqrt(len(y_true)))\n",
" return y_true.reshape((n_groups, n_groups))"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [],
"source": [
"def plot_pairwise_probability(pairwise_probability_matrix):\n",
" plt.imshow(pairwise_probability_matrix, cmap='Blues')"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [],
"source": [
"def get_match_predicted_pairwise_matrix(model, pairwise_data_generator, match_id):\n",
" match_index = np.argwhere(pairwise_data_generator.matchs_id == match_id)[0, 0]\n",
" input_dict, y_true = pairwise_data_generator._get_batch_match_data(match_index)\n",
" y_pred = model.predict(input_dict)\n",
" n_groups = int(np.sqrt(len(y_pred)))\n",
" return y_pred.reshape((n_groups, n_groups))"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [],
"source": [
"def plot_match_prediction_comparision(model, pairwise_data_generator, match_id):\n",
" plt.suptitle(\"Match {}\".format(match_id))\n",
" plt.subplot(121)\n",
" plt.title(\"Truth\")\n",
" plot_pairwise_probability(get_match_truth_pairwise_matrix(pairwise_data_generator, match_id))\n",
" plt.subplot(122)\n",
" plt.title(\"Prediction\")\n",
" plot_pairwise_probability(get_match_predicted_pairwise_matrix(model, pairwise_data_generator, match_id))"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x504 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(15, 7))\n",
"plot_match_prediction_comparision(model, data_generator_valid, 106)"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x504 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(15, 7))\n",
"plot_match_prediction_comparision(model, data_generator_valid, 107)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can clearly see that the neural network prediction ressembles the truth. *Yay !*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Predicting the match ranking\n",
"\n",
"Ok, but how do we go from the pairwise ranking probabilities to an actual ranking?\n",
"\n",
"Beause we have such probabilities, we could in theory compute the most likely ranking!"
]
},
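{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make \"most likely ranking\" concrete, here is a minimal sketch, not a definitive method. The notation is an assumption introduced for illustration: $P_{ij}$ is the predicted probability that team $i$ ranks better than team $j$, $\\sigma(i)$ is the rank of team $i$ in a candidate ranking $\\sigma$ (1 is best), and the pairwise outcomes are treated as independent.\n",
"\n",
"Under these assumptions, the likelihood of a ranking is the product of the probabilities of every pairwise order it implies:\n",
"\n",
"$$L(\\sigma) = \\prod_{(i, j)\\,:\\,\\sigma(i) < \\sigma(j)} P_{ij}$$\n",
"\n",
"Scoring all $n!$ rankings is out of reach for matches with many teams, so one simple heuristic is to sort the teams by decreasing expected number of wins:\n",
"\n",
"$$s_i = \\sum_{j \\neq i} P_{ij}$$\n",
"\n",
"For example, with three teams, $P_{12} = 0.9$, $P_{13} = 0.8$, $P_{23} = 0.6$ and $P_{ji} = 1 - P_{ij}$, we get $s_1 = 1.7$, $s_2 = 0.7$ and $s_3 = 0.6$. The heuristic returns the ranking $(1, 2, 3)$, whose likelihood $0.9 \\times 0.8 \\times 0.6 = 0.432$ is also the highest of the six possible rankings, since any other ranking replaces at least one factor with its complement, which is below $0.5$. In general the heuristic and the maximum-likelihood ranking need not agree."
]
},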
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [],
"source": [
"# TODO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thanks for ready, I hope you enjoyed this trip, and see you around !"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 - GPU",
"language": "python",
"name": "python-gpu"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}