Skip to content

Instantly share code, notes, and snippets.

@durgaswaroop
Last active May 19, 2019 12:37
Show Gist options
  • Save durgaswaroop/9e54e152465d4b074a36e2d5199af606 to your computer and use it in GitHub Desktop.
Save durgaswaroop/9e54e152465d4b074a36e2d5199af606 to your computer and use it in GitHub Desktop.
Pandas Groupby Recipes
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T10:42:50.867010Z",
"start_time": "2019-05-19T10:42:44.791103Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:35:12.180334Z",
"start_time": "2019-05-19T12:35:11.704109Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Team</th>\n",
" <th>Number</th>\n",
" <th>Position</th>\n",
" <th>Age</th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>College</th>\n",
" <th>Salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Avery Bradley</td>\n",
" <td>Boston Celtics</td>\n",
" <td>0.0</td>\n",
" <td>PG</td>\n",
" <td>25.0</td>\n",
" <td>6-2</td>\n",
" <td>180.0</td>\n",
" <td>Texas</td>\n",
" <td>7730337.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jae Crowder</td>\n",
" <td>Boston Celtics</td>\n",
" <td>99.0</td>\n",
" <td>SF</td>\n",
" <td>25.0</td>\n",
" <td>6-6</td>\n",
" <td>235.0</td>\n",
" <td>Marquette</td>\n",
" <td>6796117.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>John Holland</td>\n",
" <td>Boston Celtics</td>\n",
" <td>30.0</td>\n",
" <td>SG</td>\n",
" <td>27.0</td>\n",
" <td>6-5</td>\n",
" <td>205.0</td>\n",
" <td>Boston University</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>R.J. Hunter</td>\n",
" <td>Boston Celtics</td>\n",
" <td>28.0</td>\n",
" <td>SG</td>\n",
" <td>22.0</td>\n",
" <td>6-5</td>\n",
" <td>185.0</td>\n",
" <td>Georgia State</td>\n",
" <td>1148640.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Jonas Jerebko</td>\n",
" <td>Boston Celtics</td>\n",
" <td>8.0</td>\n",
" <td>PF</td>\n",
" <td>29.0</td>\n",
" <td>6-10</td>\n",
" <td>231.0</td>\n",
" <td>NaN</td>\n",
" <td>5000000.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Team Number Position Age Height Weight \\\n",
"0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 \n",
"1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 \n",
"2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 \n",
"3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 \n",
"4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 \n",
"\n",
" College Salary \n",
"0 Texas 7730337.0 \n",
"1 Marquette 6796117.0 \n",
"2 Boston University NaN \n",
"3 Georgia State 1148640.0 \n",
"4 NaN 5000000.0 "
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_df = pd.read_csv('https://cdncontribute.geeksforgeeks.org/wp-content/uploads/nba.csv') \n",
"nba_df.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is the average age of all players?"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T10:48:01.176724Z",
"start_time": "2019-05-19T10:48:01.172126Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"26.94"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(nba_df.Age.mean(), 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is the average age of players by team?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Groupby on what ever column(s) (`Team`) you want to be grouped and then the column on which you want to compute the mean on (`Age`), comes after."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T10:48:36.004229Z",
"start_time": "2019-05-19T10:48:35.996288Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7fe02d82c0b8>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_grouped = nba_df.groupby('Team')\n",
"team_grouped"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling `mean()` on that gives you the average calcualted by each team. But we get a series object back. "
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T10:54:42.275627Z",
"start_time": "2019-05-19T10:54:42.266306Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Team\n",
"Atlanta Hawks 28.200000\n",
"Boston Celtics 24.733333\n",
"Brooklyn Nets 25.600000\n",
"Charlotte Hornets 26.133333\n",
"Chicago Bulls 27.400000\n",
"Cleveland Cavaliers 29.533333\n",
"Dallas Mavericks 29.733333\n",
"Denver Nuggets 25.733333\n",
"Detroit Pistons 26.200000\n",
"Golden State Warriors 27.666667\n",
"Houston Rockets 26.866667\n",
"Indiana Pacers 26.400000\n",
"Los Angeles Clippers 29.466667\n",
"Los Angeles Lakers 27.533333\n",
"Memphis Grizzlies 28.388889\n",
"Miami Heat 28.933333\n",
"Milwaukee Bucks 24.562500\n",
"Minnesota Timberwolves 26.357143\n",
"New Orleans Pelicans 26.894737\n",
"New York Knicks 27.000000\n",
"Oklahoma City Thunder 27.066667\n",
"Orlando Magic 25.071429\n",
"Philadelphia 76ers 24.600000\n",
"Phoenix Suns 25.866667\n",
"Portland Trail Blazers 25.066667\n",
"Sacramento Kings 26.800000\n",
"San Antonio Spurs 31.600000\n",
"Toronto Raptors 26.133333\n",
"Utah Jazz 24.466667\n",
"Washington Wizards 27.866667\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_grouped['Age'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To convert into a dataframe, we will do `reset_index()` on it. It gives the dataframe with column names."
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:36:19.979913Z",
"start_time": "2019-05-19T12:36:19.966995Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>28.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Boston Celtics</td>\n",
" <td>24.733333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Brooklyn Nets</td>\n",
" <td>25.600000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Charlotte Hornets</td>\n",
" <td>26.133333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Chicago Bulls</td>\n",
" <td>27.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Cleveland Cavaliers</td>\n",
" <td>29.533333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Dallas Mavericks</td>\n",
" <td>29.733333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Denver Nuggets</td>\n",
" <td>25.733333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Detroit Pistons</td>\n",
" <td>26.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Golden State Warriors</td>\n",
" <td>27.666667</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team Age\n",
"0 Atlanta Hawks 28.200000\n",
"1 Boston Celtics 24.733333\n",
"2 Brooklyn Nets 25.600000\n",
"3 Charlotte Hornets 26.133333\n",
"4 Chicago Bulls 27.400000\n",
"5 Cleveland Cavaliers 29.533333\n",
"6 Dallas Mavericks 29.733333\n",
"7 Denver Nuggets 25.733333\n",
"8 Detroit Pistons 26.200000\n",
"9 Golden State Warriors 27.666667"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_grouped['Age'].mean().reset_index().head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T11:05:09.590778Z",
"start_time": "2019-05-19T11:05:09.582593Z"
}
},
"source": [
"If we want to apply a function to each value of the aggregated output.."
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:36:25.161700Z",
"start_time": "2019-05-19T12:36:25.150578Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>28.20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Boston Celtics</td>\n",
" <td>24.73</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Brooklyn Nets</td>\n",
" <td>25.60</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Charlotte Hornets</td>\n",
" <td>26.13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Chicago Bulls</td>\n",
" <td>27.40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Cleveland Cavaliers</td>\n",
" <td>29.53</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Dallas Mavericks</td>\n",
" <td>29.73</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Denver Nuggets</td>\n",
" <td>25.73</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Detroit Pistons</td>\n",
" <td>26.20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Golden State Warriors</td>\n",
" <td>27.67</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team Age\n",
"0 Atlanta Hawks 28.20\n",
"1 Boston Celtics 24.73\n",
"2 Brooklyn Nets 25.60\n",
"3 Charlotte Hornets 26.13\n",
"4 Chicago Bulls 27.40\n",
"5 Cleveland Cavaliers 29.53\n",
"6 Dallas Mavericks 29.73\n",
"7 Denver Nuggets 25.73\n",
"8 Detroit Pistons 26.20\n",
"9 Golden State Warriors 27.67"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_grouped['Age'].mean().apply(lambda x: round(x,2)).reset_index().head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is the average salary of the players?"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T11:32:50.076514Z",
"start_time": "2019-05-19T11:32:50.071355Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"4842684.11"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(nba_df.Salary.mean(), 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is the average salary of the players grouped by Team?"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:36:31.027057Z",
"start_time": "2019-05-19T12:36:31.011457Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>Salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>4860197</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Boston Celtics</td>\n",
" <td>4181505</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Brooklyn Nets</td>\n",
" <td>3501898</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Charlotte Hornets</td>\n",
" <td>5222728</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Chicago Bulls</td>\n",
" <td>5785559</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Cleveland Cavaliers</td>\n",
" <td>7642049</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Dallas Mavericks</td>\n",
" <td>4746582</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Denver Nuggets</td>\n",
" <td>4294424</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Detroit Pistons</td>\n",
" <td>4477884</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Golden State Warriors</td>\n",
" <td>5924600</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team Salary\n",
"0 Atlanta Hawks 4860197\n",
"1 Boston Celtics 4181505\n",
"2 Brooklyn Nets 3501898\n",
"3 Charlotte Hornets 5222728\n",
"4 Chicago Bulls 5785559\n",
"5 Cleveland Cavaliers 7642049\n",
"6 Dallas Mavericks 4746582\n",
"7 Denver Nuggets 4294424\n",
"8 Detroit Pistons 4477884\n",
"9 Golden State Warriors 5924600"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_df.groupby('Team')['Salary'].mean().apply(round).reset_index().head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What is the average age by Team and Position?"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T11:47:35.709726Z",
"start_time": "2019-05-19T11:47:35.704544Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7fe02d83ed68>"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_position_grouped = nba_df.groupby(['Team', 'Position'])\n",
"team_position_grouped"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:36:34.982818Z",
"start_time": "2019-05-19T12:36:34.970304Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>Position</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>C</td>\n",
" <td>28.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>PF</td>\n",
" <td>28.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>PG</td>\n",
" <td>24.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SF</td>\n",
" <td>29.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SG</td>\n",
" <td>29.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Boston Celtics</td>\n",
" <td>C</td>\n",
" <td>25.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Boston Celtics</td>\n",
" <td>PF</td>\n",
" <td>26.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Boston Celtics</td>\n",
" <td>PG</td>\n",
" <td>24.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Boston Celtics</td>\n",
" <td>SF</td>\n",
" <td>25.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Boston Celtics</td>\n",
" <td>SG</td>\n",
" <td>24.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team Position Age\n",
"0 Atlanta Hawks C 28.333333\n",
"1 Atlanta Hawks PF 28.250000\n",
"2 Atlanta Hawks PG 24.500000\n",
"3 Atlanta Hawks SF 29.000000\n",
"4 Atlanta Hawks SG 29.500000\n",
"5 Boston Celtics C 25.000000\n",
"6 Boston Celtics PF 26.333333\n",
"7 Boston Celtics PG 24.000000\n",
"8 Boston Celtics SF 25.000000\n",
"9 Boston Celtics SG 24.000000"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_position_grouped['Age'].mean().reset_index().head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want to apply a function to the column, its similarly done."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:36:37.512883Z",
"start_time": "2019-05-19T12:36:37.498299Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>Position</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>C</td>\n",
" <td>28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>PF</td>\n",
" <td>28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>PG</td>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SF</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SG</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Boston Celtics</td>\n",
" <td>C</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Boston Celtics</td>\n",
" <td>PF</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Boston Celtics</td>\n",
" <td>PG</td>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Boston Celtics</td>\n",
" <td>SF</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Boston Celtics</td>\n",
" <td>SG</td>\n",
" <td>24</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team Position Age\n",
"0 Atlanta Hawks C 28\n",
"1 Atlanta Hawks PF 28\n",
"2 Atlanta Hawks PG 24\n",
"3 Atlanta Hawks SF 29\n",
"4 Atlanta Hawks SG 30\n",
"5 Boston Celtics C 25\n",
"6 Boston Celtics PF 26\n",
"7 Boston Celtics PG 24\n",
"8 Boston Celtics SF 25\n",
"9 Boston Celtics SG 24"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"team_position_grouped['Age'].mean().apply(round).reset_index().head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:06:15.969683Z",
"start_time": "2019-05-19T12:06:15.966752Z"
}
},
"source": [
"### What is the average salary by Team, Position and Weight Class?"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:36:54.307893Z",
"start_time": "2019-05-19T12:36:54.283670Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Team</th>\n",
" <th>Number</th>\n",
" <th>Position</th>\n",
" <th>Age</th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>College</th>\n",
" <th>Salary</th>\n",
" <th>Weight_Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Avery Bradley</td>\n",
" <td>Boston Celtics</td>\n",
" <td>0.0</td>\n",
" <td>PG</td>\n",
" <td>25.0</td>\n",
" <td>6-2</td>\n",
" <td>180.0</td>\n",
" <td>Texas</td>\n",
" <td>7730337.0</td>\n",
" <td>170-200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jae Crowder</td>\n",
" <td>Boston Celtics</td>\n",
" <td>99.0</td>\n",
" <td>SF</td>\n",
" <td>25.0</td>\n",
" <td>6-6</td>\n",
" <td>235.0</td>\n",
" <td>Marquette</td>\n",
" <td>6796117.0</td>\n",
" <td>225-250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>John Holland</td>\n",
" <td>Boston Celtics</td>\n",
" <td>30.0</td>\n",
" <td>SG</td>\n",
" <td>27.0</td>\n",
" <td>6-5</td>\n",
" <td>205.0</td>\n",
" <td>Boston University</td>\n",
" <td>NaN</td>\n",
" <td>200-225</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>R.J. Hunter</td>\n",
" <td>Boston Celtics</td>\n",
" <td>28.0</td>\n",
" <td>SG</td>\n",
" <td>22.0</td>\n",
" <td>6-5</td>\n",
" <td>185.0</td>\n",
" <td>Georgia State</td>\n",
" <td>1148640.0</td>\n",
" <td>170-200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Jonas Jerebko</td>\n",
" <td>Boston Celtics</td>\n",
" <td>8.0</td>\n",
" <td>PF</td>\n",
" <td>29.0</td>\n",
" <td>6-10</td>\n",
" <td>231.0</td>\n",
" <td>NaN</td>\n",
" <td>5000000.0</td>\n",
" <td>225-250</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Team Number Position Age Height Weight \\\n",
"0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 \n",
"1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 \n",
"2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 \n",
"3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 \n",
"4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 \n",
"\n",
" College Salary Weight_Class \n",
"0 Texas 7730337.0 170-200 \n",
"1 Marquette 6796117.0 225-250 \n",
"2 Boston University NaN 200-225 \n",
"3 Georgia State 1148640.0 170-200 \n",
"4 NaN 5000000.0 225-250 "
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weight_bins = [0, 170, 200, 225, 250, 300, 350]\n",
"weight_labels = ['<170', '170-200', '200-225', '225-250', '250-300', '>300']\n",
"nba_df['Weight_Class'] = pd.cut(nba_df.Weight, bins=weight_bins, labels=weight_labels)\n",
"nba_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:37:03.578070Z",
"start_time": "2019-05-19T12:37:03.555746Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>Position</th>\n",
" <th>Weight_Class</th>\n",
" <th>Salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>C</td>\n",
" <td>225-250</td>\n",
" <td>1.087812e+07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>C</td>\n",
" <td>250-300</td>\n",
" <td>1.000000e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>PF</td>\n",
" <td>225-250</td>\n",
" <td>5.988067e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>PG</td>\n",
" <td>170-200</td>\n",
" <td>4.881700e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SF</td>\n",
" <td>200-225</td>\n",
" <td>3.000000e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SG</td>\n",
" <td>170-200</td>\n",
" <td>2.854940e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Atlanta Hawks</td>\n",
" <td>SG</td>\n",
" <td>200-225</td>\n",
" <td>2.525364e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Boston Celtics</td>\n",
" <td>C</td>\n",
" <td>225-250</td>\n",
" <td>2.165160e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Boston Celtics</td>\n",
" <td>C</td>\n",
" <td>250-300</td>\n",
" <td>2.593118e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Boston Celtics</td>\n",
" <td>PF</td>\n",
" <td>225-250</td>\n",
" <td>6.056987e+06</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team Position Weight_Class Salary\n",
"0 Atlanta Hawks C 225-250 1.087812e+07\n",
"1 Atlanta Hawks C 250-300 1.000000e+06\n",
"2 Atlanta Hawks PF 225-250 5.988067e+06\n",
"3 Atlanta Hawks PG 170-200 4.881700e+06\n",
"4 Atlanta Hawks SF 200-225 3.000000e+06\n",
"5 Atlanta Hawks SG 170-200 2.854940e+06\n",
"6 Atlanta Hawks SG 200-225 2.525364e+06\n",
"7 Boston Celtics C 225-250 2.165160e+06\n",
"8 Boston Celtics C 250-300 2.593118e+06\n",
"9 Boston Celtics PF 225-250 6.056987e+06"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_df.groupby(['Team', 'Position', 'Weight_Class'])['Salary'].mean().reset_index().head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count the number of people in each team"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:27:02.033122Z",
"start_time": "2019-05-19T12:27:02.025385Z"
}
},
"source": [
"For counting we have to actually use `size()` instead of `count()`."
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:26:36.357627Z",
"start_time": "2019-05-19T12:26:36.348226Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Team\n",
"Atlanta Hawks 15\n",
"Boston Celtics 15\n",
"Brooklyn Nets 15\n",
"Charlotte Hornets 15\n",
"Chicago Bulls 15\n",
"Cleveland Cavaliers 15\n",
"Dallas Mavericks 15\n",
"Denver Nuggets 15\n",
"Detroit Pistons 15\n",
"Golden State Warriors 15\n",
"Houston Rockets 15\n",
"Indiana Pacers 15\n",
"Los Angeles Clippers 15\n",
"Los Angeles Lakers 15\n",
"Memphis Grizzlies 18\n",
"Miami Heat 15\n",
"Milwaukee Bucks 16\n",
"Minnesota Timberwolves 14\n",
"New Orleans Pelicans 19\n",
"New York Knicks 16\n",
"Oklahoma City Thunder 15\n",
"Orlando Magic 14\n",
"Philadelphia 76ers 15\n",
"Phoenix Suns 15\n",
"Portland Trail Blazers 15\n",
"Sacramento Kings 15\n",
"San Antonio Spurs 15\n",
"Toronto Raptors 15\n",
"Utah Jazz 15\n",
"Washington Wizards 15\n",
"dtype: int64"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_df.groupby('Team').size()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `count` the output is quite different from what you expect"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-19T12:37:12.899403Z",
"start_time": "2019-05-19T12:37:12.879782Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Number</th>\n",
" <th>Position</th>\n",
" <th>Age</th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>College</th>\n",
" <th>Salary</th>\n",
" <th>Weight_Class</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Team</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Atlanta Hawks</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>11</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Boston Celtics</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>13</td>\n",
" <td>14</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Brooklyn Nets</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>13</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Charlotte Hornets</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>13</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chicago Bulls</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>12</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cleveland Cavaliers</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>12</td>\n",
" <td>14</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dallas Mavericks</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>12</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Denver Nuggets</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>9</td>\n",
" <td>14</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Detroit Pistons</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Golden State Warriors</th>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" <td>12</td>\n",
" <td>15</td>\n",
" <td>15</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Number Position Age Height Weight College \\\n",
"Team \n",
"Atlanta Hawks 15 15 15 15 15 15 11 \n",
"Boston Celtics 15 15 15 15 15 15 13 \n",
"Brooklyn Nets 15 15 15 15 15 15 13 \n",
"Charlotte Hornets 15 15 15 15 15 15 13 \n",
"Chicago Bulls 15 15 15 15 15 15 12 \n",
"Cleveland Cavaliers 15 15 15 15 15 15 12 \n",
"Dallas Mavericks 15 15 15 15 15 15 12 \n",
"Denver Nuggets 15 15 15 15 15 15 9 \n",
"Detroit Pistons 15 15 15 15 15 15 15 \n",
"Golden State Warriors 15 15 15 15 15 15 12 \n",
"\n",
" Salary Weight_Class \n",
"Team \n",
"Atlanta Hawks 15 15 \n",
"Boston Celtics 14 15 \n",
"Brooklyn Nets 15 15 \n",
"Charlotte Hornets 15 15 \n",
"Chicago Bulls 15 15 \n",
"Cleveland Cavaliers 14 15 \n",
"Dallas Mavericks 15 15 \n",
"Denver Nuggets 14 15 \n",
"Detroit Pistons 15 15 \n",
"Golden State Warriors 15 15 "
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_df.groupby('Team').count().head(10)\n",
"# This counts the number of values in each column. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment