-
-
Save jing-jin-mc/eee5fc89ab76cd657940f7c9e8beeda1 to your computer and use it in GitHub Desktop.
Pandas - Pivot.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Pivot\n\n### Functions:\n\n* pivot_table\n* stack\n* unstack\n* melt " | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "import pandas as pd ", | |
"execution_count": 1, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### create a test dataset to demo functions" | |
}, | |
{ | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
}, | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "df = pd.DataFrame(data = {'date':['week1','week1','week2','week3','week3','week3'],\n 'sales':[50,20,35,10,20,40],\n 'category':['a','b','a','a','b','a'],\n 'revenue':[14,8,16,None,7,9]\n })\ndf.date = df.date.astype('str')\ndf.category = df.category.astype('str')\ndf", | |
"execution_count": 4, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 4, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n1 b week1 8.0 20\n2 a week2 16.0 35\n3 a week3 NaN 10\n4 b week3 7.0 20\n5 a week3 9.0 40" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "df.info()", | |
"execution_count": 5, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ncategory 6 non-null object\ndate 6 non-null object\nrevenue 5 non-null float64\nsales 6 non-null int64\ndtypes: float64(1), int64(1), object(2)\nmemory usage: 272.0+ bytes\n", | |
"name": "stdout" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### pandas.pivot_table( data, values, index, columns, aggfunc, fill_value ...)\n\nCreate a spreadsheet-style pivot tables as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of result DataFrame\n\n* data: DataFrame \n* values: column to aggregate, optional \n* index: column, Grouper, array, or list of the them \n* columns: column, Grouper, array, or list of the them\n* aggfunc: function, list of functions, dict, default numpy.mean\n* fill_value: scalar, default None. Value to replace missing values with \n\n#### example 1: " | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "pd.pivot_table(df,values='sales',index='date',columns='category',fill_value=0)", | |
"execution_count": 6, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 6, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>category</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": "category a b\ndate \nweek1 50 20\nweek2 35 0\nweek3 25 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### example for using aggregate function \nimport numpy library as np to use its functions " | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "import numpy as np\npd.pivot_table(df,values=['sales','revenue'],columns='category',aggfunc = np.sum, fill_value=0)", | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 7, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>category</th>\n <th>a</th>\n <th>b</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>revenue</th>\n <td>39</td>\n <td>15</td>\n </tr>\n <tr>\n <th>sales</th>\n <td>135</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": "category a b\nrevenue 39 15\nsales 135 40" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "#### exercise:\nshow the total revenue and sales for each week and each category" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "pd.pivot_table(df,\n values = ['sales','revenue'],\n index='date',\n columns = 'category',\n aggfunc =np.sum,\n fill_value = 0\n )", | |
"execution_count": 8, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 8, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>50</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 50 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.stack( level = -1, dropna = True)\n\nStack the prescribed level(s) from columns to index, like unpivot and melt\n\n#### example 1: " | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "stack_example_df = pd.pivot_table(df,values=['sales','revenue'],index='date',columns='category',fill_value=0)\nstack_example_df", | |
"execution_count": 9, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 9, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 25 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "unstack_example_df = stack_example_df.stack()\nunstack_example_df", | |
"execution_count": 10, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 10, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n <tr>\n <th>date</th>\n <th>category</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th rowspan=\"2\" valign=\"top\">week1</th>\n <th>a</th>\n <td>14</td>\n <td>50</td>\n </tr>\n <tr>\n <th>b</th>\n <td>8</td>\n <td>20</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">week2</th>\n <th>a</th>\n <td>16</td>\n <td>35</td>\n </tr>\n <tr>\n <th>b</th>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">week3</th>\n <th>a</th>\n <td>9</td>\n <td>25</td>\n </tr>\n <tr>\n <th>b</th>\n <td>7</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " revenue sales\ndate category \nweek1 a 14 50\n b 8 20\nweek2 a 16 35\n b 0 0\nweek3 a 9 25\n b 7 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.unstack( level , fill_value)\n\nPivot a level of the index labels\n\n* level: int, str, or list of these, default -1 (last level). Level(s) of index to unstack, can pass level name.\n* fill_value: int, str or dict. Replace NaN with this value if the unstack produces missing values.\n\n#### example 1: " | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "unstack_example_df.unstack(level=0)", | |
"execution_count": 11, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 11, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"3\" halign=\"left\">revenue</th>\n <th colspan=\"3\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>date</th>\n <th>week1</th>\n <th>week2</th>\n <th>week3</th>\n <th>week1</th>\n <th>week2</th>\n <th>week3</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>a</th>\n <td>14</td>\n <td>16</td>\n <td>9</td>\n <td>50</td>\n <td>35</td>\n <td>25</td>\n </tr>\n <tr>\n <th>b</th>\n <td>8</td>\n <td>0</td>\n <td>7</td>\n <td>20</td>\n <td>0</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " revenue sales \ndate week1 week2 week3 week1 week2 week3\ncategory \na 14 16 9 50 35 25\nb 8 0 7 20 0 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "unstack_example_df.unstack(level=-1)", | |
"execution_count": 12, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 12, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 25 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "### DataFrame.melt()\n\nUnpivot a DataFrame from wide to long format, optionally leaving identifiers set.\n\n#### example 1: " | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "stack_example_df", | |
"execution_count": 13, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 13, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 25 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "stack_example_df.melt()", | |
"execution_count": 14, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"execution_count": 14, | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>None</th>\n <th>category</th>\n <th>value</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>revenue</td>\n <td>a</td>\n <td>14</td>\n </tr>\n <tr>\n <th>1</th>\n <td>revenue</td>\n <td>a</td>\n <td>16</td>\n </tr>\n <tr>\n <th>2</th>\n <td>revenue</td>\n <td>a</td>\n <td>9</td>\n </tr>\n <tr>\n <th>3</th>\n <td>revenue</td>\n <td>b</td>\n <td>8</td>\n </tr>\n <tr>\n <th>4</th>\n <td>revenue</td>\n <td>b</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>revenue</td>\n <td>b</td>\n <td>7</td>\n </tr>\n <tr>\n <th>6</th>\n <td>sales</td>\n <td>a</td>\n <td>50</td>\n </tr>\n <tr>\n <th>7</th>\n <td>sales</td>\n <td>a</td>\n <td>35</td>\n </tr>\n <tr>\n <th>8</th>\n <td>sales</td>\n <td>a</td>\n <td>25</td>\n </tr>\n <tr>\n <th>9</th>\n <td>sales</td>\n <td>b</td>\n <td>20</td>\n </tr>\n <tr>\n <th>10</th>\n <td>sales</td>\n <td>b</td>\n <td>0</td>\n </tr>\n <tr>\n <th>11</th>\n <td>sales</td>\n <td>b</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " NaN category value\n0 revenue a 14\n1 revenue a 16\n2 revenue a 9\n3 revenue b 8\n4 revenue b 0\n5 revenue b 7\n6 sales a 50\n7 sales a 35\n8 sales a 25\n9 sales b 20\n10 sales b 0\n11 sales b 20" | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## End\n\nThey are only a means to the end of learning how to do good data analysis " | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3", | |
"language": "python" | |
}, | |
"gist": { | |
"id": "", | |
"data": { | |
"description": "Pandas - Pivot.ipynb", | |
"public": false | |
} | |
}, | |
"language_info": { | |
"file_extension": ".py", | |
"nbconvert_exporter": "python", | |
"version": "3.5.4", | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"name": "python", | |
"mimetype": "text/x-python", | |
"pygments_lexer": "ipython3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment