Skip to content

Instantly share code, notes, and snippets.

@jing-jin-mc
Created December 3, 2020 16:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jing-jin-mc/eee5fc89ab76cd657940f7c9e8beeda1 to your computer and use it in GitHub Desktop.
Save jing-jin-mc/eee5fc89ab76cd657940f7c9e8beeda1 to your computer and use it in GitHub Desktop.
Pandas - Pivot.ipynb
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## Pivot\n\n### Functions:\n\n* pivot_table\n* stack\n* unstack\n* melt "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd ",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### create a test dataset to demo functions"
},
{
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"trusted": true
},
"cell_type": "code",
"source": "df = pd.DataFrame(data = {'date':['week1','week1','week2','week3','week3','week3'],\n 'sales':[50,20,35,10,20,40],\n 'category':['a','b','a','a','b','a'],\n 'revenue':[14,8,16,None,7,9]\n })\ndf.date = df.date.astype('str')\ndf.category = df.category.astype('str')\ndf",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n1 b week1 8.0 20\n2 a week2 16.0 35\n3 a week3 NaN 10\n4 b week3 7.0 20\n5 a week3 9.0 40"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ncategory 6 non-null object\ndate 6 non-null object\nrevenue 5 non-null float64\nsales 6 non-null int64\ndtypes: float64(1), int64(1), object(2)\nmemory usage: 272.0+ bytes\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### pandas.pivot_table( data, values, index, columns, aggfunc, fill_value ...)\n\nCreate a spreadsheet-style pivot tables as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of result DataFrame\n\n* data: DataFrame \n* values: column to aggregate, optional \n* index: column, Grouper, array, or list of the them \n* columns: column, Grouper, array, or list of the them\n* aggfunc: function, list of functions, dict, default numpy.mean\n* fill_value: scalar, default None. Value to replace missing values with \n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.pivot_table(df,values='sales',index='date',columns='category',fill_value=0)",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 6,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>category</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": "category a b\ndate \nweek1 50 20\nweek2 35 0\nweek3 25 20"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example for using aggregate function \nimport numpy library as np to use its functions "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\npd.pivot_table(df,values=['sales','revenue'],columns='category',aggfunc = np.sum, fill_value=0)",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>category</th>\n <th>a</th>\n <th>b</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>revenue</th>\n <td>39</td>\n <td>15</td>\n </tr>\n <tr>\n <th>sales</th>\n <td>135</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": "category a b\nrevenue 39 15\nsales 135 40"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### exercise:\nshow the total revenue and sales for each week and each category"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.pivot_table(df,\n values = ['sales','revenue'],\n index='date',\n columns = 'category',\n aggfunc =np.sum,\n fill_value = 0\n )",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>50</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 50 20"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.stack( level = -1, dropna = True)\n\nStack the prescribed level(s) from columns to index, like unpivot and melt\n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "stack_example_df = pd.pivot_table(df,values=['sales','revenue'],index='date',columns='category',fill_value=0)\nstack_example_df",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 25 20"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "unstack_example_df = stack_example_df.stack()\nunstack_example_df",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n <tr>\n <th>date</th>\n <th>category</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th rowspan=\"2\" valign=\"top\">week1</th>\n <th>a</th>\n <td>14</td>\n <td>50</td>\n </tr>\n <tr>\n <th>b</th>\n <td>8</td>\n <td>20</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">week2</th>\n <th>a</th>\n <td>16</td>\n <td>35</td>\n </tr>\n <tr>\n <th>b</th>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">week3</th>\n <th>a</th>\n <td>9</td>\n <td>25</td>\n </tr>\n <tr>\n <th>b</th>\n <td>7</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " revenue sales\ndate category \nweek1 a 14 50\n b 8 20\nweek2 a 16 35\n b 0 0\nweek3 a 9 25\n b 7 20"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.unstack( level , fill_value)\n\nPivot a level of the index labels\n\n* level: int, str, or list of these, default -1 (last level). Level(s) of index to unstack, can pass level name.\n* fill_value: int, str or dict. Replace NaN with this value if the unstack produces missing values.\n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "unstack_example_df.unstack(level=0)",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"3\" halign=\"left\">revenue</th>\n <th colspan=\"3\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>date</th>\n <th>week1</th>\n <th>week2</th>\n <th>week3</th>\n <th>week1</th>\n <th>week2</th>\n <th>week3</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>a</th>\n <td>14</td>\n <td>16</td>\n <td>9</td>\n <td>50</td>\n <td>35</td>\n <td>25</td>\n </tr>\n <tr>\n <th>b</th>\n <td>8</td>\n <td>0</td>\n <td>7</td>\n <td>20</td>\n <td>0</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " revenue sales \ndate week1 week2 week3 week1 week2 week3\ncategory \na 14 16 9 50 35 25\nb 8 0 7 20 0 20"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "unstack_example_df.unstack(level=-1)",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 25 20"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.melt()\n\nUnpivot a DataFrame from wide to long format, optionally leaving identifiers set.\n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "stack_example_df",
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead tr th {\n text-align: left;\n }\n\n .dataframe thead tr:last-of-type th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr>\n <th></th>\n <th colspan=\"2\" halign=\"left\">revenue</th>\n <th colspan=\"2\" halign=\"left\">sales</th>\n </tr>\n <tr>\n <th>category</th>\n <th>a</th>\n <th>b</th>\n <th>a</th>\n <th>b</th>\n </tr>\n <tr>\n <th>date</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>week1</th>\n <td>14</td>\n <td>8</td>\n <td>50</td>\n <td>20</td>\n </tr>\n <tr>\n <th>week2</th>\n <td>16</td>\n <td>0</td>\n <td>35</td>\n <td>0</td>\n </tr>\n <tr>\n <th>week3</th>\n <td>9</td>\n <td>7</td>\n <td>25</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " revenue sales \ncategory a b a b\ndate \nweek1 14 8 50 20\nweek2 16 0 35 0\nweek3 9 7 25 20"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "stack_example_df.melt()",
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 14,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>None</th>\n <th>category</th>\n <th>value</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>revenue</td>\n <td>a</td>\n <td>14</td>\n </tr>\n <tr>\n <th>1</th>\n <td>revenue</td>\n <td>a</td>\n <td>16</td>\n </tr>\n <tr>\n <th>2</th>\n <td>revenue</td>\n <td>a</td>\n <td>9</td>\n </tr>\n <tr>\n <th>3</th>\n <td>revenue</td>\n <td>b</td>\n <td>8</td>\n </tr>\n <tr>\n <th>4</th>\n <td>revenue</td>\n <td>b</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>revenue</td>\n <td>b</td>\n <td>7</td>\n </tr>\n <tr>\n <th>6</th>\n <td>sales</td>\n <td>a</td>\n <td>50</td>\n </tr>\n <tr>\n <th>7</th>\n <td>sales</td>\n <td>a</td>\n <td>35</td>\n </tr>\n <tr>\n <th>8</th>\n <td>sales</td>\n <td>a</td>\n <td>25</td>\n </tr>\n <tr>\n <th>9</th>\n <td>sales</td>\n <td>b</td>\n <td>20</td>\n </tr>\n <tr>\n <th>10</th>\n <td>sales</td>\n <td>b</td>\n <td>0</td>\n </tr>\n <tr>\n <th>11</th>\n <td>sales</td>\n <td>b</td>\n <td>20</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " NaN category value\n0 revenue a 14\n1 revenue a 16\n2 revenue a 9\n3 revenue b 8\n4 revenue b 0\n5 revenue b 7\n6 sales a 50\n7 sales a 35\n8 sales a 25\n9 sales b 20\n10 sales b 0\n11 sales b 20"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## End\n\nThey are only a means to the end of learning how to do good data analysis "
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"gist": {
"id": "",
"data": {
"description": "Pandas - Pivot.ipynb",
"public": false
}
},
"language_info": {
"file_extension": ".py",
"nbconvert_exporter": "python",
"version": "3.5.4",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment