Skip to content

Instantly share code, notes, and snippets.

@jing-jin-mc
Created December 2, 2020 21:46
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jing-jin-mc/46baed62f978a48f46b878bfc1cdd292 to your computer and use it in GitHub Desktop.
Save jing-jin-mc/46baed62f978a48f46b878bfc1cdd292 to your computer and use it in GitHub Desktop.
Pandas - Merge and reshaping
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## Merge and reshaping\n\n### Functions:\n* merge \n* join\n* drop\n* sort\n* concat\n* append\n\nCreate another dataset to merge with "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd ",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### create a test dataset to demo functions"
},
{
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"trusted": true
},
"cell_type": "code",
"source": "df = pd.DataFrame(data = {'date':['week1','week1','week2','week3','week3','week3'],\n 'sales':[50,20,35,10,20,40],\n 'category':['a','b','a','a','b','a'],\n 'revenue':[14,8,16,None,7,9]\n })\ndf.date = df.date.astype('str')\ndf.category = df.category.astype('str')\ndf",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n1 b week1 8.0 20\n2 a week2 16.0 35\n3 a week3 NaN 10\n4 b week3 7.0 20\n5 a week3 9.0 40"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ncategory 6 non-null object\ndate 6 non-null object\nrevenue 5 non-null float64\nsales 6 non-null int64\ndtypes: float64(1), int64(1), object(2)\nmemory usage: 272.0+ bytes\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df1 = pd.DataFrame(data = {'Date':['week1','week2','week3','week4'],\n 'category':['a','b','c','d'],\n 'cost':[1,2,3,4]\n })\ndf1.Date = df1['Date'].astype('str')\ndf1.category = df1.category.astype('str')\ndf1",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 6,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Date</th>\n <th>category</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week2</td>\n <td>b</td>\n <td>2</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week3</td>\n <td>c</td>\n <td>3</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week4</td>\n <td>d</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Date category cost\n0 week1 a 1\n1 week2 b 2\n2 week3 c 3\n3 week4 d 4"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df1.info()",
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 4 entries, 0 to 3\nData columns (total 3 columns):\nDate 4 non-null object\ncategory 4 non-null object\ncost 4 non-null int64\ndtypes: int64(1), object(2)\nmemory usage: 176.0+ bytes\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.merge(df, how, on, left_on, right_on, suffixes, ...)\n\nParameters:\n\n* how:{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’\n\n* on: label or list. Column or index level names used as key to join on. \n\n* left_on and right_on: label or list, or array-like\n\n* suffixes: list-like, default is (“_x”, “_y”). A string indicating the suffix to add to overlapping column names in left and right respectively. \n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.merge(df1,how = 'left', on= 'category')",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n <th>Date</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n <td>week2</td>\n <td>2</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n <td>week2</td>\n <td>2</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales Date cost\n0 a week1 14.0 50 week1 1\n1 b week1 8.0 20 week2 2\n2 a week2 16.0 35 week1 1\n3 a week3 NaN 10 week1 1\n4 b week3 7.0 20 week2 2\n5 a week3 9.0 40 week1 1"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2: \nwhen the merge key name is different for two datasets, using left_on and right_on"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.merge(df1,how = 'inner', left_on= 'date',right_on='Date', suffixes = ('_caller', '_other'))",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category_caller</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n <th>Date</th>\n <th>category_other</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n <td>week1</td>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n <td>week1</td>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n <td>week2</td>\n <td>b</td>\n <td>2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n <td>week3</td>\n <td>c</td>\n <td>3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n <td>week3</td>\n <td>c</td>\n <td>3</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n <td>week3</td>\n <td>c</td>\n <td>3</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category_caller date revenue sales Date category_other cost\n0 a week1 14.0 50 week1 a 1\n1 b week1 8.0 20 week1 a 1\n2 a week2 16.0 35 week2 b 2\n3 a week3 NaN 10 week3 c 3\n4 b week3 7.0 20 week3 c 3\n5 a week3 9.0 40 week3 c 3"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### excercise: \nMerge on date and category and keep all the entries in both dataset "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.merge(df1,\n how = 'outer',\n left_on=['date','category'],\n right_on =['Date','category']\n )",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n <th>Date</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50.0</td>\n <td>week1</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20.0</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>6</th>\n <td>b</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>week2</td>\n <td>2.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>c</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>week3</td>\n <td>3.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>d</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>week4</td>\n <td>4.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales Date cost\n0 a week1 14.0 50.0 week1 1.0\n1 b week1 8.0 20.0 NaN NaN\n2 a week2 16.0 35.0 NaN NaN\n3 a week3 NaN 10.0 NaN NaN\n4 a week3 9.0 40.0 NaN NaN\n5 b week3 7.0 20.0 NaN NaN\n6 b NaN NaN NaN week2 2.0\n7 c NaN NaN NaN week3 3.0\n8 d NaN NaN NaN week4 4.0"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.join( other, on, how, lsuffix, rsuffix, sort)\n\nJoin columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.\n\nParameters:\n* other: DataFrame, Series, or list of DataFrame\n* on: str, list of str, or array-like, optional\n* how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’\n* lsuffix: str, default ‘’. Suffix to use from left frame’s overlapping columns.\n* rsuffix: str, default ‘’. Suffix to use from right frame’s overlapping columns.\n* sort: bool, default False\n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.join(df1,lsuffix='_caller', rsuffix='_other')",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category_caller</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n <th>Date</th>\n <th>category_other</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n <td>week1</td>\n <td>a</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n <td>week2</td>\n <td>b</td>\n <td>2.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n <td>week3</td>\n <td>c</td>\n <td>3.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n <td>week4</td>\n <td>d</td>\n <td>4.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category_caller date revenue sales Date category_other cost\n0 a week1 14.0 50 week1 a 1.0\n1 b week1 8.0 20 week2 b 2.0\n2 a week2 16.0 35 week3 c 3.0\n3 a week3 NaN 10 week4 d 4.0\n4 b week3 7.0 20 NaN NaN NaN\n5 a week3 9.0 40 NaN NaN NaN"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2: \nspecify the column to join"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df2 = df1.rename(columns={'Date':'date'})\ndf2",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>category</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week2</td>\n <td>b</td>\n <td>2</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week3</td>\n <td>c</td>\n <td>3</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week4</td>\n <td>d</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date category cost\n0 week1 a 1\n1 week2 b 2\n2 week3 c 3\n3 week4 d 4"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.info()",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 6 entries, 0 to 5\nData columns (total 4 columns):\ncategory 6 non-null object\ndate 6 non-null object\nrevenue 5 non-null float64\nsales 6 non-null int64\ndtypes: float64(1), int64(1), object(2)\nmemory usage: 272.0+ bytes\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df2.info()",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 4 entries, 0 to 3\nData columns (total 3 columns):\ndate 4 non-null object\ncategory 4 non-null object\ncost 4 non-null int64\ndtypes: int64(1), object(2)\nmemory usage: 176.0+ bytes\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df2_eg = df2.set_index('category')\ndf2_eg",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>cost</th>\n </tr>\n <tr>\n <th>category</th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>a</th>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>b</th>\n <td>week2</td>\n <td>2</td>\n </tr>\n <tr>\n <th>c</th>\n <td>week3</td>\n <td>3</td>\n </tr>\n <tr>\n <th>d</th>\n <td>week4</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date cost\ncategory \na week1 1\nb week2 2\nc week3 3\nd week4 4"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df2",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>date</th>\n <th>category</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>a</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week2</td>\n <td>b</td>\n <td>2</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week3</td>\n <td>c</td>\n <td>3</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week4</td>\n <td>d</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " date category cost\n0 week1 a 1\n1 week2 b 2\n2 week3 c 3\n3 week4 d 4"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.join(df2_eg, on='category', how = 'outer',lsuffix='_caller', rsuffix='_other')",
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 17,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date_caller</th>\n <th>revenue</th>\n <th>sales</th>\n <th>date_other</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50.0</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35.0</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10.0</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40.0</td>\n <td>week1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20.0</td>\n <td>week2</td>\n <td>2</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20.0</td>\n <td>week2</td>\n <td>2</td>\n </tr>\n <tr>\n <th>5</th>\n <td>c</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>week3</td>\n <td>3</td>\n </tr>\n <tr>\n <th>5</th>\n <td>d</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>week4</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date_caller revenue sales date_other cost\n0 a week1 14.0 50.0 week1 1\n2 a week2 16.0 35.0 week1 1\n3 a week3 NaN 10.0 week1 1\n5 a week3 9.0 40.0 week1 1\n1 b week1 8.0 20.0 week2 2\n4 b week3 7.0 20.0 week2 2\n5 c NaN NaN NaN week3 3\n5 d NaN NaN NaN week4 4"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### Difference between join and merge\nThey are quite equivalent. But to use them more conveninet: \n\nThink of merge as aligning on columns. By default merge will look for overlapping columns in which to merge on. merge gives better control over merge keys by allowing the user to specify a subset of the overlapping columns to use with parameter on, or to separately allow the specification of which columns on the left and which columns on the right to merge by.\n\nThink of join as wanting to combine to dataframes based on their respective indexes. If there are overlapping columns, join will want you to add a suffix to the overlapping column name from left dataframe."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.drop( labels, axis )\n\nRemove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. \n\n* lables: signle label or a list of labels\n\n* axis: {0 or 'index', 1 or 'columns'} default 0\n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.drop(['sales','revenue'],axis=1)",
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 18,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date\n0 a week1\n1 b week1\n2 a week2\n3 a week3\n4 b week3\n5 a week3"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.drop([0,1],axis=0)",
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 19,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n2 a week2 16.0 35\n3 a week3 NaN 10\n4 b week3 7.0 20\n5 a week3 9.0 40"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example for .dropna( axis, how)\n* axis: {0 or ‘index’, 1 or ‘columns’}, default 0, used to indicate what to drop, columns or rows \n* how : {'any','all'}, default 'any'"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.dropna(axis=1, how = 'all')",
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 20,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n1 b week1 8.0 20\n2 a week2 16.0 35\n3 a week3 NaN 10\n4 b week3 7.0 20\n5 a week3 9.0 40"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.dropna(axis=0, how = 'any')",
"execution_count": 21,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 21,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n1 b week1 8.0 20\n2 a week2 16.0 35\n4 b week3 7.0 20\n5 a week3 9.0 40"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.sort_values( by, axis, ascending)\n\nSort by the values along either axis\n\n* by: str or list of str of lables \n* axis: {0 or ‘index’, 1 or ‘columns’}, default 0\n* ascending: bool or list of bool, default True\n\n#### example : "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.sort_values(['sales','revenue'],ascending=False)",
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 22,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n5 a week3 9.0 40\n2 a week2 16.0 35\n1 b week1 8.0 20\n4 b week3 7.0 20\n3 a week3 NaN 10"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.sort_values('sales',ascending=False)",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 23,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales\n0 a week1 14.0 50\n5 a week3 9.0 40\n2 a week2 16.0 35\n1 b week1 8.0 20\n4 b week3 7.0 20\n3 a week3 NaN 10"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Pandas.concat ( objects, axis, join, ignore_index, ...)\n\nConcatenate pandas objects along a particular axis with optinal set logic along the other axes.\n\n* objects: a sequence or mapping of Series or dataframes \n* axis: {0: ’index’, 1: ’columns’}, default 0, indicates the axis to concatenate along\n* join: {‘inner’, ‘outer’}, default ‘outer’, indicates how to handle indexes on other axis\n* ignore_index: bool, default False. Set as true when the index is meaningful.\n\n#### example for join = 'outer' and ignore_index = False : "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df,df1])",
"execution_count": 25,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 25,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Date</th>\n <th>category</th>\n <th>cost</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40.0</td>\n </tr>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>a</td>\n <td>1.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week2</td>\n <td>b</td>\n <td>2.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week3</td>\n <td>c</td>\n <td>3.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week4</td>\n <td>d</td>\n <td>4.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Date category cost date revenue sales\n0 NaN a NaN week1 14.0 50.0\n1 NaN b NaN week1 8.0 20.0\n2 NaN a NaN week2 16.0 35.0\n3 NaN a NaN week3 NaN 10.0\n4 NaN b NaN week3 7.0 20.0\n5 NaN a NaN week3 9.0 40.0\n0 week1 a 1.0 NaN NaN NaN\n1 week2 b 2.0 NaN NaN NaN\n2 week3 c 3.0 NaN NaN NaN\n3 week4 d 4.0 NaN NaN NaN"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example for join = 'inner' and ignore_index = True : "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df,df1],join='inner',ignore_index = True)",
"execution_count": 26,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 26,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n </tr>\n <tr>\n <th>6</th>\n <td>a</td>\n </tr>\n <tr>\n <th>7</th>\n <td>b</td>\n </tr>\n <tr>\n <th>8</th>\n <td>c</td>\n </tr>\n <tr>\n <th>9</th>\n <td>d</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category\n0 a\n1 b\n2 a\n3 a\n4 b\n5 a\n6 a\n7 b\n8 c\n9 d"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### exercise:\nadd all of df1's columns as df's extra columns"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.concat([df,df1],join='outer',axis = 1)",
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 27,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>category</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n <th>Date</th>\n <th>category</th>\n <th>cost</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>a</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50</td>\n <td>week1</td>\n <td>a</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>b</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20</td>\n <td>week2</td>\n <td>b</td>\n <td>2.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>a</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35</td>\n <td>week3</td>\n <td>c</td>\n <td>3.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>a</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10</td>\n <td>week4</td>\n <td>d</td>\n <td>4.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>b</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>a</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " category date revenue sales Date category cost\n0 a week1 14.0 50 week1 a 1.0\n1 b week1 8.0 20 week2 b 2.0\n2 a week2 16.0 35 week3 c 3.0\n3 a week3 NaN 10 week4 d 4.0\n4 b week3 7.0 20 NaN NaN NaN\n5 a week3 9.0 40 NaN NaN NaN"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### The difference between concat and merge or join is: \nThere is no key used in concat"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### DataFrame.append( other, ignore_index, sort, ...)\n\nAppend rows of other to the end of caller, returning a new object. Columns in other that are not in the caller are added as new columns.\n\n* other: DataFrame or Series/dict-like object, or list of these\n* ignore_index: bool, default False\n* sort: bool, default False. \n\n#### example 1: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.append(df1)",
"execution_count": 29,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 29,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Date</th>\n <th>category</th>\n <th>cost</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40.0</td>\n </tr>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>a</td>\n <td>1.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week2</td>\n <td>b</td>\n <td>2.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week3</td>\n <td>c</td>\n <td>3.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week4</td>\n <td>d</td>\n <td>4.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Date category cost date revenue sales\n0 NaN a NaN week1 14.0 50.0\n1 NaN b NaN week1 8.0 20.0\n2 NaN a NaN week2 16.0 35.0\n3 NaN a NaN week3 NaN 10.0\n4 NaN b NaN week3 7.0 20.0\n5 NaN a NaN week3 9.0 40.0\n0 week1 a 1.0 NaN NaN NaN\n1 week2 b 2.0 NaN NaN NaN\n2 week3 c 3.0 NaN NaN NaN\n3 week4 d 4.0 NaN NaN NaN"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "#### example 2: "
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "eg = df.append([df,df1])\neg",
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 30,
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Date</th>\n <th>category</th>\n <th>cost</th>\n <th>date</th>\n <th>revenue</th>\n <th>sales</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40.0</td>\n </tr>\n <tr>\n <th>0</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week1</td>\n <td>14.0</td>\n <td>50.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week1</td>\n <td>8.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week2</td>\n <td>16.0</td>\n <td>35.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>NaN</td>\n <td>10.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>b</td>\n <td>NaN</td>\n <td>week3</td>\n <td>7.0</td>\n <td>20.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>NaN</td>\n <td>a</td>\n <td>NaN</td>\n <td>week3</td>\n <td>9.0</td>\n <td>40.0</td>\n </tr>\n <tr>\n <th>0</th>\n <td>week1</td>\n <td>a</td>\n <td>1.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>week2</td>\n <td>b</td>\n <td>2.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>week3</td>\n <td>c</td>\n <td>3.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>week4</td>\n <td>d</td>\n <td>4.0</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Date category cost date revenue sales\n0 NaN a NaN week1 14.0 50.0\n1 NaN b NaN week1 8.0 20.0\n2 NaN a NaN week2 16.0 35.0\n3 NaN a NaN week3 NaN 10.0\n4 NaN b NaN week3 7.0 20.0\n5 NaN a NaN week3 9.0 40.0\n0 NaN a NaN week1 14.0 50.0\n1 NaN b NaN week1 8.0 20.0\n2 NaN a NaN week2 16.0 35.0\n3 NaN a NaN week3 NaN 10.0\n4 NaN b NaN week3 7.0 20.0\n5 NaN a NaN week3 9.0 40.0\n0 week1 a 1.0 NaN NaN NaN\n1 week2 b 2.0 NaN NaN NaN\n2 week3 c 3.0 NaN NaN NaN\n3 week4 d 4.0 NaN NaN NaN"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "eg.iloc[1]",
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 31,
"data": {
"text/plain": "Date NaN\ncategory b\ncost NaN\ndate week1\nrevenue 8\nsales 20\nName: 1, dtype: object"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## End\n\nThey are only a means to the end of learning how to do good data analysis "
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"gist": {
"id": "",
"data": {
"description": "Pandas - Merge and reshaping",
"public": false
}
},
"language_info": {
"file_extension": ".py",
"nbconvert_exporter": "python",
"version": "3.5.4",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment