Skip to content

Instantly share code, notes, and snippets.

@kaedonkers
Created August 18, 2023 14:37
Show Gist options
  • Save kaedonkers/ef570f1342278477ac68fd8ca4bb04db to your computer and use it in GitHub Desktop.
Save kaedonkers/ef570f1342278477ac68fd8ca4bb04db to your computer and use it in GitHub Desktop.
Update entries in a Pandas DataFrame with entries in another Dataframe, without relying on compatible indices
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "raw",
"id": "007a570b-d2c2-47d6-8462-aaf20ab4f761",
"metadata": {},
"source": [
"---\n",
"created: 18 Aug 2023\n",
"author: kaedonkers\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "3df9d0cc-036f-428d-b439-4f8be165e961",
"metadata": {},
"source": [
"# Update entries in Pandas DataFrame using another DataFrame, without relying on matching indices"
]
},
{
"cell_type": "markdown",
"id": "3dcfc1a3-63ea-4392-8632-879d735bb0b8",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "890245e6-a48b-4c75-b614-c9763f4d36c1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8867526f-d2f7-4b4c-aada-479273b2bdd6",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1</td>\n",
" <td>a</td>\n",
" <td>one</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>2</td>\n",
" <td>b</td>\n",
" <td>two</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>3</td>\n",
" <td>c</td>\n",
" <td>three</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>4</td>\n",
" <td>old</td>\n",
" <td>four</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>5</td>\n",
" <td>e</td>\n",
" <td>five</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c\n",
"10 1 a one\n",
"20 2 b two\n",
"30 3 c three\n",
"40 4 old four\n",
"50 5 e five"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1 = pd.DataFrame({\"a\": [1,2,3,4,5], \n",
" \"b\":[\"a\",\"b\",\"c\",\"old\",\"e\"], \n",
" \"c\": [\"one\", \"two\", \"three\", \"four\", \"five\"]}, \n",
" index=[10,20,30,40,50])\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "09d90388-7f6d-4478-8bf8-33debd16c8bb",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>b</td>\n",
" <td>two</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4</td>\n",
" <td>NEW</td>\n",
" <td>four</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>6</td>\n",
" <td>f</td>\n",
" <td>six</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c\n",
"1 2 b two\n",
"2 4 NEW four\n",
"3 6 f six"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = pd.DataFrame({\"a\": [2,4,6], \n",
" \"b\":[\"b\",\"NEW\",\"f\"], \n",
" \"c\":[\"two\",\"four\",\"six\"]}, \n",
" index=[1,2,3])\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "2b2c50bd-e2a1-4bde-bf99-b40aa43219c9",
"metadata": {},
"source": [
"## Update entries in `df1` with newer entries in `df2`, while retaining shape of `df1`\n",
"\n",
"Using `how=\"left\"`"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c906f0ab-b6a8-40bf-8706-7be801a6c253",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>c</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>one</td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>two</td>\n",
" <td>b</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>three</td>\n",
" <td>c</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>four</td>\n",
" <td>NEW</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>five</td>\n",
" <td>e</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a c b\n",
"0 1 one a\n",
"1 2 two b\n",
"2 3 three c\n",
"3 4 four NEW\n",
"4 5 five e"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(df1\n",
" .merge(df2, how=\"left\", on=[\"a\", \"c\"])\n",
" .assign(b=lambda row: row.b_y.fillna(row.b_x))\n",
" .drop(columns=[\"b_x\",\"b_y\"])\n",
")"
]
},
{
"cell_type": "markdown",
"id": "537a87c0-c392-4fa7-9592-681b39f9350d",
"metadata": {},
"source": [
"## Update entries in `df1` with newer entries in `df2`, and keep maximum number of entries\n",
"Using `how=\"outer\"`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "340722be-4bcc-4c37-a6cb-032dce3685b7",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>c</th>\n",
" <th>b</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>one</td>\n",
" <td>a</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>two</td>\n",
" <td>b</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>three</td>\n",
" <td>c</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>four</td>\n",
" <td>NEW</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>five</td>\n",
" <td>e</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>six</td>\n",
" <td>f</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a c b\n",
"0 1 one a\n",
"1 2 two b\n",
"2 3 three c\n",
"3 4 four NEW\n",
"4 5 five e\n",
"5 6 six f"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1.merge(df2, how=\"outer\", on=[\"a\", \"c\"]).assign(b=lambda row: row.b_y.fillna(row.b_x)).drop(columns=[\"b_x\",\"b_y\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7466b400-689b-4b2b-a40a-79062c8c3354",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "datasci",
"language": "python",
"name": "datasci"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment