Skip to content

Instantly share code, notes, and snippets.

@min2bro
Created December 22, 2018 15:26
Show Gist options
  • Save min2bro/22a43b03eeec2dd5d2ceadc1a06527c4 to your computer and use it in GitHub Desktop.
Save min2bro/22a43b03eeec2dd5d2ceadc1a06527c4 to your computer and use it in GitHub Desktop.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas trick for the Day"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to do Excel Vlookup in Pandas"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### First Dataframe contains the Student Names and their respective Scores"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alice</td>\n",
" <td>300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Branda</td>\n",
" <td>450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tim</td>\n",
" <td>500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Mike</td>\n",
" <td>600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Rachel</td>\n",
" <td>725</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Score\n",
"0 Alice 300\n",
"1 Branda 450\n",
"2 Tim 500\n",
"3 Mike 600\n",
"4 Rachel 725"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df=pd.DataFrame({'Score':[300,450,500,600,725],'Name':['Alice','Branda','Tim','Mike','Rachel']})\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Second DataFrame contains the Score and Grades"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Grade</th>\n",
" <th>Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>B</td>\n",
" <td>300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>C</td>\n",
" <td>400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>D</td>\n",
" <td>500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>E</td>\n",
" <td>600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>F</td>\n",
" <td>700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Grade Score\n",
"0 B 300\n",
"1 C 400\n",
"2 D 500\n",
"3 E 600\n",
"4 F 700"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1=pd.DataFrame({'Score':[300,400,500,600,700],'Grade':['B','C','D','E','F']})\n",
"df1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Merge the Two Dataframes and lookup for corresponding Grades for their Scores\n",
"### Does a left-join except that we match on nearest key rather than equal keys"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Score</th>\n",
" <th>Grade</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alice</td>\n",
" <td>300</td>\n",
" <td>B</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Branda</td>\n",
" <td>450</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Tim</td>\n",
" <td>500</td>\n",
" <td>D</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Mike</td>\n",
" <td>600</td>\n",
" <td>E</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Rachel</td>\n",
" <td>725</td>\n",
" <td>F</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Score Grade\n",
"0 Alice 300 B\n",
"1 Branda 450 C\n",
"2 Tim 500 D\n",
"3 Mike 600 E\n",
"4 Rachel 725 F"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge_asof(df, df1, on='Score')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### First Dataframe contains Label and TIN #"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Label</th>\n",
" <th>TIN</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ABC</td>\n",
" <td>I1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>DEF</td>\n",
" <td>I2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>JHK</td>\n",
" <td>I3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LMN</td>\n",
" <td>I4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>OPQ</td>\n",
" <td>I5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Label TIN\n",
"0 ABC I1\n",
"1 DEF I2\n",
"2 JHK I3\n",
"3 LMN I4\n",
"4 OPQ I5"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df=pd.DataFrame({'Label':['ABC','DEF','JHK','LMN','OPQ'],'TIN':['I1','I2','I3','I4','I5']})\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Second Dataframe Contains TIN Numeric Values"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TIN</th>\n",
" <th>Value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>I2</td>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>I3</td>\n",
" <td>500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>I4</td>\n",
" <td>900</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" TIN Value\n",
"0 I2 200\n",
"1 I3 500\n",
"2 I4 900"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1=pd.DataFrame({'TIN':['I2','I3','I4'],'Value':[200,500,900]})\n",
"df1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Expected Value:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Label</th>\n",
" <th>Value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>DEF</td>\n",
" <td>100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>JHK</td>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LMN</td>\n",
" <td>300</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Label Value\n",
"0 DEF 100\n",
"1 JHK 200\n",
"2 LMN 300"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2=pd.DataFrame({'Label':['DEF','JHK','LMN'],'Value':[100,200,300]})\n",
"df2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Merge the Two dataframe using Pandas Merge function: \n",
"### Merges DataFrame objects by performing a database-style join operation by columns or indexes."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Label</th>\n",
" <th>TIN</th>\n",
" <th>Value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>DEF</td>\n",
" <td>I2</td>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>JHK</td>\n",
" <td>I3</td>\n",
" <td>500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LMN</td>\n",
" <td>I4</td>\n",
" <td>900</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Label TIN Value\n",
"0 DEF I2 200\n",
"1 JHK I3 500\n",
"2 LMN I4 900"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df3 = pd.merge(df, df1)\n",
"\n",
"df3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment