Created July 4, 2019 09:35 (gist sammatuba/28f9dabeb7b50d70a39d13a754289c2e)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### QuickHow\n",
"Given a pandas DataFrame with a column `B` whose values are 2-tuples,\n",
"we want to split `B` into two separate columns (pandas Series), `B1` and `B2`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Method 1 (faster on large DataFrames; on the two-row example below, both timings are dominated by fixed overhead and are not a meaningful comparison):\n",
"\n",
"- use `pd.Series.tolist()` to get the column as a list of tuples\n",
"\n",
"- pass that list to `pd.DataFrame` to build a new DataFrame, specifying the original `df` index so the new rows line up with the existing ones\n",
"\n",
"- assign the new columns back to the original `df`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original Dataset\n",
" A B\n",
"0 1 (1, 2)\n",
"1 2 (3, 4)\n",
"Method 1\n",
"Time elapsed :0.005747556686401367\n",
" A B B1 B2\n",
"0 1 (1, 2) 1 2\n",
"1 2 (3, 4) 3 4\n"
]
}
],
"source": [
"df = pd.DataFrame({'A': [1, 2], 'B': [(1, 2), (3, 4)]})\n",
"\n",
"print(\"Original Dataset\")\n",
"print(df)\n",
"\n",
"print(\"Method 1\")\n",
"start = time.time()\n",
"# Build a two-column DataFrame from the list of tuples, reusing df's index\n",
"# so the rows line up, then assign its columns to B1 and B2\n",
"df[['B1', 'B2']] = pd.DataFrame(df['B'].tolist(), index=df.index)\n",
"print(\"Time elapsed :\" + str(time.time() - start))\n",
"print(df)"
]
},
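{
"cell_type": "markdown",
"metadata": {},
"source": [
"Method 2 (more concise and arguably more Pythonic, but much slower for large DataFrames):\n",
"\n",
"- use the `pd.Series.apply` method on the column, passing the `pd.Series` constructor so that each tuple becomes one row of a new DataFrame; creating a separate Series for every row is what makes this approach slow"
]
},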
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Method 2\n",
"Time elapsed :0.0069963932037353516\n",
" A B B1 B2\n",
"0 1 (1, 2) 1 2\n",
"1 2 (3, 4) 3 4\n"
]
}
],
"source": [
"print(\"Method 2\")\n",
"start = time.time()\n",
"# Series.apply(pd.Series) converts each tuple into a Series; pandas stacks\n",
"# them into a DataFrame whose columns are assigned to B1 and B2\n",
"df[['B1', 'B2']] = df['B'].apply(pd.Series)\n",
"print(\"Time elapsed :\" + str(time.time() - start))\n",
"print(df)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
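A minimal sketch of Method 1 packaged as a reusable function. The helper name `split_tuple_column` and its parameters are hypothetical, and it uses `DataFrame.join` instead of `df[[...]] = ...` assignment (both align rows on the index). The example filters the frame first so its index becomes `[1, 2]`, which shows why passing `index=df.index` to `pd.DataFrame` matters.

```python
import pandas as pd


def split_tuple_column(df, col, new_cols):
    """Return a copy of df with the tuples in df[col] expanded into new_cols.

    Hypothetical helper following Method 1: Series.tolist() followed by
    pd.DataFrame(..., index=df.index). Assumes every tuple has exactly
    len(new_cols) items.
    """
    expanded = pd.DataFrame(df[col].tolist(), index=df.index, columns=new_cols)
    # join aligns on the index, so rows stay matched even when the index
    # is not 0..n-1
    return df.join(expanded)


df = pd.DataFrame({"A": [1, 2, 3], "B": [(1, 2), (3, 4), (5, 6)]})
filtered = df[df["A"] > 1]  # index is now [1, 2], not [0, 1]

result = split_tuple_column(filtered, "B", ["B1", "B2"])
print(result)  # B1 holds [3, 5] and B2 holds [4, 6], on index [1, 2]

# Without index=..., the new frame gets a default 0..n-1 index, so the
# index-aligned join shifts the values and leaves a gap.
misaligned = pd.DataFrame(filtered["B"].tolist(), columns=["B1", "B2"])
bad = filtered.join(misaligned)
print(bad["B1"].tolist())  # [5.0, nan]
```

The same misalignment would affect the notebook's `df[['B1','B2']] = ...` assignment if `index=df.index` were omitted on a frame with a non-default index.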
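The notebook's timings come from a two-row frame, where fixed overhead dominates. A rough benchmark sketch on a larger synthetic frame makes the difference between the two methods visible. The row count, the `method1`/`method2` wrappers, and `best_time` are assumptions for illustration; the sketch uses `time.perf_counter()` (a high-resolution clock intended for measuring short intervals) rather than `time.time()`. Actual numbers depend on the machine and pandas version.

```python
import time

import pandas as pd


def method1(df):
    # Method 1: one DataFrame constructor call over a list of tuples.
    return pd.DataFrame(df["B"].tolist(), index=df.index, columns=["B1", "B2"])


def method2(df):
    # Method 2: Series.apply(pd.Series) builds one Series object per row.
    out = df["B"].apply(pd.Series)
    out.columns = ["B1", "B2"]
    return out


def best_time(func, df, repeats=3):
    """Return the fastest wall-clock time of func(df) over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(df)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical benchmark size; the notebook itself only used 2 rows.
n = 10_000
df = pd.DataFrame({"A": range(n), "B": [(i, i + 1) for i in range(n)]})

# Both methods should produce the same values (dtype inference may differ
# between pandas versions, so dtypes are not compared).
pd.testing.assert_frame_equal(method1(df), method2(df), check_dtype=False)

t1 = best_time(method1, df)
t2 = best_time(method2, df)
print(f"Method 1: {t1:.4f} s  Method 2: {t2:.4f} s  ratio: {t2 / t1:.1f}x")
```

Taking the minimum over several runs filters out interference from other processes; `timeit.repeat` from the standard library would serve equally well.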