Last active
October 7, 2024 06:22
quickHow: how to split a column of tuples into a pandas dataframe
# Given a pandas DataFrame containing a column (pandas Series) of tuples, B,
# we want to split B into B1 and B2 and assign them to separate columns.

# Method 1: (faster)
# use the pd.Series.tolist() method to return a list of tuples
# pass that list to pd.DataFrame to build a new DataFrame, specifying the original df index so the rows align
# assign the new columns to the original df
import pandas as pd
import time

# Put your dataframe here
df = pd.DataFrame({'A': [1, 2], 'B': [(1, 2), (3, 4)]})
print("Original Dataset")
print(df)

start = time.perf_counter()
df[['B1', 'B2']] = pd.DataFrame(df['B'].tolist(), index=df.index)
print("Method 1")
print("Time elapsed: " + str(time.perf_counter() - start))
print(df)

# Method 2: (more Pythonic but much slower for larger dataframes)
# call the pd.Series.apply method on the column, passing the pd.Series constructor
start = time.perf_counter()
df[['B1', 'B2']] = df['B'].apply(pd.Series)
print("Method 2")
print("Time elapsed: " + str(time.perf_counter() - start))
print(df)
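Method 1 can also be wrapped in a small reusable helper. The sketch below is only an illustration, not part of the original gist: the function name `split_tuple_column` and its parameters are hypothetical, and it assumes every tuple in the column has the same length. A non-default index is used to show why passing `index=df.index` matters: without it, the new frame would get a fresh 0..n-1 index and the rows would no longer line up.

```python
import pandas as pd


def split_tuple_column(df, column, new_names):
    """Return a new DataFrame with `column` (equal-length tuples) expanded into `new_names`.

    Hypothetical helper for illustration; the input df is left unchanged.
    """
    # Build the expanded columns from a plain list of tuples (Method 1),
    # reusing the original index so rows stay aligned.
    expanded = pd.DataFrame(df[column].tolist(), index=df.index, columns=new_names)
    # join() aligns on the index and returns a new DataFrame.
    return df.join(expanded)


# Example data with a non-default index
df = pd.DataFrame({'A': [1, 2], 'B': [(1, 2), (3, 4)]}, index=[10, 20])
result = split_tuple_column(df, 'B', ['B1', 'B2'])
print(result)
```

Returning a new frame rather than assigning in place is a design choice made for this sketch; assigning with `df[['B1', 'B2']] = ...` as in the gist works equally well when mutating `df` is acceptable.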
Thank you, this was really helpful.
The best tip I've read in a while. Thanks a ton!
Bravo!
My kernel even died with the second method. Thank you so much, very helpful.
This was incredibly helpful! Thank you for sharing 🎉