@sammatuba
Last active November 12, 2023 14:52
quickHow: how to split a column of tuples into a pandas dataframe
# Given a pandas DataFrame with a column B that holds tuples,
# we want to split B into B1 and B2 and store each as its own column
# Method 1: (faster)
# use the pd.Series.tolist() method to get a plain list of tuples
# pass that list to pd.DataFrame to build a new DataFrame, reusing the original df index so the rows stay aligned
# assign the result to new columns of the original df
import pandas as pd
import time
# Put your dataframe here
df = pd.DataFrame({'A':[1,2], 'B':[(1,2), (3,4)]})
print("Original Dataset")
print(df)
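start = time.perf_counter()  # perf_counter is better suited to short intervals than time.time
df[['B1','B2']] = pd.DataFrame(df['B'].tolist(),index=df.index)
elapsed = time.perf_counter()-start  # stop the clock before printing
print("Method 1")
print("Time elapsed: " + str(elapsed))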
print(df)
# Method 2: (more Pythonic but much slower for larger dataframes)
# call the pd.Series.apply method on the column, passing pd.Series as the function
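start = time.perf_counter()  # perf_counter is better suited to short intervals than time.time
df[['B1','B2']] = df['B'].apply(pd.Series)
elapsed = time.perf_counter()-start  # stop the clock before printing
print("Method 2")
print("Time elapsed: " + str(elapsed))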
print(df)
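The two-row example above is too small to show the speed difference. The following is a minimal sketch of a larger comparison, not part of the original gist: the row count, the `big` frame, and the helper names `split_with_tolist` and `split_with_apply` are all hypothetical choices made for illustration. It checks that both methods give the same values and then times each one with `timeit`. Actual timings depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 2,000 rows of 2-tuples (size chosen for illustration only).
n_rows = 2_000
big = pd.DataFrame({"A": range(n_rows), "B": [(i, i + 1) for i in range(n_rows)]})


def split_with_tolist(frame):
    """Method 1: build a DataFrame from the list of tuples, keeping the index."""
    return pd.DataFrame(frame["B"].tolist(), index=frame.index, columns=["B1", "B2"])


def split_with_apply(frame):
    """Method 2: expand each tuple into a row by applying pd.Series."""
    out = frame["B"].apply(pd.Series)
    out.columns = ["B1", "B2"]
    return out


result_tolist = split_with_tolist(big)
result_apply = split_with_apply(big)

# Both methods should produce the same values in the same order.
same_values = bool((result_tolist.to_numpy() == result_apply.to_numpy()).all())
print("Same values:", same_values)

# Time a single run of each method; increase `number` for steadier figures.
t_tolist = timeit.timeit(lambda: split_with_tolist(big), number=1)
t_apply = timeit.timeit(lambda: split_with_apply(big), number=1)
print(f"tolist: {t_tolist:.4f}s  apply: {t_apply:.4f}s")
```

Raising `n_rows` makes the gap easier to see, but Method 2 creates one `pd.Series` object per row, so its runtime and memory use grow quickly.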
@LazolaJavu

Thank you, this was really helpful.

@Prathamesh-Ghatole

The best tip I've read in a while. Thanks a ton!

@EladDan

EladDan commented Dec 14, 2022

Bravo!

@Phuongbui2711

My kernel even died with the second method. Thank you so much, very helpful.
