@zouweilin
Forked from jlln/separator.py
Last active April 22, 2019 12:28
Efficiently split pandas DataFrame cells containing delimited lists into multiple rows, duplicating the other columns' values.
import pandas as pd


def split_dataframe_rows(df, column_selectors, row_delimiter):
    """Split delimited string cells into multiple rows, repeating the other columns."""

    def _split_list_to_rows(row, row_accumulator, column_selectors, row_delimiter):
        # Split every selected cell and remember the longest resulting list.
        split_rows = {}
        max_split = 0
        for column_selector in column_selectors:
            split_row = row[column_selector].split(row_delimiter)
            split_rows[column_selector] = split_row
            if len(split_row) > max_split:
                max_split = len(split_row)
        # Emit one output row per list position; shorter lists are padded with ''.
        for i in range(max_split):
            new_row = row.to_dict()
            for column_selector in column_selectors:
                try:
                    new_row[column_selector] = split_rows[column_selector].pop(0)
                except IndexError:
                    new_row[column_selector] = ''
            row_accumulator.append(new_row)

    new_rows = []
    df.apply(_split_list_to_rows, axis=1, args=(new_rows, column_selectors, row_delimiter))
    # Pass columns=df.columns to keep the original column ordering.
    new_df = pd.DataFrame(new_rows, columns=df.columns)
    return new_df
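
A short usage sketch may help show what the function does and how the delimiter is chosen. So that it runs on its own, the snippet below carries a condensed restatement of the function that loops with `iterrows` instead of `apply`; that rewrite, the sample data, and the column names `id`, `tags`, and `scores` are assumptions made for illustration, not part of the gist. The delimiter is simply the third argument, so a different one (here `';'`) needs no change to the function body.

```python
import pandas as pd


def split_dataframe_rows(df, column_selectors, row_delimiter):
    # Condensed restatement (an assumption, not the gist's exact code):
    # same inputs and outputs, but iterating with iterrows().
    new_rows = []
    for _, row in df.iterrows():
        parts = {c: row[c].split(row_delimiter) for c in column_selectors}
        longest = max((len(p) for p in parts.values()), default=0)
        for i in range(longest):
            new_row = row.to_dict()
            for c in column_selectors:
                # Shorter lists are padded with empty strings.
                new_row[c] = parts[c][i] if i < len(parts[c]) else ''
            new_rows.append(new_row)
    return pd.DataFrame(new_rows, columns=df.columns)


# Hypothetical sample data: two delimited columns and one plain column.
df = pd.DataFrame({
    'id': [1, 2],
    'tags': ['a;b;c', 'd'],
    'scores': ['1;2', '3;4'],
})

# Split both delimited columns on ';' and repeat 'id' for every new row.
result = split_dataframe_rows(df, ['tags', 'scores'], ';')
print(result)
```

Each input row expands to as many rows as its longest split list, so the two rows above become five, with `''` filling the positions where one column runs out of values before the other.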
@nitishkmr1989

Hi,

Where do I need to change the code if I want to use it on my DataFrame with a different delimiter than yours?
