@naqushab
Forked from jlln/separator.py
Created May 6, 2020 11:37
Efficiently split Pandas DataFrame cells containing separator-delimited lists into multiple rows, duplicating the other columns' values.
import pandas


def splitDataFrameList(df, target_column, separator):
    ''' df = dataframe to split,
    target_column = the column containing the values to split
    separator = the symbol used to perform the split
    returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
    The values in the other columns are duplicated across the newly divided rows.
    '''
    def splitListToRows(row, row_accumulator, target_column, separator):
        # Append one copy of the row per element of the split cell.
        split_row = row[target_column].split(separator)
        for s in split_row:
            new_row = row.to_dict()
            new_row[target_column] = s
            row_accumulator.append(new_row)
    new_rows = []
    # apply is used only for its side effect of filling new_rows.
    df.apply(splitListToRows, axis=1, args=(new_rows, target_column, separator))
    new_df = pandas.DataFrame(new_rows)
    return new_df
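
For comparison, here is a minimal sketch of the same row-splitting idea using pandas' built-in `Series.str.split` and `DataFrame.explode` (available from pandas 0.25). It is not part of the original gist. The function name `split_with_explode`, the sample data, and the column names are hypothetical and chosen only for illustration. It assumes the target column holds strings. Note that `str.split` may interpret a multi-character separator as a regular expression.

```python
import pandas as pd

# Hypothetical sample data; the column names are made up for this example.
df = pd.DataFrame({
    "name": ["alice", "bob"],
    "tags": ["a,b,c", "d"],
})


def split_with_explode(df, target_column, separator):
    """Alternative sketch: split a delimited column into rows with explode."""
    out = df.copy()  # leave the caller's dataframe untouched
    # Turn each delimited string into a Python list of its elements.
    out[target_column] = out[target_column].str.split(separator)
    # Give every list element its own row; other columns are repeated.
    return out.explode(target_column).reset_index(drop=True)


result = split_with_explode(df, "tags", ",")
print(result["name"].tolist())  # ['alice', 'alice', 'alice', 'bob']
print(result["tags"].tolist())  # ['a', 'b', 'c', 'd']
```

Like the function above, this returns a new dataframe with a fresh integer index and leaves the input unchanged.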