Skip to content

Instantly share code, notes, and snippets.

@jlln
Last active November 9, 2023 19:59
Show Gist options
  • Star 96 You must be signed in to star a gist
  • Fork 27 You must be signed in to fork a gist
  • Save jlln/338b4b0b55bd6984f883 to your computer and use it in GitHub Desktop.
Save jlln/338b4b0b55bd6984f883 to your computer and use it in GitHub Desktop.
Efficiently split Pandas Dataframe cells containing lists into multiple rows, duplicating the other column's values.
def splitDataFrameList(df,target_column,separator):
''' df = dataframe to split,
target_column = the column containing the values to split
separator = the symbol used to perform the split
returns: a dataframe with each entry for the target column separated, with each element moved into a new row.
The values in the other columns are duplicated across the newly divided rows.
'''
def splitListToRows(row,row_accumulator,target_column,separator):
split_row = row[target_column].split(separator)
for s in split_row:
new_row = row.to_dict()
new_row[target_column] = s
row_accumulator.append(new_row)
new_rows = []
df.apply(splitListToRows,axis=1,args = (new_rows,target_column,separator))
new_df = pandas.DataFrame(new_rows)
return new_df
@john0305
Copy link

Super excited, this is my first time commenting with something I made, be gentle. I needed something similar when a number and a string were in one column separated by a comma (blanks as well). I modified my code a bit to hopefully work a little more universally.

The code checks how many times a delimiter is used in each column row, then repeats that line for each one.

import pandas as pd
from itertools import chain
import numpy as np

def chainer(df,col,sep,dtype):
  df = df.astype({col:dtype})
  lens = df[col].str.split(sep).map(len)
  dicts = {}
  for cols in df.columns:
    if cols == col:
      dicts[cols] = list(chain.from_iterable(df[cols].str.split(sep)))
    else:
      dicts[cols] = np.repeat(df[cols],lens)
  return pd.DataFrame.from_dict(dicts)

df = chainer(df,'Combined Column',',','str')

Added the astype because my column wouldn't convert a float or NaN, after using str worked like a champ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment