@benjaminkaplanphd
Created February 6, 2019 18:06
Explode array values in columns to multiple rows
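For reference, pandas 1.3 and later provide the same transformation through the built-in `DataFrame.explode`, which accepts a list of columns. This gist predates that release. The minimal sketch below uses a toy frame whose column names (`id`, `tag`, `score`) are illustrative only. Two differences from the gist's `explode` are worth knowing. The built-in keeps the original index unless `ignore_index=True` is passed. It also turns an empty list into a single NaN row, whereas the gist's function drops such rows entirely.

```python
import pandas as pd

# Toy frame: "id" is a scalar column; "tag" and "score" hold equal-length lists per row
df = pd.DataFrame({
    "id": [1, 2],
    "tag": [["a", "b"], ["c"]],
    "score": [[10, 20], [30]],
})

# pandas >= 1.3: explode several columns at once; ignore_index=True renumbers rows 0..n-1
out = df.explode(["tag", "score"], ignore_index=True)
# rows: (1, "a", 10), (1, "b", 20), (2, "c", 30)

# An empty list becomes one row of NaN rather than disappearing
empty = pd.DataFrame({"id": [3], "tag": [[]], "score": [[]]}).explode(["tag", "score"])
```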
from typing import List

import numpy as np
import pandas as pd


def explode(frame: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """
    Create a new row for each value in an array of values.

    If more than one column is exploded, the arrays in each row
    must have the same length across those columns.
    (Adapted from a Stack Exchange code snippet.)

    Args:
        frame: The input dataframe
        columns: The columns whose cells hold arrays to explode

    Returns:
        The transformed dataframe, with a fresh 0..n-1 index
    """
    # All columns that do not hold arrays of values
    idx_cols = frame.columns.difference(columns)
    # Per-row array lengths, taken from the first exploded column
    lens = frame[columns[0]].str.len()
    # Repeat each scalar value once per array element, flatten the
    # array columns, then restore the original column order
    return pd.DataFrame({
        col: np.repeat(frame[col].values, lens)
        for col in idx_cols
    }).assign(**{col: np.concatenate(frame[col].values)
                 for col in columns}).loc[:, frame.columns]
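The docstring requires equal row-wise array lengths when several columns are exploded, but the function never checks this. If the total lengths differ, the `assign` call fails with a length error. If the totals happen to agree, values are silently paired with the wrong rows. The following is a minimal sketch of a hypothetical `explode_checked` variant that validates the precondition up front. It assumes every exploded cell holds a list-like value. Otherwise it follows the same repeat-and-concatenate approach as the gist.

```python
from typing import List

import numpy as np
import pandas as pd


def explode_checked(frame: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """Hypothetical variant of explode() that first verifies the
    row-wise equal-length precondition stated in its docstring."""
    # Per-row array lengths of the first exploded column
    lens = frame[columns[0]].str.len()
    # Every other exploded column must match those lengths row by row
    for col in columns[1:]:
        if not frame[col].str.len().equals(lens):
            raise ValueError(
                f"column {col!r} has different row-wise lengths than {columns[0]!r}"
            )
    idx_cols = frame.columns.difference(columns)
    repeats = lens.to_numpy()
    # Repeat scalar columns, flatten array columns, keep the original column order
    return pd.DataFrame({
        col: np.repeat(frame[col].to_numpy(), repeats) for col in idx_cols
    }).assign(**{
        col: np.concatenate(frame[col].to_numpy()) for col in columns
    }).loc[:, frame.columns]


# Usage: a well-formed frame explodes into three rows
df = pd.DataFrame({"id": [1, 2], "tag": [["a", "b"], ["c"]], "score": [[10, 20], [30]]})
out = explode_checked(df, ["tag", "score"])

# Totals agree (3 and 3) but rows do not, so the check rejects the frame
bad = pd.DataFrame({"id": [1, 2], "tag": [["a", "b"], ["c"]], "score": [[10], [20, 30]]})
try:
    explode_checked(bad, ["tag", "score"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The `bad` frame is the case the unchecked version would accept silently, because both columns flatten to three values.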