Skip to content

Instantly share code, notes, and snippets.

@rainsunny
Created September 5, 2017 03:57
Show Gist options
  • Save rainsunny/a4fd8f760194dc55da67b0fd2384395e to your computer and use it in GitHub Desktop.
Save rainsunny/a4fd8f760194dc55da67b0fd2384395e to your computer and use it in GitHub Desktop.
pandas: expand data frame column into multiple rows
"""
Turn this
days name
[1,3,5,7] John
into this
days name
1 John
3 John
5 John
7 John
Like spark DataFrame's explode() method
Reference: https://stackoverflow.com/questions/38203352/expand-pandas-dataframe-column-into-multiple-rows
"""
import numpy as np
import pandas as pd
df = pd.DataFrame( {"name": ["John", "Eric"], "days": [[1,3,5,7], [2,4]] } )
# Using numpy's repeat to expand
lens = [ len(item) for item in df["days"]]
expanded = pd.DataFrame( {"name": np.repeat(df["name"].values, lens), "days": np.concatenate(df["days"].values)} )
# Turn this into a method
def expand(df, expand_column):
"""
Expand df on expand_column
df: pandas.DataFrame
expand_column: the column name to expand
"""
lens = [ len(item) for item in df[expand_column]]
d = {}
d[expand_column] = np.concatenate(df[expand_column].values)
for col in df.columns.values:
if col != expand_column:
d[col] = np.repeat(df[col].values, lens)
return pd.DataFrame(d)
# TODO
"""
How to turn this
days result count
1 true 28
1 false 3
2 true 37
2 false 5
into this
days true false
1 28 3
2 37 5
?
"""
@rainsunny
Copy link
Author

To solve the TODO

Using DataFrame.pivot:

df = pd.DataFrame({'days': [1,1,2,2], 'result':['true','false','true','false'], 'count':[28,3,37,5]})
df.pivot(index='days', columns='result', values='count')

Using DataFrame.unstack:

df1 = df.set_index(['days','result']).unstack()
df1.columns = ['false', 'true'] # Optional, change multi-level index to one level

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment