Problem: would like to use pivot_table
[1] to pivot and ignore irrelevant columns
in the DataFrame
.
Given the DataFrame
df = pd.DataFrame(
{
"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
"C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
}
)
A B C D E 0 foo one small 1 2 1 foo one large 2 4 2 foo one large 2 5 3 foo two small 3 5 4 foo two small 3 6 5 bar one large 4 6 6 bar one small 5 8 7 bar two small 6 9 8 bar two large 7 9
We can count the number of each category in C
with
df.pivot_table(index=["A", "B"], columns=["C"], aggfunc="count", fill_value=0)
# , or aggfunc=len also works
but the result includes all the remaining columns:
D E C large small large small A B bar one 1 1 1 1 two 1 1 1 1 foo one 2 1 2 1 two 0 2 0 2
Passing values=[]
removes them
df.pivot_table(index=["A", "B"], columns=["C"], values=[], aggfunc=len, fill_value=0)
C large small A B bar one 1 1 two 1 1 foo one 2 1 two 0 2
however the same functional call with aggfunc
set to ~”count”~ returns an empty DataFrame
. 🤷
[1] It’s difficult enough to remember how to use one function.