Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save hiropppe/ec8a7e64ac82f31a450d91d9444bafaa to your computer and use it in GitHub Desktop.
Save hiropppe/ec8a7e64ac82f31a450d91d9444bafaa to your computer and use it in GitHub Desktop.
Pandas で count( * ) over ( partition by ... 的な結果がしたかった
In [1]: import pandas as pd
...:
...: data = {'m': ['m1','m2','m2','m3','m3','m3','m4','m4','m4','m4'],
...: 'e': ['e1','e2','e3','e4','e5','e6','e7','e1','e2','e8'],
...: 'p': [0.9, 0.2, 0.8, 0.7, 0.1, 0.2, 0.3, 0.1, 0.2, 0.4]}
...: df = pd.DataFrame(data)
...: df
...:
Out[1]:
e m p
0 e1 m1 0.9
1 e2 m2 0.2
2 e3 m2 0.8
3 e4 m3 0.7
4 e5 m3 0.1
5 e6 m3 0.2
6 e7 m4 0.3
7 e1 m4 0.1
8 e2 m4 0.2
9 e8 m4 0.4
In [2]: gm = df.groupby(by=['m'])
In [3]: count = gm['e'].count()
...: count = count.to_frame()
...: count.columns = ['count']
...: count
...:
Out[3]:
count
m
m1 1
m2 2
m3 3
m4 4
In [4]: maxp = gm['p'].max()
...: maxp = maxp.to_frame()
...: maxp.columns = ['maxp']
...: maxp
...:
Out[4]:
maxp
m
m1 0.9
m2 0.8
m3 0.7
m4 0.4
In [5]: df = pd.merge(df, count, left_on='m', right_index=True)
...: df = pd.merge(df, maxp, left_on='m', right_index=True)
...: df
...:
Out[5]:
e m p count maxp
0 e1 m1 0.9 1 0.9
1 e2 m2 0.2 2 0.8
2 e3 m2 0.8 2 0.8
3 e4 m3 0.7 3 0.7
4 e5 m3 0.1 3 0.7
5 e6 m3 0.2 3 0.7
6 e7 m4 0.3 4 0.4
7 e1 m4 0.1 4 0.4
8 e2 m4 0.2 4 0.4
9 e8 m4 0.4 4 0.4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment