@bjelline
Created November 2, 2014 16:42
Pandas, groupby and finding maximum in groups
#!/usr/bin/env python3
import numpy as np
import pandas as pd

df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com',
             'other.net', 'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web'],
})
print("Input")
print(df_logfile)

# Count log entries per (host, service); the result is a Series with a MultiIndex.
counts = df_logfile.groupby(['host', 'service']).size()

# Result table: one row per host, filled in by the loop below.
df_count = pd.DataFrame({'host': df_logfile['host'].unique()})
df_count['service'] = None
df_count['no'] = np.nan

for h, data in counts.groupby(level='host'):
    service = data.idxmax()[1]   # idxmax() returns the (host, service) label
    no = data.max()              # the largest count for this host
    df_count.loc[df_count['host'] == h, 'service'] = service
    df_count.loc[df_count['host'] == h, 'no'] = no

print("\nOutput")
print(df_count)
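
The per-host loop in the script above can also be written without an explicit loop. The following is a minimal sketch, assuming Python 3 and a recent pandas; the names `counts` and `top` are illustrative and do not appear in the original script. Unlike the script, it returns the hosts in sorted order rather than in order of first appearance, and on a tie it keeps the first service in sorted order.

```python
import pandas as pd

df_logfile = pd.DataFrame({
    'host': ['this.com', 'this.com', 'this.com', 'that.com',
             'other.net', 'other.net', 'other.net'],
    'service': ['mail', 'mail', 'web', 'mail', 'mail', 'web', 'web'],
})

# Count rows per (host, service) pair and flatten the result
# into ordinary columns: host, service, no.
counts = df_logfile.groupby(['host', 'service']).size().reset_index(name='no')

# For each host, idxmax() gives the row label of the largest count;
# selecting those labels keeps one (host, service, no) row per host.
top = counts.loc[counts.groupby('host')['no'].idxmax()].reset_index(drop=True)

print(top)
```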