@cgarbin
Last active September 16, 2019 22:35
Showing the steps Pandas executes when filtering
# Based on https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.08-Aggregation-and-Grouping.ipynb#scrollTo=uwsvvB3-s0yO
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])

iteration = 0

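def filter_func(x):
    """Print the group Pandas passes in and return whether to keep it."""
    global iteration
    iteration += 1

    # x is the sub-DataFrame holding the rows of one group
    s = x['data2'].std()
    s_gt_4 = s > 4

    print('\nPass #{:-<20}'.format(iteration))
    print('type={}'.format(type(x)))
    print(x)
    print('\nstd()={:.2f}'.format(s))
    print('std() is {} 4 - {}'
          .format('>' if s_gt_4 else '<=',
                  'keep' if s_gt_4 else 'discard'))

    # True keeps every row of this group; False drops them all
    return s_gt_4
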
print('The full DataFrame:')
print(df)
print('\nFiltering...')
f = df.groupby('key').filter(filter_func)
print('\nFiltered DataFrame:')
print(f)
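
For comparison, the same keep/discard rule can be written without a Python callback. The sketch below is not part of the original gist: it assumes the same sample DataFrame and the same `std() > 4` threshold, and the names `group_std`, `filtered_by_mask` and `filtered_by_func` are made up for illustration. It computes the per-group standard deviation with `transform('std')`, which returns one value per row, and uses the result as a boolean mask.

```python
import numpy as np
import pandas as pd

# Same sample data as the script above
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])

# Per-group standard deviation of data2, repeated on every row of the group
group_std = df.groupby('key')['data2'].transform('std')

# Keep only the rows whose group has std() > 4
filtered_by_mask = df[group_std > 4]

# The same rule expressed with filter(), for comparison
filtered_by_func = df.groupby('key').filter(lambda g: g['data2'].std() > 4)

print(filtered_by_mask)
```

The mask version does not show the per-group passes that the script above prints, so it is only useful once the filtering steps themselves are understood.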