
@dutc
Last active July 13, 2021 16:16
“Python Expert” Newsletter (July 21, 2021): Learning Corner
from numpy import tile, repeat
from numpy.random import default_rng
from pandas import DataFrame, date_range, Timestamp
from pandas.tseries.offsets import Day
from random import seed
from scipy.stats import zscore
from string import ascii_lowercase
if __name__ == '__main__':
    # seed both generators from the same date-derived integer
    rng = default_rng(s := Timestamp('2021-07-04').asm8.astype('uint32'))
    seed(int(s))  # plain int: random.seed on newer Pythons rejects a NumPy integer

    # 5 random four-letter tickers, observed on 4 consecutive dates
    tickers = rng.choice([*ascii_lowercase], size=(5, 4)).view('<U4').ravel()
    dates = date_range('2021-07-04', periods=4)
    df = DataFrame({
        'date': repeat(dates, len(tickers)),
        'ticker': tile(tickers, len(dates)),
        # per-ticker base price plus a random walk over the dates
        'price': tile(
            rng.normal(loc=100, scale=50, size=len(tickers)).clip(10),
            len(dates)
        ) + rng.normal(scale=5, size=(len(dates), len(tickers))).cumsum(axis=0).ravel(),
        'volume': rng.integers(0, 1_000, size=len(tickers) * len(dates)),
        'signal': rng.normal(size=len(tickers) * len(dates)),
        'flag': rng.choice([True, False], size=len(tickers) * len(dates)),
    }).set_index(['date', 'ticker']).sort_index()

    print(
        df.groupby('ticker').agg(lambda s: s.sum()),
        df.groupby('ticker').aggregate(lambda s: s.sum()),
        df.groupby('ticker')['signal'].agg(lambda s: zscore(s)[-1]),
        sep=f'\n{"-" * 78}\n',
    )

dutc commented Jul 13, 2021

As you can see:

  • .groupby.agg is needed when you want to supply a user-defined function because the existing .groupby methods are insufficient
  • .groupby.agg will produce a DataFrame whose index corresponds to the groups
    • the majority of the existing .groupby methods (like .groupby.sum) are roughly equivalent to .groupby.agg(sum) but with a more efficient formulation; a minority (like .groupby.cumsum) are instead equivalent to .groupby.transform, e.g. .groupby(…).transform(lambda s: s.cumsum())
  • .groupby.agg operates on a row-by-row or column-by-column basis like .groupby.transform but does not have a fast-path
  • the function passed to .groupby.agg must produce a scalar value (specifically, what pandas considers a scalar value: a list or set qualifies, but a numpy.ndarray does not unless it's zero-dimensional or one-dimensional with one item!)
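
The points above can be checked on a much smaller frame. This is only a minimal sketch: the toy data and every variable name below are made up for illustration and are not part of the gist. It compares a user-defined aggregation against the built-in method, shows that cumsum keeps the original row index the way a transform does, and returns a list from the aggregating function. The last step returns a multi-item ndarray; rather than assume how that fails, it simply reports whatever the installed pandas does.

```python
import pandas as pd

# Hypothetical toy data (not from the gist): two tickers, three rows each.
df = pd.DataFrame({
    'ticker': ['aaaa', 'aaaa', 'aaaa', 'bbbb', 'bbbb', 'bbbb'],
    'price': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
grouped = df.groupby('ticker')['price']

# A user-defined aggregation and the built-in method give the same answer;
# the result is indexed by group, one row per ticker.
via_agg = grouped.agg(lambda s: s.sum())
via_sum = grouped.sum()

# cumsum behaves like a transform: the result keeps the original row index.
via_transform = grouped.transform(lambda s: s.cumsum())
via_cumsum = grouped.cumsum()

# A list is accepted as an aggregated value, so each group collapses to one list.
as_lists = grouped.agg(lambda s: s.tolist())

# Per the last bullet, a multi-item ndarray should not be accepted.
# Report what the installed pandas does instead of assuming the outcome.
try:
    outcome = grouped.agg(lambda s: s.to_numpy())
except Exception as exc:
    outcome = f'{type(exc).__name__}: {exc}'

print(via_agg, via_transform, as_lists, outcome, sep=f'\n{"-" * 40}\n')
```

Selecting a single column keeps the output small; the same comparisons apply to the whole DataFrame, where each result has one column per input column.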


dutc commented Jul 13, 2021

For the full write-up and discussion, sign up for the “Python Expert” newsletter!

bit.ly/expert-python
