@dutc
Created July 13, 2021 16:15
“Python Expert” Newsletter (July 14, 2021): Learning Corner
from numpy import tile, repeat
from numpy.random import default_rng
from pandas import DataFrame, date_range, Timestamp
from pandas.tseries.offsets import Day
from random import seed
from scipy.stats import zscore
from string import ascii_lowercase

if __name__ == '__main__':
    # seed both generators from the same date-derived value
    rng = default_rng(s := Timestamp('2021-07-04').asm8.astype('uint32'))
    seed(int(s))  # random.seed rejects NumPy integer scalars on newer Pythons

    # five random four-letter tickers, observed on four consecutive days
    tickers = rng.choice([*ascii_lowercase], size=(5, 4)).view('<U4').ravel()
    dates = date_range('2021-07-04', periods=4)

    df = DataFrame({
        'date': repeat(dates, len(tickers)),
        'ticker': tile(tickers, len(dates)),
        # a per-ticker base price plus a per-ticker random walk over the dates
        'price': tile(
            rng.normal(loc=100, scale=50, size=len(tickers)).clip(10),
            len(dates)
        ) + rng.normal(scale=5, size=(len(dates), len(tickers))).cumsum(axis=0).ravel(),
        'volume': rng.integers(0, 1_000, size=len(tickers) * len(dates)),
        'signal': rng.normal(size=len(tickers) * len(dates)),
        'flag': rng.choice([True, False], size=len(tickers) * len(dates)),
    }).set_index(['date', 'ticker']).sort_index()

    print(
        # user-defined function applied to every column of every group
        df.groupby('ticker').transform(lambda s: s + 1),
        # existing function applied to a single column, per group
        df.groupby('ticker')['signal'].transform(zscore),
        # rolling mean over a fixed number of rows within each group
        df.groupby('ticker').transform(
            lambda s: s.rolling(3, min_periods=1).mean(),
        ),
        # rolling mean over a time-based window (requires a datetime index)
        df.groupby('ticker').transform(
            lambda s: s.reset_index('ticker', drop=True).rolling(Day(3), min_periods=1).mean(),
        ),
        sep=f'\n{"-" * 78}\n',
    )

dutc commented Jul 13, 2021

As you can see:

  • .groupby.transform is needed when you want to supply a user-defined function because the built-in .groupby methods (such as .sum or .mean) do not cover the operation you need.
  • the result of .groupby.transform will have the same index as the original Series or DataFrame; use .groupby.transform when you want the transformation to preserve the original indexing.
  • thus, the function passed to .groupby.transform must return a value that has the same shape as the group or that can be broadcast to the shape of the group. The function must also be able to operate on a column-by-column or row-by-row basis (depending on axis=). There is a fast-path if the operation can also apply to the entire DataFrame (but this fast-path is checked on the second group…)
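
To make the index and shape points above concrete, here is a minimal sketch on a small hand-written frame. The data and the names `means`, `demeaned`, and `broadcast` are illustrative assumptions and are not part of the gist. It contrasts `.agg`, which collapses each group to one row, with `.transform`, which returns a result aligned to the original index, and shows a scalar result being broadcast across its group.

```python
from pandas import DataFrame

# illustrative data (not from the gist): two tickers with a few prices each
df = DataFrame({
    'ticker': ['abc', 'abc', 'xyz', 'xyz', 'xyz'],
    'price': [10.0, 12.0, 100.0, 110.0, 120.0],
})

grouped = df.groupby('ticker')['price']

# .agg collapses each group to a single row, indexed by the group key
means = grouped.agg('mean')

# .transform with a user-defined function that returns a same-shaped result;
# the output keeps the index of the original frame
demeaned = grouped.transform(lambda s: s - s.mean())

# a scalar result is broadcast to every row of its group
broadcast = grouped.transform(lambda s: s.mean())

print(means, demeaned, broadcast, sep=f'\n{"-" * 40}\n')
```

Because `demeaned` and `broadcast` share the index of `df`, they can be assigned straight back as new columns, which is not possible with the collapsed result of `.agg` without a join.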


dutc commented Jul 13, 2021

For the full write-up and discussion, sign up for the “Python Expert” newsletter!

bit.ly/expert-python
