Created July 13, 2021 16:15
“Python Expert” Newsletter (July 14, 2021): Learning Corner
from numpy import tile, repeat
from numpy.random import default_rng
from pandas import DataFrame, date_range, Timestamp
from pandas.tseries.offsets import Day
from random import seed
from scipy.stats import zscore
from string import ascii_lowercase

if __name__ == '__main__':
    # seed both generators from the date; int() because random.seed
    # rejects NumPy scalars on Python 3.11+
    rng = default_rng(s := int(Timestamp('2021-07-04').asm8.astype('uint32')))
    seed(s)

    # five random four-letter tickers, observed on four consecutive days
    tickers = rng.choice([*ascii_lowercase], size=(5, 4)).view('<U4').ravel()
    dates = date_range('2021-07-04', periods=4)

    df = DataFrame({
        'date': repeat(dates, len(tickers)),
        'ticker': tile(tickers, len(dates)),
        # per-ticker base price plus a random walk over the dates
        'price': tile(
            rng.normal(loc=100, scale=50, size=len(tickers)).clip(10),
            len(dates)
        ) + rng.normal(scale=5, size=(len(dates), len(tickers))).cumsum(axis=0).ravel(),
        'volume': rng.integers(0, 1_000, size=len(tickers) * len(dates)),
        'signal': rng.normal(size=len(tickers) * len(dates)),
        'flag': rng.choice([True, False], size=len(tickers) * len(dates)),
    }).set_index(['date', 'ticker']).sort_index()

    print(
        df.groupby('ticker').transform(lambda s: s + 1),
        df.groupby('ticker')['signal'].transform(zscore),
        df.groupby('ticker').transform(
            lambda s: s.rolling(3, min_periods=1).mean(),
        ),
        df.groupby('ticker').transform(
            lambda s: s.reset_index('ticker', drop=True).rolling(Day(3), min_periods=1).mean(),
        ),
        sep=f'\n{"-" * 78}\n',
    )
For the full write-up and discussion, sign up for the “Python Expert” newsletter!
As you can see:

- `.groupby.transform` is needed when you want to supply a user-defined function, because the existing `.groupby` methods are insufficient.
- `.groupby.transform` will have the same indices as the original `Series` or `DataFrame`; `.groupby.transform` is used when you want the transformation to preserve the original indexing.
- `.groupby.transform` must return a value that has the same shape as the group or can be broadcast to the shape of the group; the function must be able to operate on a column-by-column or row-by-row basis (depending on `axis=`); there is a fast-path if the operation can also apply to the entire `DataFrame` (but this fast-path is checked on the second group…)
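
The first two points, along with the shape rule from the third, can be seen on a much smaller frame than the one in the gist. This is only a sketch: the frame, its values, and the names `demeaned`, `group_mean`, and `collapsed` are made up for illustration and do not come from the script above. It compares a function that returns a same-shaped result, a function that returns a scalar that gets broadcast, and `.agg`, which collapses each group instead of preserving the index.

```python
from pandas import DataFrame

# hypothetical toy data: two tickers with a handful of prices each
df = DataFrame({
    'ticker': ['aaaa', 'aaaa', 'bbbb', 'bbbb', 'bbbb'],
    'price': [10.0, 12.0, 100.0, 110.0, 90.0],
})

grouped = df.groupby('ticker')['price']

# user-defined function whose result has the same shape as each group
demeaned = grouped.transform(lambda s: s - s.mean())

# user-defined function returning a scalar, broadcast to the group's shape
group_mean = grouped.transform(lambda s: s.mean())

# by contrast, .agg collapses each group to one row indexed by the group key
collapsed = grouped.agg(lambda s: s.mean())

print(demeaned, group_mean, collapsed, sep=f'\n{"-" * 40}\n')
```

Both `transform` results keep the five-row index of `df`, so they can be assigned straight back as new columns, while the `.agg` result has one row per ticker.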