Last active
July 13, 2021 16:16
“Python Expert” Newsletter (July 21, 2021): Learning Corner
from numpy import tile, repeat
from numpy.random import default_rng
from pandas import DataFrame, date_range, Timestamp
from pandas.tseries.offsets import Day
from random import seed
from scipy.stats import zscore
from string import ascii_lowercase

if __name__ == '__main__':
    # Seed both NumPy's generator and the stdlib `random` module from a fixed date.
    rng = default_rng(s := Timestamp('2021-07-04').asm8.astype('uint32'))
    seed(int(s))  # random.seed expects a plain int, not a NumPy scalar

    # Five random four-letter tickers observed over four consecutive days.
    tickers = rng.choice([*ascii_lowercase], size=(5, 4)).view('<U4').ravel()
    dates = date_range('2021-07-04', periods=4)

    df = DataFrame({
        'date': repeat(dates, len(tickers)),
        'ticker': tile(tickers, len(dates)),
        # Per-ticker base price plus a random walk across dates.
        'price': tile(
            rng.normal(loc=100, scale=50, size=len(tickers)).clip(10),
            len(dates)
        ) + rng.normal(scale=5, size=(len(dates), len(tickers))).cumsum(axis=0).ravel(),
        'volume': rng.integers(0, 1_000, size=len(tickers) * len(dates)),
        'signal': rng.normal(size=len(tickers) * len(dates)),
        'flag': rng.choice([True, False], size=len(tickers) * len(dates)),
    }).set_index(['date', 'ticker']).sort_index()

    print(
        # User-defined aggregation applied to every column, one row per ticker.
        df.groupby('ticker').agg(lambda s: s.sum()),
        # `.aggregate` is the long spelling of `.agg`.
        df.groupby('ticker').aggregate(lambda s: s.sum()),
        # z-score of each ticker's last signal relative to that ticker's own values.
        df.groupby('ticker')['signal'].agg(lambda s: zscore(s)[-1]),
        sep=f'\n{"-" * 78}\n',
    )
For the full write-up and discussion, sign up for the “Python Expert” newsletter!
As you can see:
- `.groupby.agg` is needed when you want to supply a user-defined function, because the existing `.groupby` methods are insufficient
- `.groupby.agg` will produce a DataFrame whose indexing corresponds to the groups
- `.groupby` methods (like `.groupby.sum`) are roughly equivalent to `.groupby.agg(sum)` but with a more efficient formulation; a minority are equivalent to `.groupby.transform` (like `.groupby.cumsum` and `.groupby(…).transform(lambda s: s.cumsum())`)
- `.groupby.agg` operates on a row-by-row or column-by-column basis like `.groupby.transform` but does not have a fast-path
- `.groupby.agg` must produce a scalar value (specifically, what `pandas` considers a scalar value, such as `list` or `set` but not a `numpy.ndarray` unless it's zero-dimensional or one-dimensional with one item!)
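
The points above can be checked on a small frame. What follows is a minimal sketch, not the gist's own dataset: the `ticker` and `signal` values are made up for illustration, and the variable names are hypothetical. It compares a built-in reduction with the same reduction passed to `.agg`, shows that `.cumsum` lines up with `.transform` rather than `.agg`, and returns a `list` from `.agg` as one value per group. The `numpy.ndarray` case is left out because the behaviour there is best confirmed against your own pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the gist's dataset): two tickers, three rows each.
df = pd.DataFrame({
    'ticker': ['aaaa', 'aaaa', 'aaaa', 'bbbb', 'bbbb', 'bbbb'],
    'signal': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
grouped = df.groupby('ticker')['signal']

# A built-in reduction and the same reduction supplied as a user-defined
# function: both produce one value per group, indexed by the group keys.
builtin_sum = grouped.sum()
udf_sum = grouped.agg(lambda s: s.sum())

# A user-defined reduction with no dedicated groupby method: the range
# (max minus min) of each group.
spread = grouped.agg(lambda s: s.max() - s.min())

# cumsum is not a reduction: the result keeps the original row index,
# so its user-defined counterpart is .transform, not .agg.
builtin_cumsum = grouped.cumsum()
udf_cumsum = grouped.transform(lambda s: s.cumsum())

# Returning a list from .agg yields one (object) value per group.
as_lists = grouped.agg(lambda s: s.tolist())
```

Printing `builtin_sum` next to `builtin_cumsum` makes the difference in indexing visible: the first has one row per ticker, the second has one row per original row.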