Created
July 13, 2021 16:17
“Python Expert” Newsletter (July 28, 2021): Learning Corner
from numpy import tile, repeat, int64
from numpy.random import default_rng
from pandas import DataFrame, date_range, Timestamp, concat
from pandas.tseries.offsets import Day
from random import seed
from scipy.stats import zscore
from string import ascii_lowercase

if __name__ == '__main__':
    # derive a reproducible seed from a fixed date
    rng = default_rng(s := Timestamp('2021-07-04').asm8.astype('uint32'))
    seed(int(s))  # random.seed rejects NumPy scalars on Python 3.11+

    # five random four-letter tickers, observed on four consecutive days
    tickers = rng.choice([*ascii_lowercase], size=(5, 4)).view('<U4').ravel()
    dates = date_range('2021-07-04', periods=4)

    df = DataFrame({
        'date': repeat(dates, len(tickers)),
        'ticker': tile(tickers, len(dates)),
        # per-ticker base price plus a random walk over the dates
        'price': tile(
            rng.normal(loc=100, scale=50, size=len(tickers)).clip(10),
            len(dates)
        ) + rng.normal(scale=5, size=(len(dates), len(tickers))).cumsum(axis=0).ravel(),
        'volume': rng.integers(0, 1_000, size=len(tickers) * len(dates)),
        'signal': rng.normal(size=len(tickers) * len(dates)),
        'flag': rng.choice([True, False], size=len(tickers) * len(dates)),
    }).set_index(['date', 'ticker']).sort_index()

    print(
        # combines two columns within each group
        df.groupby('ticker').apply(lambda df: df['volume'] * df['price']),
        # returns more rows than each group started with
        df.groupby('ticker').apply(lambda df: concat([df, df])),
        sep=f'\n{"-" * 78}\n',
    )
For the full write-up and discussion, sign up for the “Python Expert” newsletter!
As you can see:

- `.groupby.apply` is extremely general (but pays for this generality by being very slow!)
- `.groupby.apply` takes a function which operates on the entire group (as a `DataFrame`); it's useful if you want to compute something both across rows and across columns
- `.groupby.apply` stitches the results of your operation together; this operation can change the (inner-most) indexing of the `DataFrame`, but the result will always have the groups as its outer-most index
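
The points above can be seen on a much smaller frame. What follows is only a minimal sketch: the tickers, prices, and volumes are made up for illustration, and `vwap` is a hypothetical helper that does not appear in the gist. It computes a volume-weighted average price once with `.groupby.apply` (the function sees the whole group as a `DataFrame`) and once with plain column arithmetic plus grouped sums, then uses `nlargest` to show the group key becoming the outer-most index level.

```python
from pandas import DataFrame

# toy data, invented for this illustration
df = DataFrame({
    'ticker': ['abcd', 'abcd', 'wxyz', 'wxyz'],
    'price':  [10.0, 11.0, 20.0, 19.0],
    'volume': [100, 200, 300, 400],
})

def vwap(group):
    # `group` is a DataFrame holding one ticker's rows, so the
    # computation can span rows and columns at the same time
    return (group['price'] * group['volume']).sum() / group['volume'].sum()

# general but slow: one Python-level call per group
by_apply = df.groupby('ticker')[['price', 'volume']].apply(vwap)

# same numbers from column arithmetic followed by grouped sums
notional = df['price'] * df['volume']
by_vector = (
    notional.groupby(df['ticker']).sum()
    / df['volume'].groupby(df['ticker']).sum()
)

# a function that returns a DataFrame: the inner index is whatever the
# function produced, while the ticker is added as the outer level
top = df.groupby('ticker')[['price', 'volume']].apply(
    lambda g: g.nlargest(1, 'volume')
)

print(by_apply, by_vector, top, sep=f'\n{"-" * 40}\n')
```

When an operation can be written the second way, it will usually be much faster on large data; `.groupby.apply` is likely best kept for cases that really do need the whole group at once.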