@dutc
Created July 13, 2021 16:17
“Python Expert” Newsletter (July 28, 2021): Learning Corner
from numpy import tile, repeat, int64
from numpy.random import default_rng
from pandas import DataFrame, date_range, Timestamp, concat
from pandas.tseries.offsets import Day
from random import seed
from scipy.stats import zscore
from string import ascii_lowercase

if __name__ == '__main__':
    # seed both generators from the same date-derived value
    rng = default_rng(s := Timestamp('2021-07-04').asm8.astype('uint32'))
    seed(int(s))  # random.seed rejects NumPy scalars on Python 3.11+

    # five random four-letter tickers, observed on four consecutive days
    tickers = rng.choice([*ascii_lowercase], size=(5, 4)).view('<U4').ravel()
    dates = date_range('2021-07-04', periods=4)

    df = DataFrame({
        'date': repeat(dates, len(tickers)),
        'ticker': tile(tickers, len(dates)),
        # per-ticker base price plus a random walk over the dates
        'price': tile(
            rng.normal(loc=100, scale=50, size=len(tickers)).clip(10),
            len(dates)
        ) + rng.normal(scale=5, size=(len(dates), len(tickers))).cumsum(axis=0).ravel(),
        'volume': rng.integers(0, 1_000, size=len(tickers) * len(dates)),
        'signal': rng.normal(size=len(tickers) * len(dates)),
        'flag': rng.choice([True, False], size=len(tickers) * len(dates)),
    }).set_index(['date', 'ticker']).sort_index()

    print(
        # combine two columns within each group
        df.groupby('ticker').apply(lambda df: df['volume'] * df['price']),
        # return a differently shaped frame (each group doubled)
        df.groupby('ticker').apply(lambda df: concat([df, df])),
        sep=f'\n{"-" * 78}\n',
    )

dutc commented Jul 13, 2021

As you can see:

  • .groupby.apply is extremely general, but it pays for this generality by being very slow: it calls your Python function once per group.
  • .groupby.apply takes a function that operates on the entire group (as a DataFrame), so it's useful when you want to compute something both across rows and across columns.
  • .groupby.apply stitches the results of your operation together. The operation can change the (innermost) indexing of the DataFrame, but the result will always have the groups as its outermost index.
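
A smaller sketch of the same three points, on hypothetical toy data rather than the random tickers above. The `vwap` helper and the names `by_apply`, `by_builtin`, and `top` are made up for illustration. The middle step shows one way to get the same numbers from built-in aggregations instead of a per-group Python call.

```python
import pandas as pd

# Hypothetical toy data (not the gist's random tickers): two tickers, two days.
df = pd.DataFrame(
    {
        "ticker": ["abc", "abc", "xyz", "xyz"],
        "price": [10.0, 12.0, 50.0, 45.0],
        "volume": [100, 200, 30, 40],
    },
    index=pd.DatetimeIndex(
        ["2021-07-04", "2021-07-05", "2021-07-04", "2021-07-05"], name="date"
    ),
)


def vwap(group):
    # `group` is a whole DataFrame, so rows and columns can be combined freely.
    return (group["price"] * group["volume"]).sum() / group["volume"].sum()


grouped = df.groupby("ticker")[["price", "volume"]]

# One scalar per group: the result is a Series indexed by the group key.
by_apply = grouped.apply(vwap)

# The same numbers from built-in aggregations, with no Python call per group.
sums = (
    df.assign(notional=df["price"] * df["volume"])
    .groupby("ticker")[["notional", "volume"]]
    .sum()
)
by_builtin = sums["notional"] / sums["volume"]

# A function that changes the inner index: keep only the highest-volume row
# of each group. The group key still ends up as the outermost index level.
top = grouped.apply(lambda group: group.nlargest(1, "volume"))

print(by_apply, by_builtin, top, sep=f'\n{"-" * 40}\n')
```

When a computation can be expressed with built-in aggregations like this, that route is usually the faster one; .groupby.apply is the fallback for logic that can't.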


dutc commented Jul 13, 2021

For the full write-up and discussion, sign up for the “Python Expert” newsletter!

bit.ly/expert-python
