@zhuchangzhan
Last active August 23, 2021 13:49
TPCX Blog Bodo Code
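```python
import bodo
import pandas as pd
import numpy as np


# Bodo JIT-compiles this function and parallelizes the pandas operations in it.
@bodo.jit
def tpcx_bb_q26():
    # Load the sales fact table and the item dimension table (paths elided).
    store_sales = pd.read_parquet('s3://...')
    item = pd.read_parquet('s3://...')
    # Keep only items in the 'Books' category.
    item2 = item[item['i_category'] == 'Books']
    # Inner join: attach each sale to its Books item, dropping other categories.
    sale_items = pd.merge(
        store_sales, item2, left_on='ss_item_sk', right_on='i_item_sk'
    )
    # Number of Books sales rows per customer.
    count1 = sale_items.groupby('ss_customer_sk')['ss_item_sk'].count()
    gp1 = sale_items.groupby('ss_customer_sk')['i_class_id']

    # UDFs counting how many of a customer's items fall in class id k.
    def id1(x): return (x == 1).sum()
    def id2(x): return (x == 2).sum()
    ...  # id3 through id14 follow the same pattern (elided)
    def id15(x): return (x == 15).sum()

    # One column per class id, indexed by ss_customer_sk
    # (the tuple lists id1 through id15; the middle entries are elided).
    customer_i_class = gp1.agg((id1, id2, ..., id15))
    customer_i_class['ss_item_count'] = count1
    # Keep customers with more than 5 Books sales rows, then drop the helper column.
    customer_i_class = customer_i_class[customer_i_class.ss_item_count > 5]
    customer_i_class = customer_i_class.drop(['ss_item_count'], axis=1)
    # 'ss_customer_sk' is the index name, so this sorts by customer key.
    customer_i_class = customer_i_class.sort_values('ss_customer_sk')
    return customer_i_class
```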
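The Bodo function reads from elided S3 paths and abbreviates `id3` through `id14`, so it cannot be run as shown. The following is a minimal sketch, in plain pandas only (no Bodo), that reproduces the same query steps on small in-memory tables. The table contents and the names `q26_pandas`, `make_class_counter`, and `toy_tables`, along with their parameters, are hypothetical and exist only for illustration. Instead of writing the fifteen counting UDFs by hand, it builds them with a factory and sets each one's `__name__` so that `agg` labels the output columns `id1` through `id15`. Whether Bodo's JIT accepts UDFs generated this way is not verified here.

```python
import pandas as pd


def make_class_counter(k):
    """Build a UDF that counts how many values in a group equal class id k."""
    def counter(x):
        return (x == k).sum()
    # pandas labels each agg output column with the function's __name__.
    counter.__name__ = f"id{k}"
    return counter


def q26_pandas(store_sales, item, category="Books", min_items=5, n_classes=15):
    """Plain-pandas version of the query steps (hypothetical helper)."""
    # Keep only items in the target category.
    item2 = item[item["i_category"] == category]
    # Inner join drops sales of items outside the category.
    sale_items = pd.merge(
        store_sales, item2, left_on="ss_item_sk", right_on="i_item_sk"
    )
    # Number of matching sales rows per customer.
    count1 = sale_items.groupby("ss_customer_sk")["ss_item_sk"].count()
    gp1 = sale_items.groupby("ss_customer_sk")["i_class_id"]
    # Generate id1 ... id{n_classes} instead of writing them out by hand.
    udfs = [make_class_counter(k) for k in range(1, n_classes + 1)]
    customer_i_class = gp1.agg(udfs)
    customer_i_class["ss_item_count"] = count1
    customer_i_class = customer_i_class[customer_i_class.ss_item_count > min_items]
    customer_i_class = customer_i_class.drop(["ss_item_count"], axis=1)
    # "ss_customer_sk" is the index name, which sort_values accepts as a key.
    return customer_i_class.sort_values("ss_customer_sk")


def toy_tables():
    """Hypothetical in-memory stand-ins for the store_sales and item tables."""
    item = pd.DataFrame({
        "i_item_sk": [1, 2, 3, 4],
        "i_category": ["Books", "Books", "Music", "Books"],
        "i_class_id": [1, 2, 1, 15],
    })
    store_sales = pd.DataFrame({
        "ss_customer_sk": [30] * 7 + [10] * 7 + [20] * 6,
        "ss_item_sk": [2] * 7 + [1, 1, 2, 4, 4, 4, 3] + [1, 2, 3, 3, 3, 3],
    })
    return store_sales, item


if __name__ == "__main__":
    print(q26_pandas(*toy_tables()))
```

On these toy tables, customer 20 has only two Books sales rows and is filtered out, leaving customers 10 and 30. The factory function matters: each call to `make_class_counter` captures its own `k`, whereas lambdas defined in a loop would all see the loop's final value.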