@rkdgusrn1212
Created June 7, 2022 09:54
Feature Engineering Skills
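import pandas as pd
# Categorical x numerical interaction: one-hot encode cat_feat, then multiply each dummy column row-wise by num_feat
df_new = pd.get_dummies(df.cat_feat, prefix="catxnum_feat").mul(df.num_feat, axis=0)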
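To see what the interaction features look like, here is a minimal sketch on a hypothetical three-row frame. The data values are made up for illustration, and `dtype=float` is an addition not in the original snippet, used only so the dummy columns are numeric before multiplying.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the snippet above.
df = pd.DataFrame({"cat_feat": ["a", "b", "a"], "num_feat": [1.0, 2.0, 3.0]})

# One-hot encode cat_feat, then scale every row of dummies by that row's num_feat.
# dtype=float is an assumption added here to keep the dummies numeric.
df_new = pd.get_dummies(df["cat_feat"], prefix="catxnum_feat", dtype=float).mul(
    df["num_feat"], axis=0
)
print(df_new)
```

Each row keeps its numerical value only in the column of its own category and is zero elsewhere.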
# Count, per row, how many of the listed columns are greater than 0
df_new = pd.DataFrame()
df_new["count"] = df[["feat_1", "feat_2"]].gt(0.0).sum(axis=1)
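A small worked example of the row-wise count, assuming a hypothetical frame with the same two column names (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data; zeros and negatives are not counted.
df = pd.DataFrame({"feat_1": [0.0, 1.5, -2.0], "feat_2": [3.0, 2.0, -1.0]})

# gt(0.0) gives a boolean frame; summing along axis=1 counts the True values per row.
count = df[["feat_1", "feat_2"]].gt(0.0).sum(axis=1)
print(count.tolist())
```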
# When each category of a categorical feature can be split on "_" into 3 separate features
df_new = pd.DataFrame()
df_new[["cat_feat_1", "cat_feat_2", "cat_feat_3"]] = df.cat_feat.str.split("_", n=2, expand=True)
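The following sketch shows how `n=2` behaves, using hypothetical category strings. At most two splits are made, so a value with extra underscores keeps the remainder in the third piece. Note that if no value contains two underscores, `expand=True` yields fewer than three columns and the three-column assignment above would fail.

```python
import pandas as pd

# Hypothetical category strings; the second one has an extra "_".
df = pd.DataFrame({"cat_feat": ["a_b_c", "x_y_z_w"]})

# n=2 limits the split to two cuts, giving at most three parts per value.
parts = df["cat_feat"].str.split("_", n=2, expand=True)
parts.columns = ["cat_feat_1", "cat_feat_2", "cat_feat_3"]
print(parts)
```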
# For two features, use the median of feature_2 within each feature_1 group as a new feature
df_new = pd.DataFrame()
df_new["m"] = df.groupby("feature_1")["feature_2"].transform("median")
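As a quick illustration of the group median feature, here is a sketch on hypothetical data. `transform` returns a result aligned to the original rows, so every row receives the median of its own group.

```python
import pandas as pd

# Hypothetical toy data: group "a" has two rows, group "b" has three.
df = pd.DataFrame(
    {
        "feature_1": ["a", "a", "b", "b", "b"],
        "feature_2": [1.0, 3.0, 10.0, 20.0, 60.0],
    }
)

# Broadcast each group's median of feature_2 back onto its rows.
m = df.groupby("feature_1")["feature_2"].transform("median")
print(m.tolist())
```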