@this-is-richard
Created April 27, 2019 10:25
# timeseries data
df[col] = df[col].interpolate()  # call the method; without () the column is replaced by the bound method itself
# independent feats, numerical label
df.dropna(how='any', inplace=True)
# categorical feats
groupby_label = df.groupby(label)
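print(groupby_label.mean(numeric_only=True))  # per-label means of the numeric columns
print(df.mean(numeric_only=True))  # overall means, for comparison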
if the_means_are_label_dependent:
    # fill each NaN with the mean of its own label group
    df[feat1] = groupby_label[feat1].transform(lambda x: x.fillna(x.mean()))
    df[feat2] = groupby_label[feat2].transform(lambda x: x.fillna(x.mean()))
    df[feat3] = groupby_label[feat3].transform(lambda x: x.fillna(x.mean()))
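# A minimal runnable sketch of the three strategies above, on toy data.
# The frames `ts`, `indep`, `df` and the column names "value", "x", "y",
# "label", "feat1" are made up for illustration and are not part of the
# original snippet.

```python
import numpy as np
import pandas as pd

# 1) Time series: linear interpolation between neighbouring observations.
ts = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan, 5.0]})
ts["value"] = ts["value"].interpolate()

# 2) Independent features, numerical label: drop any row with a missing value.
indep = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [10.0, 20.0, np.nan]})
indep = indep.dropna(how="any")

# 3) Label-dependent means: fill each gap with the mean of its own label group.
df = pd.DataFrame({
    "label": ["a", "a", "a", "b", "b", "b"],
    "feat1": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0],
})
df["feat1"] = (
    df.groupby("label")["feat1"]
      .transform(lambda x: x.fillna(x.mean()))
)

print(ts)
print(indep)
print(df)
```

# Here the gap in group "a" is filled with 2.0 and the gap in group "b"
# with 20.0, whereas a single overall mean would put the same value in both.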