@sh16ma
Last active January 19, 2022 10:13
#🐍 #Python #EDA #NaN #Ranking
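import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def nan_rank(df, usabilty=20):
    """
    Rank columns by their share of missing values and plot the result.

    df : DataFrame to inspect
    usabilty : cutoff (percent of missing values) above which a column is marked "Discard"
    """
    # Count missing values per column
    nan = df.isnull().sum().reset_index()
    nan.columns = ["name", "count"]
    # Percentage of rows that are missing in each column
    nan["ratio"] = (nan["count"] / df.shape[0]) * 100
    nan["usabilty"] = np.where(nan["ratio"] > usabilty, "Discard", "Keep")
    # Keep only columns that contain NaN, ordered by missing ratio (ascending)
    nan = nan[nan["count"] > 0].sort_values(by="ratio")
    plt.figure(figsize=(15, 6))
    sns.barplot(x=nan["name"], y=nan["ratio"])
    plt.xticks(rotation=90)  # rotate the labels by 90 degrees
    plt.title("Features containing NaN")
    plt.show()
    return nan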
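
As a rough illustration of the table that `nan_rank` returns, here is a plot-free sketch run on a small made-up DataFrame. The helper name `nan_table`, its `threshold` parameter, and the sample data are assumptions for this example only and are not part of the original gist. The column name `usabilty` is spelled as in the gist so the output matches.

```python
import numpy as np
import pandas as pd


def nan_table(df, threshold=20):
    # Hypothetical helper: builds the same table as nan_rank, minus the plot
    counts = df.isnull().sum()
    table = pd.DataFrame({
        "name": counts.index,
        "count": counts.values,
        "ratio": counts.values / len(df) * 100,  # percent of rows missing
    })
    # Columns whose missing ratio exceeds the threshold are marked "Discard"
    table["usabilty"] = np.where(table["ratio"] > threshold, "Discard", "Keep")
    # Drop columns without NaN and order by missing ratio (ascending)
    return table[table["count"] > 0].sort_values(by="ratio").reset_index(drop=True)


# Made-up sample data: "age" has 1 missing value, "city" has 3, "income" has none
sample = pd.DataFrame({
    "age": [25, np.nan, 31, 40, 22, 35, 28, 19, 50, 44],
    "city": ["Tokyo", None, None, None, "Osaka", "Kyoto", "Nara", "Kobe", "Sapporo", "Nagoya"],
    "income": [300, 420, 510, 380, 290, 450, 330, 270, 600, 480],
})
result = nan_table(sample)
print(result)
```

With the default cutoff of 20 percent, `age` (about 10 percent missing) is marked "Keep", `city` (about 30 percent missing) is marked "Discard", and `income` is left out because it has no missing values.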
sh16ma commented Feb 1, 2021

Screenshot 2021-02-01 17 59 08

Screenshot 2021-02-01 18 00 09
