@cereniyim
Created April 29, 2020 13:45
most frequent imputer function
import pandas as pd
from sklearn.impute import SimpleImputer


def ImputeWithMostFrequent(train_df, test_df,
                           cols=["country", "province",
                                 "region_1", "variety"]):
    # Impute the given columns (by default country, province,
    # region_1, variety) with the most frequent value of each feature.
    # The most_frequent imputer is fitted on the train set only;
    # the transformation is then applied to both the train and test sets.
    train_df = pd.DataFrame(train_df[cols])
    test_df = pd.DataFrame(test_df[cols])

    most_frequent_imputer = SimpleImputer(strategy="most_frequent")
    most_frequent_imputer.fit(train_df)

    # transform() returns a NumPy array, so the results are wrapped back
    # into DataFrames (column names are kept, the index becomes 0..n-1)
    imputed_train_set = most_frequent_imputer.transform(train_df)
    imputed_train_df = pd.DataFrame(imputed_train_set, columns=train_df.columns)
    imputed_test_set = most_frequent_imputer.transform(test_df)
    imputed_test_df = pd.DataFrame(imputed_test_set, columns=test_df.columns)

    return imputed_train_df, imputed_test_df
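
The fit-on-train, transform-on-test pattern used above can be illustrated on a tiny example. This is only a sketch: the toy values and the names `train`, `test`, `imputer` and `filled_test` are made up for illustration and are not part of the gist. Only two of the four default columns are used to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; column names mirror two of the gist's defaults.
train = pd.DataFrame({
    "country": ["US", "US", "France", np.nan],
    "variety": ["Pinot Noir", np.nan, "Pinot Noir", "Merlot"],
}, dtype=object)
test = pd.DataFrame({
    "country": [np.nan, "Italy"],
    "variety": ["Merlot", np.nan],
}, dtype=object)

imputer = SimpleImputer(strategy="most_frequent")
imputer.fit(train)  # the most frequent values are learned from train only

# Missing test values are filled with the train modes, not the test modes.
filled_test = pd.DataFrame(imputer.transform(test), columns=test.columns)
print(filled_test)
```

Here the only `variety` seen in the test rows is "Merlot", yet the missing test value is filled with the train mode "Pinot Noir". That is the point of fitting on the train set alone: nothing about the test distribution leaks into the imputation.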