Gist talperetz/6030f4e9997c249b09409dcf00e78f91, last active July 1, 2023 22:25
Can you explain where the 3-fold AUC data comes from (i.e., how it is calculated)? I can see how to get the AUC on validation data (for Catboost) but not on test data.
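A minimal sketch of where a 3-fold cross-validated AUC comes from, assuming the score is produced by k-fold cross-validation (which is what `catboost.cv` does): each fold is held out once and acts as the "test" data for a model trained on the remaining folds, the AUC is computed on that held-out fold, and the per-fold AUCs are averaged. The helper names (`roc_auc`, `k_fold_indices`, `cv_auc`) are hypothetical, the split below skips the shuffling and stratification that real cross-validation routines typically add, and the model-fitting step is left out. Note that `catboost.cv` reports a mean and standard deviation for every boosting iteration rather than a single final number.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=3):
    """Split range(n_samples) into k contiguous, nearly equal folds (no shuffling, no stratification)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_auc(held_out_folds):
    """Mean AUC over folds; each item is (labels, scores) from a model that never saw that fold."""
    return mean(roc_auc(labels, scores) for labels, scores in held_out_folds)
```

For example, three held-out folds scoring 1.0, 0.75 and 0.5 give `cv_auc(...) == 0.75`; the "test" AUC is simply the AUC measured on data each fold's model did not train on.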
Hi. Could this code work for a dataset that has non-numerical features like sex, city, or profession? And if so, how?
Sure thing. This dataset holds only categorical features.
You can see how I treat it here:
catboost.Pool(self.X_train, self.y_train, cat_features=self.categorical_columns_indices),
You can also take a look here at Catboost examples.
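For data with string-valued columns such as sex, city, or profession, the column indices passed as `cat_features` can be derived from the data itself. A minimal sketch, assuming the features are rows of plain Python values; `is_number` and `find_categorical_indices` are hypothetical helpers, not part of the gist or of CatBoost.

```python
def is_number(value):
    """True for ints/floats and for strings that parse as a float."""
    if isinstance(value, (int, float)):
        return True
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        # Note: None (a missing value) also lands here and is treated as non-numeric.
        return False


def find_categorical_indices(rows):
    """Indices of columns holding at least one non-numeric value (e.g. sex, city, profession)."""
    if not rows:
        return []
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if not all(is_number(row[j]) for row in rows)]
```

Pass the resulting list as `cat_features` to `catboost.Pool`, as in the line quoted above; CatBoost accepts the raw string values in those columns, so no one-hot encoding is needed beforehand. Integer-coded categories (e.g. zip codes) look numeric to this heuristic and would have to be listed by hand.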
The CatboostOptimizer class won't work as-is with recent versions of Catboost. First,
test_scores = validation_scores.iloc[:, 2]
should be changed to
test_scores = validation_scores.iloc[:, 1]
since index 2 corresponds to the standard deviation of the metric, which we don't optimize for. Also, if your metric is not AUC (RMSE, for example), you need to replace
best_metric = test_scores.max()
with
best_metric = test_scores.min()
and return best_metric instead of 1 - best_metric, since we're minimizing the RMSE itself, not the difference between 1 and our AUC (which is almost always < 1).
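The two fixes described in this comment can be combined into one helper that selects the mean column by name instead of by position and flips the optimization direction depending on the metric. A sketch, assuming `catboost.cv`-style column names of the form `test-<Metric>-mean` and an objective that is minimized (as the comment implies); `cv_loss` and `HIGHER_IS_BETTER` are hypothetical names, and the metric set is illustrative, not exhaustive. It accepts a plain dict of lists as well as a pandas DataFrame, since both support lookup by column name and work with the built-in `max`/`min`.

```python
# Assumption: metrics where a larger value is better; extend for your own eval metric.
HIGHER_IS_BETTER = {"AUC", "Accuracy", "F1"}


def cv_loss(cv_results, metric):
    """Convert catboost.cv-style results into a single value to minimize."""
    # Select the per-iteration mean of the held-out metric by name,
    # so the "-std" column can never be picked up by accident.
    test_scores = cv_results[f"test-{metric}-mean"]
    if metric in HIGHER_IS_BETTER:
        # Maximize AUC-like metrics by minimizing their distance from 1.
        return 1 - max(test_scores)
    # Error metrics such as RMSE are minimized directly.
    return min(test_scores)
```

For instance, `cv_loss({"test-RMSE-mean": [3.2, 2.9, 3.0]}, "RMSE")` returns 2.9, while for `"AUC"` it returns 1 minus the best mean AUC across iterations.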