@wrwr
Last active October 13, 2021 21:37
XGBoost hyperparameter search using scikit-learn RandomizedSearchCV
import time

import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV

# Load your datasets here (a train/validation/test split is assumed)
x_train, y_train, x_valid, y_valid, x_test, y_test = ...  # load datasets

clf = xgb.XGBClassifier()

param_grid = {
    'silent': [False],  # note: removed in newer xgboost; use verbosity instead
    'max_depth': [6, 10, 15, 20],
    'learning_rate': [0.001, 0.01, 0.1, 0.2, 0.3],
    'subsample': [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'colsample_bytree': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'colsample_bylevel': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'min_child_weight': [0.5, 1.0, 3.0, 5.0, 7.0, 10.0],
    'gamma': [0, 0.25, 0.5, 1.0],
    'reg_lambda': [0.1, 1.0, 5.0, 10.0, 50.0, 100.0],
    'n_estimators': [100],
}

# Early stopping on the validation set, forwarded to XGBClassifier.fit
fit_params = {'eval_metric': 'mlogloss',
              'early_stopping_rounds': 10,
              'eval_set': [(x_valid, y_valid)]}

rs_clf = RandomizedSearchCV(clf, param_grid, n_iter=20,
                            n_jobs=1, verbose=2, cv=2,
                            scoring='neg_log_loss', refit=False,
                            random_state=42)

print("Randomized search...")
search_time_start = time.time()
# Fit parameters are passed to fit() itself; the fit_params constructor
# argument was deprecated and later removed from scikit-learn
rs_clf.fit(x_train, y_train, **fit_params)
print("Randomized search time:", time.time() - search_time_start)

best_score = rs_clf.best_score_
best_params = rs_clf.best_params_
print("Best score: {}".format(best_score))
print("Best params:")
for param_name in sorted(best_params.keys()):
    print('%s: %r' % (param_name, best_params[param_name]))
@ivanlen

ivanlen commented Feb 27, 2019

I think that you have to set refit=True in order to be able to extract best_score = rs_clf.best_score_
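A minimal sketch suggests this may be version-dependent: in current scikit-learn, with a single scoring metric, best_score_ and best_params_ are populated even when refit=False; what refit=False skips is the final refit, so only best_estimator_ is missing. Toy data and LogisticRegression stand in for the XGBoost setup here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Same shape as the gist's search: single metric, refit=False
search = RandomizedSearchCV(
    LogisticRegression(max_iter=500),
    {'C': [0.01, 0.1, 1.0, 10.0]},
    n_iter=4, cv=2, scoring='neg_log_loss',
    refit=False, random_state=42)
search.fit(X, y)

print(search.best_params_)                 # populated despite refit=False
print(hasattr(search, 'best_estimator_'))  # no refitted model is kept
```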

@rbarman

rbarman commented Feb 28, 2019

According to the docs, the fit_params constructor argument has been deprecated; fit parameters should be passed to fit() instead.
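A small sketch of the replacement pattern (toy data and LogisticRegression with a sample_weight fit parameter standing in for the XGBoost eval_set/early-stopping parameters): keyword arguments given to the search's own fit() are forwarded to the underlying estimator's fit() on each CV fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=100, random_state=0)

search = RandomizedSearchCV(LogisticRegression(max_iter=500),
                            {'C': [0.1, 1.0]}, n_iter=2, cv=2,
                            random_state=0)

# Fit parameters go here, not in the constructor; sample_weight is
# sliced per fold and forwarded to LogisticRegression.fit
search.fit(X, y, sample_weight=np.ones(len(y)))
print(search.best_params_)
```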

@mdgraboski

  • x_test and y_test are declared but never used. Where are we supposed to use them?
  • RandomizedSearchCV sets cv to 2. What does that mean? Are we doing k-fold validation with 2 splits? Or does the xgboost classifier ignore that and use (x_valid, y_valid) instead, regardless of the value supplied to cv?
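My understanding, sketched below with toy data and LogisticRegression in place of XGBoost: cv=2 means the search scores each candidate by 2-fold cross-validation on x_train; the eval_set only drives early stopping inside each fold's fit and does not replace the CV splits. The held-out test set would then be touched exactly once, after the search, to evaluate a model retrained with the best parameters:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = RandomizedSearchCV(LogisticRegression(max_iter=500),
                            {'C': [0.01, 0.1, 1.0, 10.0]},
                            n_iter=4, cv=2,   # scores come from 2-fold CV on x_train
                            refit=False, random_state=42)
search.fit(x_train, y_train)

# x_test/y_test are used once, after the search, for the final estimate
final = LogisticRegression(max_iter=500, **search.best_params_)
final.fit(x_train, y_train)
print('test accuracy:', final.score(x_test, y_test))
```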


ghost commented Apr 17, 2021

Thank you for sharing. I just want to highlight a statement from the scikit-learn docs about continuous parameters (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html): "It is highly recommended to use continuous distributions for continuous parameters."
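A sketch of what that advice would look like for this gist's grid: the continuous parameters get scipy.stats distributions (RandomizedSearchCV calls .rvs() on anything that has that method, so every candidate gets a fresh draw instead of a grid point), while genuinely discrete parameters like max_depth stay as lists. The bounds below simply mirror the ranges in the original lists.

```python
from scipy.stats import loguniform, uniform

param_distributions = {
    'learning_rate': loguniform(1e-3, 3e-1),  # log-uniform over [0.001, 0.3]
    'subsample': uniform(0.5, 0.5),           # uniform over [0.5, 1.0]
    'colsample_bytree': uniform(0.4, 0.6),    # uniform over [0.4, 1.0]
    'reg_lambda': loguniform(1e-1, 1e2),      # log-uniform over [0.1, 100]
    'max_depth': [6, 10, 15, 20],             # genuinely discrete: keep a list
}

# Each candidate samples a fresh value rather than reusing a grid point
print(param_distributions['learning_rate'].rvs(random_state=0))
```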
