
@arose13
Last active June 11, 2019 19:27
How to train an XGBoost model in what I believe is the best way (on large data)
import xgboost as xgb

# Notice the large number of trees and the low learning rate.
# There are other important parameters like `subsample`, `min_child_weight`, and
# `colsample_bytree`, but I'll leave those up to you and grid searching (see the
# sketch after this snippet).
gbm = xgb.XGBRegressor(n_estimators=10000, learning_rate=0.01, n_jobs=-1)

# Training with automatic termination: stop once the validation RMSE has not
# improved for 50 consecutive rounds.
gbm.fit(
    x_train, y_train,
    eval_set=[(x_val, y_val)],
    eval_metric='rmse',
    early_stopping_rounds=50,
    verbose=True,
)

# Predicting with the number of trees that scored best on the validation set
y_pred = gbm.predict(x_test, ntree_limit=gbm.best_ntree_limit)
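The snippet above assumes x_train, y_train, x_val, y_val, and x_test already exist. As a minimal sketch, here is one way those splits might be produced; the synthetic dataset and the 60/20/20 split sizes are assumptions for illustration, not part of the original gist.

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: swap in your own feature matrix X and target vector y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=100_000)

# Hold out a test set first, then carve a validation set out of the remainder,
# giving roughly a 60/20/20 train/validation/test split.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25, random_state=0)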
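For the tuning the comment above defers to grid searching, one possible sketch uses scikit-learn's RandomizedSearchCV; the parameter values, iteration count, and smaller tree budget below are illustrative assumptions, not recommendations from the gist.

from sklearn.model_selection import RandomizedSearchCV

# Hypothetical search space; the exact values are illustrative.
param_distributions = {
    'subsample': [0.5, 0.7, 0.9, 1.0],
    'min_child_weight': [1, 3, 5, 10],
    'colsample_bytree': [0.5, 0.7, 0.9, 1.0],
}
# Fewer trees here to keep the search affordable; refit the winner with
# n_estimators=10000 and early stopping as in the snippet above.
search = RandomizedSearchCV(
    xgb.XGBRegressor(n_estimators=1000, learning_rate=0.01, n_jobs=-1),
    param_distributions,
    n_iter=20,
    scoring='neg_root_mean_squared_error',
    cv=3,
    random_state=0,
)
search.fit(x_train, y_train)
print(search.best_params_)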