Last active
February 16, 2024 14:25
-
-
Save ylogx/53fef94cc61d6a3e9b3eb900482f41e0 to your computer and use it in GitHub Desktop.
XGBoost Incremental Learning
Disregard, I figured it out. I was using handle_unknown='ignore' in OneHotEncoder, but one of the features has too few of a particular category, hence the mismatch.
Thank you for this gist. How can we implement this in a pipeline?
I am unable to test on the Boston dataset as it's been removed from sklearn, but on a different dataset I get a mismatch in number of columns. Even though I use the same pipeline the saved model seems to have one less feature than the new training data and I am unable to figure out why.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi,
I have found the solution. Per xgboost documentation, the parameter 'update' should be 'updater'... this is a mistake in the notebook above. If you fix this, then you will see the right results.
model = xgb.train({
'learning_rate': 0.007,
'updater':'refresh',
'process_type': 'update',
'refresh_leaf': True,
#'reg_lambda': 3, # L2
'reg_alpha': 3, # L1
'silent': False,
}, dtrain=xgb.DMatrix(x_tr[start:start+batch_size], y_tr[start:start+batch_size]), xgb_model=model)