@ylogx
Last active February 16, 2024 14:25
XGBoost Incremental Learning
@c3-varun

Same issue on XGBoost 1.4.0. Has anyone figured this out yet?

@pjbhaumik

Hi,
I have found the solution. Per the XGBoost documentation, the parameter should be 'updater', not 'update'; this is a mistake in the notebook above. If you fix this, you will see the right results.

import xgboost as xgb

model = xgb.train({
    'learning_rate': 0.007,
    'updater': 'refresh',      # refresh the existing trees
    'process_type': 'update',  # update the current model rather than growing new trees
    'refresh_leaf': True,      # recompute leaf values, not just node statistics
    # 'reg_lambda': 3,  # L2 regularization
    'reg_alpha': 3,  # L1 regularization
    'silent': False,  # deprecated in newer XGBoost; use 'verbosity' instead
}, dtrain=xgb.DMatrix(x_tr[start:start + batch_size], y_tr[start:start + batch_size]),
   xgb_model=model)

@marymlucas

marymlucas commented Jul 14, 2023

Thank you for this gist. How can we implement this in a pipeline?

I am unable to test on the Boston dataset as it has been removed from sklearn, but on a different dataset I get a mismatch in the number of columns. Even though I use the same pipeline, the saved model seems to have one fewer feature than the new training data, and I am unable to figure out why.

Update: Disregard, I figured it out. I was using handle_unknown='ignore' in OneHotEncoder, but one of the features has too few samples of a particular category, hence the mismatch.
