About the miscalibration of logistic regression models
@glemaitre

Something I found out: the fact that logistic regression is not calibrated when tuning C on ROC-AUC is not necessarily deterministic. I think if you set random_state=42 for the train-test split, you will get two exactly identical logistic regression models whether you tune on ROC-AUC or log-loss. The slightly weird thing is that the ROC-AUC is not equal to 1 on the training set, which surprises me. It might be that we don't have enough repetitions in the inner CV.
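
A minimal sketch of the setup being discussed: tuning C on ROC-AUC versus log-loss with a fixed random_state, so both searches are deterministic and can be compared. The dataset, grid, and CV settings here are assumptions for illustration, not the gist's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative data; the original experiment's data and grid may differ.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

param_grid = {"C": np.logspace(-4, 4, 9)}
models = {}
for scoring in ("roc_auc", "neg_log_loss"):
    # With a fixed random_state and the default (deterministic) KFold
    # splitter, each search is reproducible; whether the two metrics
    # select the same C is a separate question.
    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid, scoring=scoring, cv=5)
    search.fit(X_train, y_train)
    models[scoring] = search
    print(scoring, search.best_params_)
```

If the two selected C values agree here, the resulting models are identical by construction; any remaining discrepancy would have to come from the metric landscape, not from randomness.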

@glemaitre

Another possibility is that the solutions are comparable but with large variation across folds (as shown on the validation curve), and we pick a different C just because of this.
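
One way to check this hypothesis is to look at the fold-to-fold spread of the validation curve directly. This is a generic sketch (data and grid are assumptions), using scikit-learn's validation_curve helper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

# Hypothetical data; the point is to inspect the spread across CV folds.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

Cs = np.logspace(-4, 4, 9)
train_scores, val_scores = validation_curve(
    LogisticRegression(max_iter=1000), X, y,
    param_name="C", param_range=Cs, scoring="roc_auc", cv=5)

# If the fold-to-fold standard deviation is large relative to the gap
# between neighboring C values' means, the selected C is effectively
# arbitrary among several near-equivalent solutions.
for C, mean, std in zip(Cs, val_scores.mean(axis=1), val_scores.std(axis=1)):
    print(f"C={C:g}: {mean:.3f} +/- {std:.3f}")
```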

@lorentzenchr

Note that the model is severely miscalibrated despite the balance property. This property only informs us about marginal calibration, not about auto-calibration.

This is not the whole story. The balance property only holds for the design matrix of the logistic regression. If the design matrix is badly chosen, the balance property is (very) weak. For instance, if only random features without correlation to the target are chosen, the balance property reduces to marginal balance (the conditioning drops out), which is weak.
On the other hand, for a "correct" design matrix, the balance property is stronger than auto-calibration.
