@CristianCantoro
Created June 20, 2021 02:05
CatBoost: from RawFormulaVal to probabilities for multiclass classification

How to calculate class probabilities from RawFormulaVal in multiclass classification problems with CatBoost

You get the probabilities for the i-th test case by applying the softmax function to its raw scores:

np.exp(preds_raw[i]) / np.sum(np.exp(preds_raw[i]))
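A minimal sketch of the same computation as a reusable function. It assumes `preds_raw` is a NumPy array with one column per class and either a single row (one test case) or one row per test case. Subtracting each row's maximum before exponentiating is a standard guard against overflow in `np.exp`; it does not change the result, because softmax gives the same output when the same constant is added to every score in a row. The name `raw_to_proba` is illustrative, not part of CatBoost.

```python
import numpy as np


def raw_to_proba(preds_raw):
    """Convert multiclass raw scores (RawFormulaVal) to class probabilities.

    Accepts a single row (1-D) or a matrix with one row per test case (2-D).
    """
    raw = np.asarray(preds_raw, dtype=float)
    # Shift each row by its maximum so np.exp cannot overflow;
    # softmax is unchanged by adding the same constant to every score in a row.
    shifted = raw - raw.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    # Normalize each row so its entries sum to 1.
    return exp / exp.sum(axis=-1, keepdims=True)


# Raw scores for the first test case, copied from the ipdb session below.
row = np.array([-1.05871157, -1.06226035, 0.04045621, 4.22304744,
                -1.05710864, -1.08542309])
print(np.round(raw_to_proba(row), 6))
```

Passing the whole 2-D `preds_raw` matrix returns one probability row per test case in a single call.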

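Example (ipdb session; preds_log and preds_proba hold the log-probabilities and probabilities for the same test pool, computed in a step not shown):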

ipdb> preds_raw = model.predict(test_pool, prediction_type='RawFormulaVal')
ipdb> preds_raw[0]
array([-1.05871157, -1.06226035,  0.04045621,  4.22304744, -1.05710864,
       -1.08542309])
ipdb> preds_log[0]
array([-5.31659416, -5.32014294, -4.21742638, -0.03483515, -5.31499123,
       -5.34330568])
ipdb> preds_proba[0]
array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732,
       0.00478004])
ipdb> sum(preds_log[0])
-25.547295546220848
ipdb> sum(preds_raw[0])
2.9951596758337473e-12
ipdb> sum(preds_proba[0])
0.9999999999999999
ipdb> preds_raw[0]/sum(preds_Log[0])
*** NameError: name 'preds_Log' is not defined
ipdb> preds_raw[0]/sum(preds_log[0])
array([ 0.04144124,  0.04158015, -0.00158358, -0.16530311,  0.0413785 ,
        0.04248681])
ipdb> preds_log[0]/sum(preds_log[0])
array([0.20810791, 0.20824682, 0.16508309, 0.00136356, 0.20804516,
       0.20915348])
ipdb> 1/preds_log[0]-1
array([ -1.18809034,  -1.18796487,  -1.23711143, -29.70663945,
        -1.18814706,  -1.18715006])
ipdb> preds_raw[0]/(1-preds_raw[0])
array([-0.5142593 , -0.51509517,  0.04216192, -1.31026537, -0.5138808 ,
       -0.520481  ])
ipdb> np.exp(preds_raw[0])
array([ 0.34690248,  0.34567358,  1.04128571, 68.24112877,  0.34745899,
        0.33775885])
ipdb> np.exp(preds_raw[0])/sum(np.exp(preds_raw[0]))
array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732,
       0.00478004])
ipdb> preds_proba[0]
array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732,
       0.00478004])
ipdb> np.exp(preds_raw[0])/sum(np.exp(preds_raw[0])) == preds_proba[0]
array([ True, False, False, False, False, False])
ipdb> np.round(np.exp(preds_raw[0])/sum(np.exp(preds_raw[0])),3) == np.round(preds_proba[0], 3)
array([ True,  True,  True,  True,  True,  True])
ipdb> np.round(np.exp(preds_raw[0])/sum(np.exp(preds_raw[0])),6) == np.round(preds_proba[0], 6)
array([ True,  True,  True,  True,  True,  True])
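The exact `==` comparison in the session fails only because of floating-point rounding, which is why the rounded comparisons succeed; `np.allclose` is the usual tolerance-based check. The printed arrays are also consistent with `preds_log` being the log-softmax of the raw scores: for every class, `preds_raw[0] - preds_log[0]` is the same constant (about 4.25788259), which is the log-sum-exp of the raw row. The sketch below checks both relationships on the first test case using values copied from the session; since those are rounded to 8 decimals, a small tolerance is used. `log_softmax` is a hypothetical helper written for this check, not a CatBoost function.

```python
import numpy as np

# First-row outputs copied from the ipdb session above (rounded to 8 decimals).
preds_raw_0 = np.array([-1.05871157, -1.06226035, 0.04045621, 4.22304744,
                        -1.05710864, -1.08542309])
preds_log_0 = np.array([-5.31659416, -5.32014294, -4.21742638, -0.03483515,
                        -5.31499123, -5.34330568])
preds_proba_0 = np.array([0.00490945, 0.00489205, 0.01473652, 0.96576461,
                          0.00491732, 0.00478004])


def log_softmax(raw):
    """Hypothetical helper: raw - logsumexp(raw), computed stably."""
    m = raw.max()
    lse = m + np.log(np.exp(raw - m).sum())
    return raw - lse


manual_proba = np.exp(preds_raw_0) / np.sum(np.exp(preds_raw_0))

# Compare with a tolerance instead of ==, which trips on floating-point noise.
print(np.allclose(manual_proba, preds_proba_0, atol=1e-7))
# Log-probabilities are the raw scores shifted by one constant (log-sum-exp).
print(np.allclose(log_softmax(preds_raw_0), preds_log_0, atol=1e-7))
# Exponentiating the log-probabilities recovers the probabilities.
print(np.allclose(np.exp(preds_log_0), preds_proba_0, atol=1e-7))
```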