How to calculate class probabilities from RawFormulaVal in multiclass classification problems with CatBoost
You get the class probabilities for the i-th test case by applying the softmax function to its raw values:
np.exp(preds_raw[i]) / np.sum(np.exp(preds_raw[i]))
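The same formula can be applied to every test case at once. Below is a minimal NumPy sketch. It assumes preds_raw is the 2-D array of shape (n_samples, n_classes) returned by predict(..., prediction_type='RawFormulaVal'), and softmax_rows is a hypothetical helper name. Subtracting each row's maximum before exponentiating does not change the result, because softmax is unchanged when the same constant is added to every entry, but it avoids overflow for large raw values.

```python
import numpy as np

def softmax_rows(raw: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn raw scores (n_samples, n_classes) into per-row probabilities."""
    raw = np.atleast_2d(np.asarray(raw, dtype=float))
    # Shift each row by its maximum so np.exp cannot overflow; softmax is shift-invariant.
    shifted = raw - raw.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    # Normalize each row so its entries sum to 1.
    return exp / exp.sum(axis=1, keepdims=True)

# Raw values of the first test case, copied from the session below.
preds_raw = np.array([[-1.05871157, -1.06226035, 0.04045621, 4.22304744, -1.05710864, -1.08542309]])
probs = softmax_rows(preds_raw)
print(np.round(probs[0], 8))
```

The printed row agrees with preds_proba[0] in the session up to the displayed precision.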
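Example (an ipdb session in which preds_log and preds_proba hold the model's log-probabilities and probabilities for the same test pool):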
ipdb> preds_raw = model.predict(test_pool, prediction_type='RawFormulaVal')
ipdb> preds_raw[0]
array([-1.05871157, -1.06226035, 0.04045621, 4.22304744, -1.05710864,
-1.08542309])
ipdb> preds_log[0]
array([-5.31659416, -5.32014294, -4.21742638, -0.03483515, -5.31499123,
-5.34330568])
ipdb> preds_proba[0]
array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732,
0.00478004])
ipdb> sum(preds_log[0])
-25.547295546220848
ipdb> sum(preds_raw[0])
2.9951596758337473e-12
ipdb> sum(preds_proba[0])
0.9999999999999999
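The log-probabilities above differ from the raw values by one constant per row: preds_log[i] = preds_raw[i] - log(sum(exp(preds_raw[i]))), which is the logarithm of the softmax formula. Because softmax depends only on differences between raw values, the near-zero sum of the raw values plays no role in the conversion. A minimal NumPy sketch checking this on the first test case; log_softmax is a hypothetical local helper, not a CatBoost function:

```python
import numpy as np

# Raw values of the first test case, copied from the session.
raw = np.array([-1.05871157, -1.06226035, 0.04045621, 4.22304744, -1.05710864, -1.08542309])

def log_softmax(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: x minus log(sum(exp(x))), computed stably by factoring out the max."""
    m = x.max()
    log_norm = m + np.log(np.exp(x - m).sum())
    return x - log_norm

log_probs = log_softmax(raw)
probs = np.exp(log_probs)

# Adding the same constant to every raw value leaves the log-probabilities unchanged.
shifted_log_probs = log_softmax(raw + 10.0)

print(log_probs)
print(probs.sum())
print(np.allclose(log_probs, shifted_log_probs))
```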
ipdb> preds_raw[0]/sum(preds_log[0])
array([ 0.04144124, 0.04158015, -0.00158358, -0.16530311, 0.0413785 ,
0.04248681])
ipdb> preds_log[0]/sum(preds_log[0])
array([0.20810791, 0.20824682, 0.16508309, 0.00136356, 0.20804516,
0.20915348])
ipdb> 1/preds_log[0]-1
array([ -1.18809034, -1.18796487, -1.23711143, -29.70663945,
-1.18814706, -1.18715006])
ipdb> preds_raw[0]/(1-preds_raw[0])
array([-0.5142593 , -0.51509517, 0.04216192, -1.31026537, -0.5138808 ,
-0.520481 ])
ipdb> np.exp(preds_raw[0])
array([ 0.34690248, 0.34567358, 1.04128571, 68.24112877, 0.34745899,
0.33775885])
ipdb> np.exp(preds_raw[0])/sum(np.exp(preds_raw[0]))
array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732,
0.00478004])
ipdb> preds_proba[0]
array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732,
0.00478004])
ipdb> np.exp(preds_raw[0])/sum(np.exp(preds_raw[0])) == preds_proba[0]
array([ True, False, False, False, False, False])
ipdb> np.round(np.exp(preds_raw[0])/sum(np.exp(preds_raw[0])),3) == np.round(preds_proba[0], 3)
array([ True, True, True, True, True, True])
ipdb> np.round(np.exp(preds_raw[0])/sum(np.exp(preds_raw[0])),6) == np.round(preds_proba[0], 6)
array([ True, True, True, True, True, True])
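The False entries in the exact == comparison come from floating-point rounding: the two arrays differ only in their last bits, which is why rounding to 3 or 6 decimals makes every entry match. A tolerance-based check such as np.allclose or np.testing.assert_allclose is the more idiomatic way to compare them. In this sketch, expected reuses the preds_proba[0] values from the session, and computed is a simulated array shifted by one unit in the last place to stand in for rounding noise:

```python
import numpy as np

expected = np.array([0.00490945, 0.00489205, 0.01473652, 0.96576461, 0.00491732, 0.00478004])
# Simulated result: each value moved to the next representable float (pure rounding noise).
computed = np.nextafter(expected, 1.0)

print(computed == expected)             # exact comparison: every entry is False here
print(np.allclose(computed, expected))  # tolerance-based comparison: True
np.testing.assert_allclose(computed, expected, rtol=1e-7)  # raises only on a real mismatch
```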