Assume we have a binary classifier that outputs, for each sample, the probability of being positive, in the [0.0, 1.0] range. Area Under the ROC Curve (AUC) quantifies how well such a classification model ranks samples. Intuitively, AUC checks whether positive (i.e., label=1) samples in a validation set receive a higher predicted probability than negative ones. Equivalently, it is the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample.
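To make the intuition concrete, here is a minimal sketch that computes AUC directly as the fraction of (positive, negative) pairs in which the positive sample gets the higher score. The function name `pairwise_auc` and the choice to count ties as half a win are assumptions made for illustration; this is not how any particular library implements the metric.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties are counted as half a win (an assumption of this sketch).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Two positives (0.9 and 0.3) and one negative (0.6):
# only one of the two pairs is ordered correctly.
print(pairwise_auc([0.9, 0.3, 0.6], [1, 1, 0]))  # 0.5
```

This brute-force version compares every pair, so it is only practical for small samples, but it mirrors the definition closely.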

The AUC metric yields a single value in [0.0, 1.0]. In the five test samples below, sorted by predicted probability, the classifier assigns a higher probability to every positive sample (#1, #2, and #4) than to any negative one. This is the best case, and it gives an AUC of 1.0.

| Test sample # | Probability of label=1 | True label |
|---|---|---|
| 1 | 0.8 | 1 |
| 2 | 0.7 | 1 |
| 4 | 0.6 | 1 |
| 3 | 0.5 | 0 |
| 5 | 0.2 | 0 |

By contrast, the following result is the worst case, in which every negative sample is ranked above every positive one. It gives an AUC of 0.0:

| Test sample # | Probability of label=1 | True label |
|---|---|---|
| 3 | 0.9 | 0 |
| 5 | 0.6 | 0 |
| 1 | 0.3 | 1 |
| 2 | 0.2 | 1 |
| 4 | 0.1 | 1 |

Meanwhile, an AUC of 0.5 lies between these two extremes, and it is also what a classifier that scores samples at random achieves on average. One ranking that produces it has label=1 and label=0 samples alternating in the sorted list of validation samples:

| Test sample # | Probability of label=1 | True label |
|---|---|---|
| 1 | 0.7 | 1 |
| 3 | 0.5 | 0 |
| 2 | 0.4 | 1 |
| 5 | 0.2 | 0 |
| 4 | 0.1 | 1 |
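The three examples above can be checked by tracing the ROC curve itself. The sketch below walks down the sorted list, treating each sample as a step of the curve: a positive moves the curve up and a negative moves it right, adding a column whose height is the number of positives seen so far. The name `roc_auc` is hypothetical, and the code assumes that no two samples share the same score.

```python
def roc_auc(scores, labels):
    """Area under the ROC step curve (assumes no tied scores)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Visit samples from the highest score to the lowest, as in the tables.
    ranked = sorted(zip(scores, labels), reverse=True)

    true_positives = 0  # current height of the curve, in samples
    area = 0            # area in units of (1 / n_pos) * (1 / n_neg)
    for _, label in ranked:
        if label == 1:
            true_positives += 1     # step up
        else:
            area += true_positives  # step right: add one column
    return area / (n_pos * n_neg)


best = roc_auc([0.8, 0.7, 0.6, 0.5, 0.2], [1, 1, 1, 0, 0])
worst = roc_auc([0.9, 0.6, 0.3, 0.2, 0.1], [0, 0, 1, 1, 1])
middle = roc_auc([0.7, 0.5, 0.4, 0.2, 0.1], [1, 0, 1, 0, 1])
print(best, worst, middle)  # 1.0 0.0 0.5
```

In the alternating example, three of the six (positive, negative) pairs are ordered correctly, which is where the 0.5 comes from.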

As a rule of thumb, if the AUC is less than 0.7, you may need to tweak your predictive model to achieve a higher value. On the other hand, an exceptionally high AUC above 0.9 is not always good news, because such a model may be over-fit to the existing samples; an evaluation result that looks too good sometimes indicates that the prediction model does not generalize well.
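These rules of thumb can be written down as a small helper. The 0.7 and 0.9 thresholds come from the paragraph above; the function name `interpret_auc`, the optional comparison against a training-set AUC, and the `max_gap` default of 0.1 are assumptions added for illustration, not established cut-offs.

```python
def interpret_auc(valid_auc, train_auc=None, low=0.7, high=0.9, max_gap=0.1):
    """Return a list of rule-of-thumb notes about a validation AUC."""
    notes = []
    if valid_auc < low:
        notes.append("low: consider tuning the model")
    elif valid_auc > high:
        notes.append("very high: check for over-fitting")
    else:
        notes.append("acceptable")

    # Assumption of this sketch: a training AUC far above the validation AUC
    # is treated as an additional hint of over-fitting.
    if train_auc is not None and train_auc - valid_auc > max_gap:
        notes.append("training AUC is much higher than validation AUC")
    return notes


print(interpret_auc(0.65))
print(interpret_auc(0.80, train_auc=0.99))
```

Treat the output as a prompt for further inspection rather than a verdict; suitable thresholds depend on the task and the data.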