The terrorist detection task is an imbalanced classification problem: we have two classes we need to identify—terrorists and not terrorists—with one category (non-terrorists) representing the overwhelming majority of the data points.
Another imbalanced classification problem occurs in disease detection, when the rate of the disease in the population is very low.
In both of these cases, the negative class (not diseased, not a terrorist) greatly outnumbers the positive class. These problems are examples of the fairly common case in data science where accuracy is not a good measure for assessing model performance.
Intuitively, we know that proclaiming all data points negative (not a terrorist) in the terrorist detection problem isn't helpful; instead, we should focus on identifying the positive cases. The metric our intuition tells us to maximize is known in statistics as recall: the ability of a model to find all the relevant cases within a data set.
While recall expresses the ability to find all relevant instances of a class in a data set, precision expresses the proportion of the data points our model flags as relevant that actually are relevant.
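As a toy sketch of these two definitions (the labels and predictions below are invented for illustration, not drawn from any real detection system):

```python
# Invented example: 1 = positive (e.g. has disease), 0 = negative.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]   # model's predictions

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

recall = tp / (tp + fn)     # fraction of actual positives the model found
precision = tp / (tp + fp)  # fraction of flagged points that were truly positive

print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Note how labeling every point negative would drive recall to zero, while labeling every point positive would drive recall to one at the cost of very low precision; the two metrics pull against each other.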
One of the main visualization techniques for showing the performance of a classification model is the Receiver Operating Characteristic (ROC) curve. The idea is simple: the ROC curve shows how the true positive rate (recall) and the false positive rate trade off as we vary the threshold for identifying a positive data point in our model. The threshold is the score above which we classify a data point as positive.
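The threshold sweep can be sketched as follows (the scores and labels are made up for illustration):

```python
# Invented model scores (predicted probability of the positive class),
# sorted from most to least confident, with their true labels.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

def tpr_fpr(threshold):
    """True and false positive rates when flagging scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p and t for p, t in zip(preds, y_true))
    fp = sum(p and not t for p, t in zip(preds, y_true))
    pos = sum(y_true)            # total actual positives
    neg = len(y_true) - pos      # total actual negatives
    return tp / pos, fp / neg

# Lowering the threshold moves us up and to the right along the ROC curve.
for thr in [0.95, 0.75, 0.5, 0.1]:
    tpr, fpr = tpr_fpr(thr)
    print(f"threshold={thr:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

At a very high threshold nothing is flagged (both rates are zero); at a very low threshold everything is flagged (both rates are one). Each intermediate threshold contributes one point on the curve.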
We can quantify a model's ROC curve by calculating the total Area Under the Curve (AUC), a metric that falls between zero and one, with a higher number indicating better classification performance. In the graph above, the AUC for the blue curve is greater than that for the red curve, meaning the blue model achieves a better trade-off between the true positive rate and the false positive rate. A random classifier (the black line) achieves an AUC of 0.5.