@dyerrington
Last active September 5, 2020 00:48
I can't tell you how many times I've plotted a ROC curve for a multi-class problem from scratch. Too many times. I decided to make this gist to demonstrate how to implement a multi-class ROC (Receiver Operating Characteristic) plot in the simplest manner possible using Python.
## import any sklearn models and collect predictions / probabilities beforehand
## (assumes pipe_model is a fitted classifier, y_encoded holds integer labels 0..n-1 in the
## same order as pipe_model.classes_, and y_hat_prob is the output of pipe_model.predict_proba)
import matplotlib.pyplot as plt
from cycler import cycler
from sklearn.metrics import roc_curve

## Line color config -- rather than rely on a fixed color palette, supply your own list to cycle through.
default_cycler = (cycler(color=['r', 'g', 'b', 'y']) +
                  cycler(linestyle=['-', '--', ':', '-.']))
plt.rc('axes', prop_cycle = default_cycler)

## Plot a one-vs-rest ROC curve per class
for index, class_name in enumerate(pipe_model.classes_):
    fpr, tpr, threshold = roc_curve(y_encoded, y_hat_prob[:, index], pos_label=index)
    plt.plot(fpr, tpr, label = f"Class - {class_name}")

plt.title('Multiclass ROC curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='best')
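
For anyone curious what each pass of the loop above computes, here is a minimal pure-Python sketch of a one-vs-rest ROC curve: one class is treated as positive, every other class as negative, and the decision threshold is swept from the highest score down. The helper `roc_points` and the toy `y_true` / `probs` data are made up for illustration and are not part of the gist. This is also not how `sklearn.metrics.roc_curve` is implemented, and sklearn may return fewer points because it can drop intermediate thresholds.

```python
def roc_points(y_true, scores, pos_label):
    """Return (fpr, tpr) lists for pos_label vs. all other classes.

    Hypothetical helper for illustration only. Assumes y_true contains
    at least one positive and at least one negative sample.
    """
    positives = sum(1 for y in y_true if y == pos_label)
    negatives = len(y_true) - positives

    # Visit samples from the highest score to the lowest, i.e. lower the threshold step by step.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == pos_label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score have been counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


# Toy data (made up): 6 samples, 3 classes, one probability column per class.
y_true = [0, 1, 2, 0, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.2, 0.6],
]

# One curve per class, using that class's probability column as the score.
curves = {
    c: roc_points(y_true, [row[c] for row in probs], pos_label=c)
    for c in range(3)
}

for c, (fpr, tpr) in curves.items():
    print(f"class {c}: fpr={fpr} tpr={tpr}")
```

Each `(fpr, tpr)` pair can be passed to `plt.plot` exactly like the arrays returned by `roc_curve`.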
@dyerrington

A slightly fancier version I use for plotting in-sample (original and training) and out-of-sample data side by side:

## Line color config
default_cycler = (cycler(color=['r', 'g', 'b', 'y']) + 
                  cycler(linestyle=['-', '--', ':', '-.']))

plt.rc('axes', prop_cycle = default_cycler)

fig, ax = plt.subplots(nrows = 1, ncols = 3, figsize = (15, 5))
ax = ax.ravel()

## Plotting with different datasets
datasets = [
    dict(
        name                = "original",
        y                   = y_encoded,
        y_hat               = pipe_model.predict(df['sentence']),
        y_hat_probabilities = pipe_model.predict_proba(df['sentence'])
    ),
    dict(
        name                = "train",
        y                   = y_train,
        y_hat               = pipe_model.predict(X_train['sentence']),
        y_hat_probabilities = pipe_model.predict_proba(X_train['sentence'])
    ),
    dict(
        name                = "test",
        y                   = y_test,
        y_hat               = pipe_model.predict(X_test['sentence']),
        y_hat_probabilities = pipe_model.predict_proba(X_test['sentence'])
    ),
]

for index, data in enumerate(datasets):
    ## Plot a one-vs-rest ROC curve per class
    for class_index, class_name in enumerate(encoder.classes_):
        fpr, tpr, threshold = roc_curve(
            data['y'],
            data['y_hat_probabilities'][:, class_index],  # probability column for this class
            pos_label = class_index
        )
        ax[index].plot(fpr, tpr, label = f"Class - {class_name}")

    ## Titles, labels, and legend only need to be set once per subplot
    ax[index].set_title(f"Multiclass ROC curve - {data['name']}")
    ax[index].set_xlabel('False Positive Rate')
    ax[index].set_ylabel('True Positive Rate')
    ax[index].legend(loc='best')


@dyerrington

Also, encoder is an instance of sklearn.preprocessing.LabelEncoder.
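
This matters because `pos_label = class_index` only lines up with the right probability column if the integer labels follow the order of `encoder.classes_`. As a rough pure-Python sketch of the mapping LabelEncoder produces (sorted unique labels, each label replaced by its position in that sorted list); the label strings below are invented for illustration:

```python
# Made-up string labels standing in for the real targets.
labels = ["sports", "politics", "tech", "politics", "sports"]

# LabelEncoder stores the sorted unique labels in encoder.classes_ ...
classes = sorted(set(labels))

# ... and transform() replaces each label with its index in that sorted list.
to_index = {name: i for i, name in enumerate(classes)}
y_encoded = [to_index[name] for name in labels]

# enumerate(classes) yields the same (class_index, class_name) pairs
# that the plotting loop iterates over.
for class_index, class_name in enumerate(classes):
    print(class_index, class_name)

print(y_encoded)
```

So class index 0 is the alphabetically first label, which, assuming the model was trained on the encoded labels, is also column 0 of `predict_proba`.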
