Last active December 13, 2015 22:18
The effect of using uniform priors with unbalanced classes.
% Unbalanced two-class problem: class 0 ~ N(0,1) with 3500 points,
% class 1 ~ N(1,1) with 1500 points
N1 = 3500; N2 = 1500;
data = [randn(N1, 1) ; 1 + randn(N2, 1)];
labels = [zeros(N1, 1) ; ones(N2, 1)];

% 200 random 80/20 holdout splits, stratified by label
for ii = 1:200
    cv(ii) = cvpartition(labels, 'Holdout', 0.2);
end

% LDA with the default (uniform) priors
b_pcc = zeros(1, length(cv));
for ii = 1:length(cv)
    train_idx = cv(ii).training;
    test_idx = cv(ii).test;
    y_pred = classify(data(test_idx, :), data(train_idx, :), labels(train_idx));
    TP = sum(y_pred == 1 & labels(test_idx) == 1);
    TN = sum(y_pred == 0 & labels(test_idx) == 0);
    FP = sum(y_pred == 1 & labels(test_idx) == 0);
    FN = sum(y_pred == 0 & labels(test_idx) == 1);
    % Balanced percent correct: mean of sensitivity and specificity
    b_pcc(ii) = 0.5 * (TP/(TP + FN) + TN/(TN + FP));
end
mean(b_pcc)
std(b_pcc)

% Same splits, but with empirical (class-frequency) priors
b_pcc = zeros(1, length(cv));
for ii = 1:length(cv)
    train_idx = cv(ii).training;
    test_idx = cv(ii).test;
    y_pred = classify(data(test_idx, :), data(train_idx, :), labels(train_idx), 'linear', 'empirical');
    TP = sum(y_pred == 1 & labels(test_idx) == 1);
    TN = sum(y_pred == 0 & labels(test_idx) == 0);
    FP = sum(y_pred == 1 & labels(test_idx) == 0);
    FN = sum(y_pred == 0 & labels(test_idx) == 1);
    b_pcc(ii) = 0.5 * (TP/(TP + FN) + TN/(TN + FP));
end
mean(b_pcc)
std(b_pcc)
When performing this cross-validation exercise with unbalanced classes, blindly calling classify with its defaults gives better balanced performance than using empirical priors, which shift the decision boundary toward the majority class. If you want the training-set class frequencies to act as priors, you must explicitly pass the 'empirical' flag to classify.
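The gist is MATLAB, but the effect can be reproduced with a standard-library-only Python sketch. The data generation mirrors the gist; the one-dimensional pooled-variance LDA rule (lda_balanced_acc, an illustrative name, not a library function) is a hand-rolled stand-in for classify, evaluated on a single 80/20 split rather than 200:

```python
import math
import random
import statistics

random.seed(0)

# Same setup as the gist: class 0 ~ N(0,1), 3500 points; class 1 ~ N(1,1), 1500 points.
data = [(random.gauss(0, 1), 0) for _ in range(3500)] + \
       [(random.gauss(1, 1), 1) for _ in range(1500)]
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

def lda_balanced_acc(train, test, priors):
    """Balanced accuracy of 1-D pooled-variance LDA under the given priors."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    mu = [statistics.fmean(xs0), statistics.fmean(xs1)]
    # Pooled within-class variance (MLE over all training points).
    pooled = statistics.fmean([(x - mu[y]) ** 2 for x, y in train])

    def predict(x):
        # Discriminant: log prior minus squared distance to the class mean.
        scores = [math.log(priors[k]) - (x - mu[k]) ** 2 / (2 * pooled)
                  for k in (0, 1)]
        return 0 if scores[0] >= scores[1] else 1

    tp = sum(1 for x, y in test if y == 1 and predict(x) == 1)
    tn = sum(1 for x, y in test if y == 0 and predict(x) == 0)
    n1 = sum(1 for _, y in test if y == 1)
    n0 = len(test) - n1
    return 0.5 * (tp / n1 + tn / n0)

n1_train = sum(y for _, y in train)
emp = [1 - n1_train / len(train), n1_train / len(train)]  # empirical priors
uni = [0.5, 0.5]                                          # uniform priors

print("uniform priors:  ", round(lda_balanced_acc(train, test, uni), 3))
print("empirical priors:", round(lda_balanced_acc(train, test, emp), 3))
```

The empirical priors move the decision threshold away from the midpoint of the two means toward the minority class, trading sensitivity for specificity, so the uniform-prior run should score higher on balanced accuracy.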