@dvreed77
Last active December 13, 2015 22:18
The effect of using uniform priors with unbalanced classes.
% Simulate two unbalanced classes: 3500 samples from N(0,1) labeled 0
% and 1500 samples from N(1,1) labeled 1.
N1 = 3500; N2 = 1500;
data = [randn(N1, 1) ; 1 + randn(N2, 1)];
labels = [zeros(N1, 1) ; ones(N2, 1)];

% Build 200 independent 80/20 holdout partitions.
for ii = 1:200
    cv(ii) = cvpartition(labels, 'Holdout', 0.2);
end

% Experiment 1: classify with its default (uniform) priors.
% Note: cv is a 1x200 array of Holdout partitions, so preallocate with
% length(cv), not cv.NumTestSets (which is 1 per partition).
b_pcc = zeros(1, length(cv));
for ii = 1:length(cv)
    train_idx = cv(ii).training;
    test_idx = cv(ii).test;
    y_pred = classify(data(test_idx, :), data(train_idx, :), labels(train_idx));

    % Balanced accuracy: the mean of the two per-class recalls.
    TP = sum(y_pred == 1 & labels(test_idx) == 1);
    TN = sum(y_pred == 0 & labels(test_idx) == 0);
    FP = sum(y_pred == 1 & labels(test_idx) == 0);
    FN = sum(y_pred == 0 & labels(test_idx) == 1);
    b_pcc(ii) = 0.5 * (TP/(TP + FN) + TN/(TN + FP));
end
mean(b_pcc)
std(b_pcc)
% Experiment 2: same setup, but explicitly requesting empirical
% (class-frequency) priors.
b_pcc = zeros(1, length(cv));
for ii = 1:length(cv)
    train_idx = cv(ii).training;
    test_idx = cv(ii).test;
    y_pred = classify(data(test_idx, :), data(train_idx, :), labels(train_idx), 'linear', 'empirical');

    TP = sum(y_pred == 1 & labels(test_idx) == 1);
    TN = sum(y_pred == 0 & labels(test_idx) == 0);
    FP = sum(y_pred == 1 & labels(test_idx) == 0);
    FN = sum(y_pred == 0 & labels(test_idx) == 1);
    b_pcc(ii) = 0.5 * (TP/(TP + FN) + TN/(TN + FP));
end
mean(b_pcc)
std(b_pcc)
@dvreed77

When performing this cross-validation experiment, blindly using the classify function on unbalanced classes (i.e., with its default uniform priors) gives better balanced accuracy than using empirical priors. If you want classify to use empirical priors, you have to request them explicitly by passing the 'empirical' flag.
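The effect can be reproduced without MATLAB. For 1-D LDA with unit-variance classes centered at 0 and 1, the decision boundary works out to x* = 1/2 + ln(pi0/pi1), so uniform priors put the threshold at 0.5 while empirical priors (0.7 vs 0.3 here) push it toward the minority class, cutting its recall. A minimal NumPy sketch of this idea, using the closed-form threshold rather than a fitted classifier (the names `t_uniform`, `t_empirical`, and `balanced_accuracy` are mine, not from the gist):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same unbalanced setup as the gist: 3500 samples from N(0,1) vs 1500 from N(1,1).
N1, N2 = 3500, 1500
x = np.concatenate([rng.normal(0.0, 1.0, N1), rng.normal(1.0, 1.0, N2)])
y = np.concatenate([np.zeros(N1), np.ones(N2)])

def balanced_accuracy(y_true, y_pred):
    """Mean of the two per-class recalls (the gist's b_pcc)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# 1-D LDA boundary for unit-variance classes at means 0 and 1:
#   x* = 0.5 + ln(pi0 / pi1)
t_uniform = 0.5                       # ln(1) = 0
t_empirical = 0.5 + np.log(N1 / N2)   # threshold shifts toward the minority class

ba_uniform = balanced_accuracy(y, (x > t_uniform).astype(int))
ba_empirical = balanced_accuracy(y, (x > t_empirical).astype(int))
print(f"uniform priors:   {ba_uniform:.3f}")
print(f"empirical priors: {ba_empirical:.3f}")
```

With uniform priors both recalls sit near Phi(0.5) ≈ 0.69; with empirical priors the minority-class recall drops sharply, dragging the balanced accuracy down, which matches the gist's finding.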
