Last active
December 21, 2023 14:49
Python implementation of the Nadeau and Bengio correction of dependent Student's t-test
# Python implementation of the Nadeau and Bengio correction of dependent Student's t-test
# using the equation stated in https://www.cs.waikato.ac.nz/~eibe/pubs/bouckaert_and_frank.pdf
from scipy.stats import t
from math import sqrt
from statistics import stdev


def corrected_dependent_ttest(data1, data2, n_training_samples, n_test_samples, alpha):
    n = len(data1)
    # paired differences between the two sets of scores
    differences = [(data1[i] - data2[i]) for i in range(n)]
    sd = stdev(differences)
    # mean difference (numerator of the t statistic)
    mean_diff = 1 / n * sum(differences)
    # correction term n_2 / n_1: test-set size over training-set size
    test_training_ratio = n_test_samples / n_training_samples
    denominator = sqrt(1 / n + test_training_ratio) * sd
    t_stat = mean_diff / denominator
    # degrees of freedom
    df = n - 1
    # calculate the critical value (one-sided at level alpha)
    cv = t.ppf(1.0 - alpha, df)
    # calculate the p-value (two-sided)
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p
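A minimal usage sketch follows. It restates the function so the block runs on its own, with the ratio computed as n_test_samples / n_training_samples. The score lists and the 900/100 train/test split are made-up illustration values, not results from the gist or the paper.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t


def corrected_dependent_ttest(data1, data2, n_training_samples, n_test_samples, alpha):
    n = len(data1)
    # paired differences between the two sets of scores
    differences = [a - b for a, b in zip(data1, data2)]
    sd = stdev(differences)
    mean_diff = mean(differences)
    # correction term: test-set size over training-set size
    test_training_ratio = n_test_samples / n_training_samples
    t_stat = mean_diff / (sqrt(1 / n + test_training_ratio) * sd)
    df = n - 1
    cv = t.ppf(1.0 - alpha, df)                 # one-sided critical value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0    # two-sided p-value
    return t_stat, df, cv, p


# Hypothetical per-fold accuracies of two models on the same 10 folds,
# each fold trained on 900 samples and tested on 100 (assumed values).
scores_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
scores_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.77, 0.79, 0.81, 0.78, 0.80]

t_stat, df, cv, p = corrected_dependent_ttest(scores_a, scores_b, 900, 100, 0.05)
print(f"t = {t_stat:.3f}, df = {df}, critical value = {cv:.3f}, p = {p:.4f}")
```

Because the correction adds n_test_samples / n_training_samples under the square root, the statistic is smaller in magnitude than an uncorrected paired t-test on the same differences.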
@Pibborn, yes, you are correct that n_training_folds is n_1 and n_test_folds is n_2. The author of the gist made a typo in the variable names: "test_training_ratio = n_test_folds / n_training_folds" should be corrected to "test_training_ratio = n_test_samples / n_training_samples".
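For reference, this is the corrected statistic the thread is discussing, in the notation of the Bouckaert and Frank paper linked in the gist. Here x_j are the paired differences, n is their count, n_1 is the number of training instances, n_2 is the number of test instances, and the sample variance of the differences is written as sigma-hat squared. This is a restatement for convenience, so check it against the paper before relying on it.

```latex
t = \frac{\frac{1}{n}\sum_{j=1}^{n} x_j}
         {\sqrt{\left(\frac{1}{n} + \frac{n_2}{n_1}\right)\hat{\sigma}^2}},
\qquad \text{with } n - 1 \text{ degrees of freedom}
```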
Thanks for your answer! Just to be sure, do we agree that your n_training_folds is n_1 in the paper and n_test_folds is n_2?