@jensdebruijn
Last active December 21, 2023 14:49
Python implementation of the Nadeau and Bengio correction of dependent Student's t-test
# Python implementation of the Nadeau and Bengio correction of dependent Student's t-test
# using the equation stated in https://www.cs.waikato.ac.nz/~eibe/pubs/bouckaert_and_frank.pdf
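from math import sqrt
from statistics import stdev

from scipy.stats import t


def corrected_dependent_ttest(data1, data2, n_training_samples, n_test_samples, alpha):
    # number of paired observations (e.g. one score per fold or run)
    n = len(data1)
    differences = [(data1[i] - data2[i]) for i in range(n)]
    # sample standard deviation of the differences
    sd = stdev(differences)
    # mean difference (numerator of the t statistic)
    mean_difference = 1 / n * sum(differences)
    # n_2 / n_1 in the paper: test set size over training set size
    test_training_ratio = n_test_samples / n_training_samples
    # corrected standard error of the mean difference
    denominator = sqrt(1 / n + test_training_ratio) * sd
    t_stat = mean_difference / denominator
    # degrees of freedom
    df = n - 1
    # calculate the critical value (one-sided at level alpha)
    cv = t.ppf(1.0 - alpha, df)
    # calculate the two-sided p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p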
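
To see what the correction does in practice, here is a minimal sketch that compares the ordinary paired t statistic with the corrected one, using only the standard library. The helper name `paired_t_statistics`, the accuracy values, and the 900/100 train/test sizes are made up for illustration and are not part of the gist; the p-value step would still go through `scipy.stats.t` as in the function above.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistics(scores_a, scores_b, n_train, n_test):
    """Return (uncorrected, corrected) t statistics for paired scores.

    Hypothetical helper for illustration; not part of the gist.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    # Standard paired t-test: the standard error is sd * sqrt(1 / n).
    uncorrected = mean_diff / (sd * sqrt(1 / n))
    # Nadeau-Bengio correction: add n_test / n_train to the 1 / n term.
    corrected = mean_diff / (sd * sqrt(1 / n + n_test / n_train))
    return uncorrected, corrected


# Made-up accuracies from 10 runs of two classifiers (illustrative only).
acc_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.80, 0.82]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.77, 0.81, 0.78, 0.80]

# Assumed sizes: each run trains on 900 samples and tests on 100.
t_plain, t_corrected = paired_t_statistics(acc_a, acc_b, n_train=900, n_test=100)
print(f"uncorrected t = {t_plain:.3f}, corrected t = {t_corrected:.3f}")
```

Because the extra `n_test / n_train` term only enlarges the denominator, the corrected statistic is always smaller in magnitude than the uncorrected one, so the corrected test is the more conservative of the two.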
@Pibborn
Pibborn commented Oct 1, 2021

Thanks for your answer! Just to be sure, do we agree that your n_training_folds is n_1 in the paper and n_test_folds is n_2?

@winstonwzhang
@Pibborn, yes, you are correct that n_training_folds is n_1 and n_test_folds is n_2. The author of the gist just made a typo in the variable names: "test_training_ratio = n_test_folds / n_training_folds" should be corrected to "test_training_ratio = n_test_samples / n_training_samples".
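
For reference, the mapping confirmed above can be written out as the corrected statistic the code computes. This is a sketch in notation chosen here, not copied from the paper: x_j are the n paired differences (n = len(data1) in the code), n_1 is the number of training samples, and n_2 is the number of test samples.

```latex
t = \frac{\bar{x}}{\hat{\sigma}\,\sqrt{\dfrac{1}{n} + \dfrac{n_2}{n_1}}},
\qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
\hat{\sigma}^2 = \frac{1}{n-1}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2
```

The statistic is compared against a Student's t distribution with n - 1 degrees of freedom, which matches df = n - 1 in the gist.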
