
# jensdebruijn/corrected_dependent_ttest.py

Last active December 21, 2023 14:49
Python implementation of the Nadeau and Bengio correction of dependent Student's t-test
    # Python implementation of the Nadeau and Bengio correction of dependent Student's t-test
    # using the equation stated in https://www.cs.waikato.ac.nz/~eibe/pubs/bouckaert_and_frank.pdf

    from scipy.stats import t
    from math import sqrt
    from statistics import stdev


    def corrected_dependent_ttest(data1, data2, n_training_samples, n_test_samples, alpha):
        n = len(data1)
        differences = [(data1[i] - data2[i]) for i in range(n)]
        sd = stdev(differences)
        # mean of the paired differences (numerator of the statistic)
        mean_difference = 1 / n * sum(differences)
        # test-set size over training-set size (n_2 / n_1 in the paper)
        test_training_ratio = n_test_samples / n_training_samples
        denominator = sqrt(1 / n + test_training_ratio) * sd
        t_stat = mean_difference / denominator
        # degrees of freedom
        df = n - 1
        # calculate the critical value (one-sided, at level alpha)
        cv = t.ppf(1.0 - alpha, df)
        # calculate the two-sided p-value
        p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
        # return everything
        return t_stat, df, cv, p
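
For reference, the statistic this function is meant to compute can be written as below, in the notation of the linked Bouckaert and Frank paper. The symbol names are taken from that paper, not from the code: $x_j$ are the paired score differences, $n$ is the number of differences, $n_1$ is the number of training instances, $n_2$ is the number of test instances, and $\hat{\sigma}^2$ is the sample variance of the differences.

$$
t = \frac{\frac{1}{n}\sum_{j=1}^{n} x_j}{\sqrt{\left(\frac{1}{n} + \frac{n_2}{n_1}\right)\hat{\sigma}^2}}
$$

The statistic is compared against a Student's t distribution with $n - 1$ degrees of freedom. Setting $n_2 / n_1 = 0$ recovers the ordinary paired t-statistic, so the correction can only shrink the magnitude of $t$.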

### Pibborn commented Oct 1, 2021

Thanks for your answer! Just to be sure, do we agree that your `n_training_folds` is `n_1` in the paper and `n_test_folds` is `n_2`?

### winstonwzhang commented Feb 13, 2022

@Pibborn, yes, you are correct that `n_training_folds` is `n_1` and `n_test_folds` is `n_2`. The author of the gist just made a typo in the variable names: `test_training_ratio = n_test_folds / n_training_folds` should be corrected to `test_training_ratio = n_test_samples / n_training_samples`.
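
As a quick illustration of the naming discussed in this thread, here is a minimal, standard-library-only sketch of the corrected statistic using the `n_test_samples / n_training_samples` ratio. The helper name `corrected_t_statistic`, the score differences, and the dataset size are all made up for illustration and are not part of the gist.

```python
from math import sqrt
from statistics import mean, stdev


def corrected_t_statistic(differences, n_training_samples, n_test_samples):
    """Corrected resampled t-statistic for a list of paired score differences."""
    n = len(differences)
    # n_2 / n_1 in the paper's notation: test-set size over training-set size
    test_training_ratio = n_test_samples / n_training_samples
    denominator = sqrt(1 / n + test_training_ratio) * stdev(differences)
    return mean(differences) / denominator


# Hypothetical per-fold accuracy differences between two models (made-up numbers).
differences = [0.02, 0.01, 0.03, -0.01, 0.02, 0.00, 0.01, 0.02, 0.03, 0.01]

# Assumed setup: 1000 samples with 10-fold cross-validation, so each fold
# trains on 900 samples and tests on 100.
n_training_samples, n_test_samples = 900, 100

t_corrected = corrected_t_statistic(differences, n_training_samples, n_test_samples)

# Ordinary paired t-statistic, for comparison (no correction term).
t_uncorrected = mean(differences) / (stdev(differences) / sqrt(len(differences)))

print(f"corrected:   {t_corrected:.3f}")
print(f"uncorrected: {t_uncorrected:.3f}")
```

With these assumed sizes the ratio is 100 / 900, so the corrected statistic is smaller in magnitude than the uncorrected one by a factor of `sqrt(1 + n * n_test_samples / n_training_samples)`.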