See scipy issue #17530: https://github.com/scipy/scipy/issues/17530
[Two attachments (the results pdf and the analysis notebook) are embedded in the gist but do not render here.]
jpcbertoldo commented Jan 4, 2023

update 2023-01-04

procedure

Obtain the p-value of the Wilcoxon signed-rank test using three methods:

  • (a) original: use the p-value computed by scipy.stats.wilcoxon (scipy==1.9.3)
  • (b) modified: avoid the mode (or method) switch (from 'exact' to 'approx') and handle zero differences with the policy suggested in [1] (sketched after this list):
    • do nothing if the number of zeros is even
    • discard one zero if the number of zeros is odd
  • (c) permutation: use scipy.stats.wilcoxon to obtain the test statistic and scipy.stats.permutation_test to obtain the p-value.

Method (c) gives the reference ("expected") p-value, so (a) and (b) should get as close to it as possible.
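A minimal sketch of how (a) and (c) can be compared on one vector of differences, together with the zero-differences policy of (b). The names demsar_zero_policy and statistic are illustrative, not the gist's actual code, and a one-sided alternative is used only to keep the statistic/alternative pairing simple:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
diffs = rng.standard_normal(8)  # simulated Wilcoxon differences (y=None case)

# (a) original: p-value as computed by scipy.stats.wilcoxon
res_a = stats.wilcoxon(diffs, alternative="greater")

# (b) zero-differences policy from [1] (hypothetical helper):
# keep the zeros if their count is even, discard one of them if it is odd
def demsar_zero_policy(d):
    if np.count_nonzero(d == 0) % 2 == 1:
        d = np.delete(d, np.flatnonzero(d == 0)[0])
    return d

# (c) permutation: reuse the test statistic and get the p-value by
# enumerating all 2**n sign flips of the differences;
# permutation_type="samples" flips signs when given a single sample
def statistic(d):
    return stats.wilcoxon(d, alternative="greater").statistic

res_c = stats.permutation_test(
    (diffs,), statistic,
    permutation_type="samples",
    alternative="greater",
    n_resamples=np.inf,  # exact enumeration, feasible for small n
)
print(res_a.pvalue, res_c.pvalue)
```

The sign-flip null matches the signed-rank test's null hypothesis (differences symmetric about zero), which is why (c) serves as the reference.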

We generate standard Gaussian random 1D vectors (as if they were the Wilcoxon differences, i.e. the function is called with y=None) of variable size num_samples, then force num_zeros entries to be 0 and num_ties entries to share the same non-zero value. num_samples is in range(0, 10), and we consider all combinations of (num_zeros, num_ties) such that num_zeros + num_ties <= num_samples, with 50 random seeds for each combination.
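A sketch of that generation loop; make_diffs is a hypothetical name and the tied value 1.0 is arbitrary:

```python
import itertools
import numpy as np

def make_diffs(num_samples, num_zeros, num_ties, rng):
    # standard Gaussian "differences", with forced zeros and ties
    d = rng.standard_normal(num_samples)
    d[:num_zeros] = 0.0                      # forced zero differences
    d[num_zeros:num_zeros + num_ties] = 1.0  # forced (non-zero) tied entries
    rng.shuffle(d)
    return d

# all (num_zeros, num_ties) with num_zeros + num_ties <= num_samples,
# each repeated over 50 random seeds
for num_samples in range(0, 10):
    for num_zeros, num_ties in itertools.product(range(num_samples + 1), repeat=2):
        if num_zeros + num_ties > num_samples:
            continue
        for seed in range(50):
            diffs = make_diffs(num_samples, num_zeros, num_ties,
                               np.random.default_rng(seed))
```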

[1] J. Demšar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006. URL: http://jmlr.org/papers/v7/demsar06a.html

experiment variants

We test different configurations of the parameters alternative and zero_method:

```python
ALTERNATIVES = ['two-sided', 'greater', 'less']
ZERO_METHODS = ["wilcox", "pratt", "zsplit"]
```
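Each generated vector is then tested under every combination, along these lines (a sketch, with diffs standing for one generated vector):

```python
import itertools
from scipy import stats

for alternative, zero_method in itertools.product(ALTERNATIVES, ZERO_METHODS):
    # corner cases (e.g. all-zero differences with zero_method="wilcox")
    # raise in scipy and need to be skipped or handled
    res = stats.wilcoxon(diffs, alternative=alternative, zero_method=zero_method)
```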

analysis

We look at the differences between the considered methods (a) and (b) relative to method (c). We mostly look at the absolute difference (lower is better) and the sign of the difference (positive is better than negative). The subtitles of the plots in the pdf explain each visualization.
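Concretely, the two quantities per configuration would be computed as below; compare is a hypothetical name:

```python
import numpy as np

def compare(p_est, p_perm):
    # p_est: p-value from (a) or (b); p_perm: reference p-value from (c)
    diff = p_est - p_perm
    return np.abs(diff), np.sign(diff)  # abs error (lower is better), sign of diff

abs_err, sign = compare(0.062, 0.055)  # illustrative numbers only
```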

results and conclusions

Please refer to the pdf above, where the figures can be viewed (best with zoom in a pdf viewer), and check the notebook for details.

  1. (see sections 4.1, 5.1, and 5.2) the modification generally has a larger abs error when zero_method == 'pratt' or zero_method == 'wilcox', but a lower one when zero_method == 'zsplit'
  2. (see section 4.2) the modification generally has a lower abs error for all num_samples, and the improvement is bigger for lower num_samples
  3. (see section 4.3) the same story holds for every num_samples: the modification
    1. yields more exact p-values (sign of the difference is 0)
    2. avoids more false positives (a positive sign is better because the estimated p-value is larger than the permutation p-value)

Sections 4.4 and 4.5 show similar results in more detail, where one can see that the improvements occur mostly when num_zeros is high relative to num_samples.

Section 5.3 confirms that the estimations are more often improved than degraded (positive "decrease of abs error"), and the effect is more relevant for lower num_samples.
