Last active: January 4, 2023 13:33
See scipy issue #17530: https://github.com/scipy/scipy/issues/17530
update 2023-01-04
procedure
Obtain the p-value for the wilcoxon test using three methods:

- (a) `scipy.stats.wilcoxon` (version `scipy==1.9.3`);
- (b) `scipy.stats.wilcoxon` with the `method` switch (from `'exact'` to `'approx'`) and the policy to deal with zero differences replaced by the one suggested in [1];
- (c) `scipy.stats.wilcoxon` to obtain the test statistic, then the p-value with `scipy.stats.permutation_test`.

(c) gives the expected p-value, so (a) and (b) should get as close to it as possible.
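As a minimal sketch (not the notebook's code), the three p-values can be obtained along these lines, assuming a toy differences vector and the modern `method` keyword of `scipy.stats.wilcoxon` (it was named `mode` in scipy 1.9.x):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(20)  # toy 1D differences vector (the y=None case)

# (a) plain scipy.stats.wilcoxon with default parameters
p_a = stats.wilcoxon(x).pvalue

# (b) rough stand-in for the modification: normal approximation plus the
# zero-splitting policy from [1] (the actual patch changes internals)
p_b = stats.wilcoxon(x, zero_method="zsplit", method="approx").pvalue

# (c) reference: wilcoxon statistic, p-value from a sign-flip permutation
# test; the two-sided wilcoxon statistic is min(T+, T-), so small values
# are the extreme ones, hence alternative="less"
res = stats.permutation_test(
    (x,),
    lambda d: stats.wilcoxon(d).statistic,
    permutation_type="samples",  # randomly flips the signs of the differences
    alternative="less",
    n_resamples=999,
    random_state=0,
)
p_c = res.pvalue
```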
We generate (standard) gaussianly random 1D vectors (as if they were the wilcoxon differences, i.e. the function called with `y=None`) of variable size `num_samples`, then force `num_zeros` entries to 0 and force `num_ties` entries to have the same number (but not zero). `num_samples` is in `range(0, 10)` and we consider all possible combinations of `(num_zeros, num_ties)` such that `num_zeros + num_ties <= num_samples`, with 50 random seeds for each combination.

[1] J. Demšar, "Statistical Comparisons of Classifiers over Multiple Data Sets," Journal of Machine Learning Research, vol. 7, no. 1, pp. 1–30, 2006. URL: http://jmlr.org/papers/v7/demsar06a.html
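A sketch of this generation scheme (the helper name and the exact indexing are assumptions, not the notebook's code):

```python
import numpy as np

def make_differences(num_samples, num_zeros, num_ties, seed):
    # hypothetical helper: standard-gaussian vector, then force num_zeros
    # entries to 0 and num_ties entries to share one (nonzero) value
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_samples)
    x[:num_zeros] = 0.0
    if num_ties > 0:
        x[num_zeros:num_zeros + num_ties] = x[num_zeros]  # tied, nonzero
    return x

# all configurations: num_samples in range(0, 10),
# num_zeros + num_ties <= num_samples, 50 seeds per combination
configs = [
    (n, nz, nt, seed)
    for n in range(0, 10)
    for nz in range(n + 1)
    for nt in range(n - nz + 1)
    for seed in range(50)
]
```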
experiment variants
We test different configurations of the parameters `alternative` and `zero_method`.

analysis
We look at the differences between the considered methods (a) and (b) relative to method (c). We mostly look at the absolute differences (lower is better) and the sign of the difference (positive is better than negative). The subtitles of the plots in the pdf above explain each visualization.
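For one run, the compared quantities reduce to something like this (the p-values below are made up for illustration):

```python
# hypothetical per-run p-values: methods (a), (b), and reference (c)
p_a, p_b, p_c = 0.062, 0.041, 0.050

abs_err_a = abs(p_a - p_c)        # absolute difference; lower is better
abs_err_b = abs(p_b - p_c)
decrease = abs_err_a - abs_err_b  # positive => the modification improved
```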
results and conclusions
Please refer to the pdf above, where the figures can be visualized (better with zoom in a pdf viewer); check the notebook for details.
- (Sections 4.1, 5.1, and 5.2) the modification generally has larger abs error when `zero_method == 'pratt'` or `zero_method == 'wilcox'`, but it improves when `zero_method == 'zsplit'`.
- (Section 4.2) the modification generally has lower abs error for all `num_samples`, and the improvement is bigger with lower `num_samples`.
- (Section 4.3) the modification tells the same story for every `num_samples`.

Sections 4.4 and 4.5 show similar results in more detail, where one can see that improvements occur mostly when `num_zeros` is high (relative to `num_samples`). Section 5.3 confirms that the estimations are more often improved than degraded (positive "decrease of abs error"), and this is more relevant for lower `num_samples`.