Thom Benjamin Volker
In this notebook, we explore properties of divergence-based two-sample
testing, using the methods employed in the densityratio package
(Volker et al., 2023). We consider three settings: two groups are drawn
from the same distribution, two groups are drawn from distributions with
different means, but with the same variance-covariance matrix, and two
groups are drawn from distributions with the same means but different
covariances. Subsequently, we compare the performance of the
divergence-based estimators and evaluate type I and type II error rates
under the different scenarios. Additionally, we compare the performance
of the divergence-based estimators with the performance of the
Hotelling’s
In setting one, we generate two groups of
library(densityratio)
set.seed(123)
nsim <- 500
P <- 5
N <- 200
M1 <- rep(0, P)
M2 <- rep(0.1, P)
V1 <- diag(P)
V3 <- 0.3 + 0.7 * diag(P)
gen_data <- function(N, P, Mnu, Mde, Vnu, Vde) {
munu <- matrix(rep(Mnu, each = N), N)
mude <- matrix(rep(Mde, each = N), N)
nu <- (rnorm(N*P) |> matrix(N)) %*% chol(Vnu) + munu
de <- (rnorm(N*P) |> matrix(N)) %*% chol(Vde) + mude
list(nu = nu, de = de)
}Subsequently, we write a function that tests whether the groups of
samples are drawn from the same underlying distribution using different
tests, and report the
two_sample_test <- function(nu, de) {
hotelling <- DescTools::HotellingsT2Test(nu, de)$p.value[1]
kolmogorov <- fasano.franceschini.test::fasano.franceschini.test(nu, de, verbose = FALSE)$p.value
ulsif_est <- summary(
ulsif(nu, de, nsigma = 10, nlambda = 5, ncenters = 100, progressbar = FALSE),
test = TRUE,
progressbar = FALSE)$p_value
kliep_est <- summary(
kliep(nu, de, nsigma = 10, ncenters = 100, progressbar = FALSE),
test = TRUE,
progressbar = FALSE)$p_value
kmm_est <- summary(
kmm(nu, de, nsigma = 10, ncenters = 100, progressbar = FALSE),
test = TRUE,
progressbar = FALSE)$p_value
spectral_est <- summary(
spectral(nu, de, nsigma = 10, ncenters = 100, progressbar = FALSE),
test = TRUE,
progressbar = FALSE)$p_value
naive_est <- summary(naive(nu, de), test = TRUE, progressbar = FALSE)$p_value
c(
hotelling = hotelling,
kolmogorov = kolmogorov,
ulsif = ulsif_est,
kliep = kliep_est,
kmm = kmm_est,
spectral = spectral_est,
naive = naive_est
)
}We first generate two groups of samples from the same standard normal
distribution, to evaluate type I error rates of each method. We repeat
this procedure 500 times, and report the proportion of
cl <- parallel::makeCluster(18)
invisible(parallel::clusterCall(cl, fun = \() {
library(densityratio)
pbapply::pboptions(type = "none")
}))
parallel::clusterExport(cl, c("N", "P", "M1", "V1", "gen_data", "two_sample_test"))
out1 <- pbapply::pbreplicate(
nsim, {
dat <- gen_data(N, P, M1, M1, V1, V1)
two_sample_test(dat$nu, dat$de)
},
cl = cl
)
parallel::stopCluster(cl)
rowMeans(out1 < 0.05) hotelling kolmogorov.p-value ulsif kliep
0.040 0.050 0.052 0.058
kmm spectral naive
0.056 0.052 0.042
We can observe that, while there is some variation in the type I error
rates, all seem reasonably close to the nominal
We then shift to the second setting, where we generate two groups of samples from normal distributions with different means.
cl <- parallel::makeCluster(18)
invisible(parallel::clusterCall(cl, fun = \() {
library(densityratio)
pbapply::pboptions(type = "none")
}))
parallel::clusterExport(cl, c("N", "P", "M1", "M2", "V1", "gen_data", "two_sample_test"))
out2 <- pbapply::pbreplicate(
nsim, {
dat <- gen_data(N, P, M1, M2, V1, V1)
two_sample_test(dat$nu, dat$de)
},
cl = cl
)
parallel::stopCluster(cl)
rowMeans(out2 < 0.05) hotelling kolmogorov.p-value ulsif kliep
0.368 0.216 0.238 0.180
kmm spectral naive
0.142 0.148 0.132
Here, we notice that Hotelling’s
cl <- parallel::makeCluster(18)
invisible(parallel::clusterCall(cl, fun = \() {
library(densityratio)
pbapply::pboptions(type = "none")
}))
parallel::clusterExport(cl, c("N", "P", "M1", "V1", "V3", "gen_data", "two_sample_test"))
out3 <- pbapply::pbreplicate(
nsim, {
dat <- gen_data(N, P, M1, M1, V1, V3)
two_sample_test(dat$nu, dat$de)
},
cl = cl
)
parallel::stopCluster(cl)
rowMeans(out3 < 0.05) hotelling kolmogorov.p-value ulsif kliep
0.070 0.750 0.550 0.950
kmm spectral naive
1.000 0.955 0.540
Finally, in the regime where the two groups have different covariance
matrices, we see that the performance of Hotelling’s kliep, kmm and
spectral) really excel, and even outperform the Fasano-Franceschini
test. Hence, the divergence-based tests can be very useful in the case
where two groups are expected to differ in higher-order moments.
Volker, T. B. (2025). Divergence-based testing using density ratio estimation techniques. https://gist.github.com/thomvolker/58197e535ec458752bccbb5b611046ce