Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Sampling from an Epanechnikov kernel density estimate in R.
#' Sample from a kernel density with the Epanechnikov kernel.
#'
#' @param n Number of observations to sample.
#' @param x The data from which the estimate is to be computed.
#' @param bw Desired bandwidth.
#' @return A numeric vector with n sampled data points from the kernel
#' density estimator.
#' @details The factor sqrt(5) in is needed since the standard deviation of
#' the kernel is 1/sqrt(5). This makes the normal kernel and the Epanchnikov
#' kernel comparable.
#' @references The quantile function of the Epanchnikov kernel is found at
#' https://stats.stackexchange.com/questions/6643/what-is-the-closed-form-solution-for-the-inverse-cdf-for-epanechnikov
rkde = function(n, x, bw) {
centers = sample(x, n, replace = TRUE)
qepanechnikov = function(p) {
2*sin(1/3*asin(2*p - 1))
}
qepanechnikov(runif(n))*bw*sqrt(5) + centers
}
## Take a data set and find a suitable bandwidth with density.
data = airquality$Wind
bw = density(data, kernel = "epanechnikov")$bw
## Check that it works.
set.seed(313)
n = 1000000
samples = rkde(n, data, bw)
hist(samples, breaks = 1000, freq = FALSE)
lines(density(data, kernel = "epanechnikov"), lwd = 2, col = "red")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment