Last active
February 20, 2024 19:44
-
-
Save slowkow/9041570 to your computer and use it in GitHub Desktop.
Create a Q-Q plot with ggplot2
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#' Create a quantile-quantile plot with ggplot2. | |
#' | |
#' Assumptions: | |
#' - Expected P values are uniformly distributed. | |
#' - Confidence intervals assume independence between tests. | |
#' We expect deviations past the confidence intervals if the tests are | |
#' not independent. | |
#' For example, in a genome-wide association study, the genotype at any | |
#' position is correlated to nearby positions. Tests of nearby genotypes | |
#' will result in similar test statistics. | |
#' | |
#' @param ps Vector of p-values. | |
#' @param ci Size of the confidence interval, 95% by default. | |
#' @return A ggplot2 plot. | |
#' @examples | |
#' library(ggplot2) | |
#' gg_qqplot(runif(1e2)) + theme_grey(base_size = 24) | |
gg_qqplot <- function(ps, ci = 0.95) { | |
n <- length(ps) | |
df <- data.frame( | |
observed = -log10(sort(ps)), | |
expected = -log10(ppoints(n)), | |
clower = -log10(qbeta(p = (1 - ci) / 2, shape1 = 1:n, shape2 = n:1)), | |
cupper = -log10(qbeta(p = (1 + ci) / 2, shape1 = 1:n, shape2 = n:1)) | |
) | |
log10Pe <- expression(paste("Expected -log"[10], plain(P))) | |
log10Po <- expression(paste("Observed -log"[10], plain(P))) | |
ggplot(df) + | |
geom_point(aes(expected, observed), shape = 1, size = 3) + | |
geom_abline(intercept = 0, slope = 1, alpha = 0.5) + | |
geom_line(aes(expected, cupper), linetype = 2) + | |
geom_line(aes(expected, clower), linetype = 2) + | |
xlab(log10Pe) + | |
ylab(log10Po) | |
} |
@holgerman Thanks for the comment! I updated the code with your suggestions.
What is going on in the last argument for the CI calculation N-1:N+1
? What is the order of events? Is it like this: (n-(1:n))+1
?
@royfrancis Sorry about that, I agree that this is a confusing expression.
> N
[1] 3
> N - 1:N
[1] 2 1 0
> N - 1:N + 1
[1] 3 2 1
> N:1
[1] 3 2 1
I think it can be safely rewritten as N:1
.
Thanks for sharing this code. Note, however, there is now a package called qqplotr that produces Q-Q and P-P plots with confidence bands using ggplot2. The package offers some additional options and is probably better suited to "production use". Still, your code is great for those learning to use R/ggplot2.
@sjvrensburg Thanks for the comment! I look forward to trying qqplotr.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thank you very much for sharing!
As noted by my colleague Philippe, better make the CI a two-sided CI as most people are used to this:
and to better account for cases if length(ps) is rather small
Best greetings,
Holger