Skip to content

Instantly share code, notes, and snippets.

@slowkow
Last active February 20, 2024 19:44
Show Gist options
  • Star 10 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save slowkow/9041570 to your computer and use it in GitHub Desktop.
Save slowkow/9041570 to your computer and use it in GitHub Desktop.
Create a Q-Q plot with ggplot2
#' Create a quantile-quantile plot with ggplot2.
#'
#' Assumptions:
#' - Expected P values are uniformly distributed.
#' - Confidence intervals assume independence between tests.
#' We expect deviations past the confidence intervals if the tests are
#' not independent.
#' For example, in a genome-wide association study, the genotype at any
#' position is correlated to nearby positions. Tests of nearby genotypes
#' will result in similar test statistics.
#'
#' @param ps Vector of p-values.
#' @param ci Size of the confidence interval, 95% by default.
#' @return A ggplot2 plot.
#' @examples
#' library(ggplot2)
#' gg_qqplot(runif(1e2)) + theme_grey(base_size = 24)
gg_qqplot <- function(ps, ci = 0.95) {
n <- length(ps)
df <- data.frame(
observed = -log10(sort(ps)),
expected = -log10(ppoints(n)),
clower = -log10(qbeta(p = (1 - ci) / 2, shape1 = 1:n, shape2 = n:1)),
cupper = -log10(qbeta(p = (1 + ci) / 2, shape1 = 1:n, shape2 = n:1))
)
log10Pe <- expression(paste("Expected -log"[10], plain(P)))
log10Po <- expression(paste("Observed -log"[10], plain(P)))
ggplot(df) +
geom_point(aes(expected, observed), shape = 1, size = 3) +
geom_abline(intercept = 0, slope = 1, alpha = 0.5) +
geom_line(aes(expected, cupper), linetype = 2) +
geom_line(aes(expected, clower), linetype = 2) +
xlab(log10Pe) +
ylab(log10Po)
}
@holgerman
Copy link

Thank you very much for sharing!
As noted by my colleague Philippe, better make the CI a two-sided CI as most people are used to this:

clower   = -log10(qbeta((1 - ci)/2,     1:N, N - 1:N + 1)),
cupper   = -log10(qbeta((1 + ci)/2, 1:N, N - 1:N + 1))

and to better account for cases if length(ps) is rather small

expected = -log10(ppoints(N),

Best greetings,
Holger

@slowkow
Copy link
Author

slowkow commented Nov 22, 2017

@holgerman Thanks for the comment! I updated the code with your suggestions.

@royfrancis
Copy link

What is going on in the last argument for the CI calculation N-1:N+1 ? What is the order of events? Is it like this: (n-(1:n))+1?

@slowkow
Copy link
Author

slowkow commented Feb 5, 2018

@royfrancis Sorry about that, I agree that this is a confusing expression.

> N
[1] 3
> N - 1:N
[1] 2 1 0
> N - 1:N + 1
[1] 3 2 1
> N:1
[1] 3 2 1

I think it can be safely rewritten as N:1.

@sjvrensburg
Copy link

Thanks for sharing this code. Note, however, there is now a package called qqplotr that produces Q-Q and P-P plots with confidence bands using ggplot2. The package offers some additional options and is probably better suited to "production use". Still, your code is great for those learning to use R/ggplot2.

@slowkow
Copy link
Author

slowkow commented Jan 9, 2019

@sjvrensburg Thanks for the comment! I look forward to trying qqplotr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment