Last active
August 15, 2021 16:26
-
-
Save millerh1/341f215e639faec766f9595162bfc0b1 to your computer and use it in GitHub Desktop.
DESeq2 + enrichr -- DGE for minimalists
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "DGE + Pathway Analysis for Minimalists" | |
author: "Henry Miller" | |
date: "07/22/2021" | |
output: | |
html_document: | |
toc: true | |
toc_float: | |
collapsed: false | |
smooth_scroll: false | |
code_folding: hide | |
theme: journal | |
highlight: haddock | |
--- | |
```{r setup and library, warning=FALSE, message=FALSE, echo=FALSE, include=FALSE} | |
# Setup | |
knitr::opts_chunk$set(warning = FALSE, message = FALSE, | |
cache = FALSE, echo = TRUE) | |
``` | |
```{r Libraries} | |
# Libraries | |
library(dplyr) | |
library(tibble) | |
library(DESeq2) | |
library(EnsDb.Hsapiens.v86) | |
library(plyr) | |
library(airway) | |
data(airway) | |
``` | |
# Example auto-enrichr analysis | |
This gist shows a method for automatically running [enrichr](https://maayanlab.cloud/Enrichr/) on a list of gene vectors. | |
This is an exceedingly common use case for DGE analysis which yields over-expressed and under-expressed genes. It is also common for integrative analyses which explore multiple combinations of gene lists and can easily require > 16 pathway enrichment analyses. | |
Enrichr rapidly performs pathway enrichrment using a robust statistical approach. However, enrichr is usually performed manually using their web interface or with long-running analysis tasks via the `enrichR` R package ([link](https://cran.r-project.org/web/packages/enrichR/vignettes/enrichR.html)). These approaches are unsuitable for notebooks which can change and require many different enrichment analyses. Instead, it is advantageous to use the web API via the `httr` package to generate permalinks and then automatically include them in the RMarkdown notebook. Please see the following example for further guidance: | |
## Running DESeq2 | |
To begin, we use the example data from the `airway` R package [here](https://bioconductor.org/packages/release/data/experiment/html/airway.html). We perform differential gene expression analysis using `DESeq2` and wrangle the results with `dplyr`, `tibble`, and `plyr` to get a named list of over-expressed and under-expressed genes. | |
```{r DESeq2} | |
geneLst <- DESeqDataSet(airway, design = ~ dex) %>% # Sets up DESeq dataset | |
DESeq() %>% # Run analysis | |
results(contrast = c("dex", "trt", "untrt")) %>% # Collect results | |
as.data.frame() %>% # Convert to dataframe | |
rownames_to_column(var = "GENEID") %>% # Genes are now a column | |
dplyr::filter(padj < .05) %>% # Filter for significant results | |
mutate(result = case_when( # Labels over-expressed vs under-expressed genes | |
log2FoldChange > 0 ~ "Over-expressed", | |
TRUE ~ "Under-expressed" | |
)) %>% | |
inner_join( # Add in the gene symbols | |
AnnotationDbi::select(EnsDb.Hsapiens.v86, | |
keys=keys(EnsDb.Hsapiens.v86), | |
columns = "SYMBOL"), | |
by = "GENEID" | |
) %>% | |
group_by(result) %>% # Group by result column | |
{setNames(group_split(.), group_keys(.)[[1]])} %>% # Split tibble into list by group with names | |
llply(pull, var = SYMBOL) # Final conversion to named list of gene vectors | |
# Display number of genes in each group | |
llply(geneLst, length) | |
``` | |
## Pathway enrichment {.tabset} | |
We then perform pathway enrichrment automatically using the code below: | |
```{r childRMD, results='asis'} | |
resRmd <- llply(names(geneLst), function(groupNow) { | |
genesNow <- geneLst[[groupNow]] | |
response <- httr::POST( # Post request to enrichr based on https://maayanlab.cloud/Enrichr/help#api&q=1 | |
url = 'https://maayanlab.cloud/Enrichr/addList', | |
body = list( | |
'list' = paste0(genesNow, collapse = "\n"), | |
'description' = groupNow | |
) | |
) | |
response <- jsonlite::fromJSON(httr::content(response, as = "text")) # Collect response | |
permalink <- paste0("https://maayanlab.cloud/Enrichr/enrich?dataset=", # Create permalink | |
response$shortId[1]) | |
# See this for more guidance: https://bookdown.org/yihui/rmarkdown-cookbook/child-document.html | |
knitr::knit_child(text = c( # Text vector to be knitted into RMarkdown as a child | |
'### `r groupNow`', | |
'', | |
'Enrichr Link: <a href="`r permalink`" target="_blank">`r groupNow`</a>.', | |
'' | |
), | |
envir = environment(), # Current global environment will be passed into the RMarkdown child | |
quiet = TRUE) | |
}) | |
cat(unlist(resRmd), sep = '\n') | |
``` | |
# Conclusions | |
This gist provides an example of differential gene expression in R with pathway enrichment. I have shown you how `dplyr`, `tibble`, and `plyr` can be leveraged to streamline the process of running DESeq2 and how `httr` can be used to quickly perform pathway enrichrment via the enrichr web service API. | |
Ways to improve this approach: | |
1. Instead of using text, consider using a child Rmd template. See an example [here](https://bookdown.org/yihui/rmarkdown-cookbook/child-document.html). | |
2. Consider using the `enrichR` R package in order to make visualizations of the top pathways of interest. | |
3. Consider using the `DT` R package to make user-friendly HTML tables for showing DGE results. | |
4. Consider including visualizations, such as PCA, MA plots, heatmaps, and volcano plots as in [this tutorial](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html). | |
# Session info | |
```{r} | |
sessionInfo() | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
See the finished product here: https://static-html-pages.s3.us-west-2.amazonaws.com/misc/auto-enrichr-links.html