Skip to content

Instantly share code, notes, and snippets.

@tavareshugo
Last active January 17, 2022 20:08
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tavareshugo/3973461a7daf8a43e65e3566d5deed14 to your computer and use it in GitHub Desktop.
Save tavareshugo/3973461a7daf8a43e65e3566d5deed14 to your computer and use it in GitHub Desktop.
#' Tidying methods for DESeq2 DESeqTransform objects
#'
#' This reshapes a DESeq2 transform object (output from `rlog()` and
#' `varianceStabilizingTransform()` functions) into a tidy format.
#'
#' @param x DESeqTransform object
#' @param colData whether colData should be included in the tidied output
#' for those in the DESeqTransform object.
#' @param ... extra arguments (not used)
#'
#' @details \code{colDat=TRUE} adds covariates from colData to the data frame.
#'
#' @return the result is a data frame with the columns
#' \item{gene}{gene ID}
#' \item{sample}{sample ID}
#' \item{log2Count}{transformed normalized counts in this gene in this sample on a log2 scale}
#'
#' If \code{colData = TRUE}, it also merges this with the columns present
#' in \code{colData(x)}.
#'
#' @name DESeq2_tidiers
#'
#' @examples
#'
#' # From DESeq2 documentation
#'
#' if (require("DESeq2")) {
#' dds <- makeExampleDESeqDataSet(betaSD = 1)
#'
#' # vst transform
#' vst_norm <- varianceStabilizingTransformation(dds)
#'
#' # rlog transform
#' rlog_norm <- rlog(dds)
#'
#' tidy(rlog_norm)
#' tidy(vst_norm)
#'
#' # With design included
#' tidy(rlog_norm, colData=TRUE)
#' tidy(vst_norm, colData=TRUE)
#' }
#'
#' @S3method tidy DESeqDataSet
#' @export tidy.DESeqDataSet
tidy.DESeqTransform <- function(x, colData = FALSE){
if(!require(biobroom)){
stop("please install `biobroom` package")
}
# Tidy the normalised expression data within the object
expressions <- fix_data_frame(assay(x), newcol="gene")
ret <- expressions %>%
tidyr::gather(sample, log2Count, -gene) %>%
dplyr::mutate(sample=as.character(sample))
if (colData) {
cdat <- data.frame(SummarizedExperiment::colData(x), stringsAsFactors=FALSE)
rownames(cdat) <- colnames(x)
ret <- cbind(
ret[, c('gene', 'sample')],
cdat[ret$sample,,drop=FALSE],
log2Count=ret$log2Count) %>%
biobroom:::unrowname()
}
biobroom:::finish(ret)
}
@amcdavid
Copy link

Nice! I would like to add it to a package containing various other convenience functions for RNAseq analyses. Are you licensing this code?

@tavareshugo
Copy link
Author

Glad this seems useful. :)

I hadn't really thought about licensing, since I've written this as a suggestion to add to the {biobroom} package (issue #20), but never got a reply and didn't follow up on it.
But I guess anything permissible, like MIT is fine. Basically, feel free to use/modify the code, and if you use it as is just acknowledge its source (e.g. with a link to this gist in a comment or something like that)

I have other "tidiers" for SummarizedExperiment / SingleCellExperiment objects, if those are handy (not pushed anywhere, I have to dig them out from other projects). Do you have a project repo already?

@amcdavid
Copy link

Hi @tavareshugo. Thanks for responding! It looks like the biobroom repo hasn't been touched for a few years, and currently has WARNINGS on its bioconductor build. I personally use it enough that I'd be willing to adopt maintenance of it if the Storeylab isn't able to any more, and it would make lots of sense to then merge this and the other tidiers into biobroom. I might drop the maintainers an email to see if they are still interested in maintaining.

FWIW I was using this tidy method in a package that has some templates for RNAseq analysis. It's mostly for internal use, those the repo is public here: https://github.com/amcdavid/GeneseeBulk

On a related note, are you familiar https://stemangiola.github.io/tidySingleCellExperiment/ ? It's an interesting approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment