trestletech/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Bioconductor Shiny Example #3

Simple app showing the relationship between a gene's expression and survival.
This is an example Shiny app featuring some basic analysis of Ovarian Cancer gene expression data collected on the Affymetrix U133A platform. We filter the available genes and samples down to a much smaller matrix to make reproducibility simpler for a broader audience. The R code involved in sampling the data is available in this Gist as an R-Markdown file, and the sampled data are available in this Gist as Rds files.
To run the application, install shiny (install.packages("shiny")) then run the following command:
library(shiny)
runGist(5929598)

The relevant citation for the original data is available below:

Benjamin Frederick Ganzfried, Markus Riester, Benjamin Haibe-Kains, Thomas
Risch, Svitlana Tyekucheva, Ina Jazic, Xin Victoria Wang, Mahnaz Ahmadifar,
Michael Birrer, Giovanni Parmigiani, Curtis Huttenhower, Levi Waldron.
curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer
Transcriptome, Database 2013: bat013 doi:10.1093/database/bat013 published
online April 2, 2013.


## normalExpr.Rds

      
    Raw
  

              normalExpr.Rds
            
          
            View raw
        
    
## server.R
library(shiny)
library(Biobase)
library(survival)

# Load in the sampled matrices we've generated ahead of time.
tumor <- readRDS("tumorExpr.Rds")
# Distill the expression data away from the ExpressionSet object.
tumorExp <- exprs(tumor)

shinyServer(function(input, output, session) {

  # Need a wrapper around the densityClick input so we can manage whether or
  # not the click occured on the current Gene. If it occured on a previous
  # gene, we'll want to mark that click as 'stale' so we don't try to use it
  # later.
  currentClick <- list(click=NULL, stale=FALSE)
  handleClick <- observe({
    if (!is.null(input$densityClick) && !is.null(input$densityClick$x)){
      currentClick$click <<- input$densityClick
      currentClick$stale <<- FALSE
    }
  }, priority=100)

  getCutoff <- reactive({
    # We need this function to subscribe to a couple of dependencies. Without
    # explicitly providing these two at the top, this function may return a
    # cached value without realizing that a new click has occured, or a new
    # gene has been loaded.
    input$densityClick
    geneExp()

    # See if there's been a click since the last gene change.
    if (!is.null(currentClick$click) && !currentClick$stale){
      return(currentClick$click$x)
    }

    return (mean(geneExp()))
  })

  #' Extract the relevant tumor expression values.
  geneExp <- reactive({
    geneExp <- tumorExp[rownames(tumorExp) == input$gene, ]

    currentClick$stale <<- TRUE

    geneExp
  })

  #' Render a plot to show the distribution of the gene's expression
  output$densityPlot <- renderPlot({
    # Plot to density plot
    tumorGene <- geneExp()
    plot(density(tumorGene), main="Distribution", xlab="")

    # Add a vertical line to show where the current cutoff is.
    abline(v=getCutoff(), col=4)

    # Draw a line where they're hovering
    if (!is.null(input$densityHover) && !is.null(input$densityHover$y)){
      abline(v=input$densityHover$x, col=5)
    }
  }, bg="transparent")

  #' A reactive survival formula
  survivalFml <- reactive({
    tumorGene <- geneExp()

    # Create the groups based on which samples are above/below the cutoff
    expressionGrp <- as.integer(tumorGene < getCutoff())

    # Make sure there's more than one group!
    if (length(unique(expressionGrp)) < 2){
      stop("You must specify a cutoff that places at least one sample in each group!")
    }

    # Create the survival object
    surv <- with(pData(tumor),
                 Surv(days_to_death, recurrence_status=="recurrence"))
    return(surv ~ expressionGrp)
  })

  #' Print out some information about the fit
  output$info <- renderPrint({
    surv <- survivalFml()
    sDiff <- survdiff(surv)

    # Calculate the p value
    # Extracted from the print.survdiff function in the survival package
    df <- (sum(1 * (sDiff$exp > 0))) - 1
    pv <- format(signif(1 - pchisq(sDiff$chisq, df), 2))

    cat ("p-value: ", pv, "\n")
  })

  #' Create a Kaplan Meier plot
  output$kmPlot <- renderPlot({
    surv <- survivalFml()
    plot(survfit(surv), col=1:2, xlab="Survival Time (Days)",
         ylab="% Survival")
    legend(10,.4,c("Low Expr", "High Expr"), lty=1, col=1:2)
  })
})

## setData.Rmd
Extract and Store Some Data
========================================================

We'll first want to install the `curatedOvarianData` package from Bioconductor. Note that this package is a few hundred MB.

```{r, cache=TRUE, results='hide', message=FALSE}
source("http://bioconductor.org/biocLite.R")
biocLite("curatedOvarianData")
```

Now we'll load the package and load in one of the datasets.

```{r, results='hide', message=FALSE}
library(curatedOvarianData)
data(TCGA_eset)
```

Let's write out the clinical data so we have it for later. We'll trim out the "uncurated" metadata column to save some space and make the printing a bit easier.

```{r}
clinical <- phenoData(TCGA_eset)
pData(clinical) <- pData(clinical)[,colnames(pData(clinical))!="uncurated_author_metadata"]
```

We won't want to store all of the tumor samples available in the dataset, so let's grab 50 tumor samples at random and keep all of the normal samples.

```{r}
# Get all normal phenotypic data
normalClinical <- clinical
pData(normalClinical) <- pData(clinical)[pData(clinical)$sample_type == "adjacentnormal",]

# Sample 20 samples from the tumor phenotypic data
tumorIndices <- sample(which(pData(clinical)$sample_type == "tumor"), 50)
tumorClinical <- clinical
pData(tumorClinical) <- pData(clinical)[tumorIndices,]
```

Filtering
---------

For the sake of demonstration, we'll want to keep the dataset fairly small. So let's filter the data to the few dozen samples we selected above and only 8 genes.

```{r}
# Provide the HGNC symbols of 8 genes
geneList <- c("EGFR", "KLF6", "FOXO1", "KRAS", "JAK2", "BRCA1", "BRCA2", "PPM1D")

# Extract the relevant probeset to gene mappings
featureList <- fData(TCGA_eset)
featureList <- featureList[featureList$gene %in% geneList, ]
featureList <- featureList[match(geneList, rownames(featureList)), ] #order
featureList <- AnnotatedDataFrame(featureList)

# Filter the tumor expression data to only relevant genes and samples
tumor <- exprs(TCGA_eset[,tumorIndices])
tumor <- tumor[match(geneList, rownames(tumor)),]
tumor <- ExpressionSet(tumor, tumorClinical, featureList, experimentData(TCGA_eset))
saveRDS(tumor, "tumorExpr.Rds")
```

And grab all of the normal tissue samples:

```{r}
# Filter the normal expression data to only relevant genes and samples
normal <- exprs(TCGA_eset[,TCGA_eset@phenoData@data$sample_type == "adjacentnormal"])
normal <- normal[match(geneList, rownames(normal)),]
normal <- ExpressionSet(normal, normalClinical, featureList, experimentData(TCGA_eset))
saveRDS(normal, "normalExpr.Rds")
```

Now we should be able to use this sampled data from our apps!

## tumorExpr.Rds

      
    Raw
  

              tumorExpr.Rds
            
          
            View raw
        
    
## ui.R
library(shiny)

geneList <- c( "KRAS", "EGFR", "KLF6", "FOXO1", "JAK2", "BRCA1", "BRCA2", "PPM1D")

shinyUI(pageWithSidebar(
  # Application title
  headerPanel("Hello Shiny Bioconductor Survival!"),

  # Sidebar with a selector to choose a gene
  sidebarPanel(
    selectInput("gene", "Gene", geneList),
    plotOutput("densityPlot",
               clickId="densityClick",
               hoverId="densityHover",
               hoverDelay=250,
               hoverDelayType="throttle")
  ),

  # Show a plot of the generated distribution
  mainPanel(
    plotOutput("kmPlot"),
    verbatimTextOutput("info")
  )
))
	library(shiny)
	library(Biobase)
	library(survival)

	# Load in the sampled matrices we've generated ahead of time.
	tumor <- readRDS("tumorExpr.Rds")
	# Distill the expression data away from the ExpressionSet object.
	tumorExp <- exprs(tumor)

	shinyServer(function(input, output, session) {

	# Need a wrapper around the densityClick input so we can manage whether or
	# not the click occured on the current Gene. If it occured on a previous
	# gene, we'll want to mark that click as 'stale' so we don't try to use it
	# later.
	currentClick <- list(click=NULL, stale=FALSE)
	handleClick <- observe({
	if (!is.null(input$densityClick) && !is.null(input$densityClick$x)){
	currentClick$click <<- input$densityClick
	currentClick$stale <<- FALSE
	}
	}, priority=100)

	getCutoff <- reactive({
	# We need this function to subscribe to a couple of dependencies. Without
	# explicitly providing these two at the top, this function may return a
	# cached value without realizing that a new click has occured, or a new
	# gene has been loaded.
	input$densityClick
	geneExp()

	# See if there's been a click since the last gene change.
	if (!is.null(currentClick$click) && !currentClick$stale){
	return(currentClick$click$x)
	}

	return (mean(geneExp()))
	})

	#' Extract the relevant tumor expression values.
	geneExp <- reactive({
	geneExp <- tumorExp[rownames(tumorExp) == input$gene, ]

	currentClick$stale <<- TRUE

	geneExp
	})

	#' Render a plot to show the distribution of the gene's expression
	output$densityPlot <- renderPlot({
	# Plot to density plot
	tumorGene <- geneExp()
	plot(density(tumorGene), main="Distribution", xlab="")

	# Add a vertical line to show where the current cutoff is.
	abline(v=getCutoff(), col=4)

	# Draw a line where they're hovering
	if (!is.null(input$densityHover) && !is.null(input$densityHover$y)){
	abline(v=input$densityHover$x, col=5)
	}
	}, bg="transparent")

	#' A reactive survival formula
	survivalFml <- reactive({
	tumorGene <- geneExp()

	# Create the groups based on which samples are above/below the cutoff
	expressionGrp <- as.integer(tumorGene < getCutoff())

	# Make sure there's more than one group!
	if (length(unique(expressionGrp)) < 2){
	stop("You must specify a cutoff that places at least one sample in each group!")
	}

	# Create the survival object
	surv <- with(pData(tumor),
	Surv(days_to_death, recurrence_status=="recurrence"))
	return(surv ~ expressionGrp)
	})

	#' Print out some information about the fit
	output$info <- renderPrint({
	surv <- survivalFml()
	sDiff <- survdiff(surv)

	# Calculate the p value
	# Extracted from the print.survdiff function in the survival package
	df <- (sum(1 * (sDiff$exp > 0))) - 1
	pv <- format(signif(1 - pchisq(sDiff$chisq, df), 2))

	cat ("p-value: ", pv, "\n")
	})

	#' Create a Kaplan Meier plot
	output$kmPlot <- renderPlot({
	surv <- survivalFml()
	plot(survfit(surv), col=1:2, xlab="Survival Time (Days)",
	ylab="% Survival")
	legend(10,.4,c("Low Expr", "High Expr"), lty=1, col=1:2)
	})
	})
	Extract and Store Some Data
	========================================================

	We'll first want to install the `curatedOvarianData` package from Bioconductor. Note that this package is a few hundred MB.

	```{r, cache=TRUE, results='hide', message=FALSE}
	source("http://bioconductor.org/biocLite.R")
	biocLite("curatedOvarianData")
	```

	Now we'll load the package and load in one of the datasets.

	```{r, results='hide', message=FALSE}
	library(curatedOvarianData)
	data(TCGA_eset)
	```

	Let's write out the clinical data so we have it for later. We'll trim out the "uncurated" metadata column to save some space and make the printing a bit easier.

	```{r}
	clinical <- phenoData(TCGA_eset)
	pData(clinical) <- pData(clinical)[,colnames(pData(clinical))!="uncurated_author_metadata"]
	```

	We won't want to store all of the tumor samples available in the dataset, so let's grab 50 tumor samples at random and keep all of the normal samples.

	```{r}
	# Get all normal phenotypic data
	normalClinical <- clinical
	pData(normalClinical) <- pData(clinical)[pData(clinical)$sample_type == "adjacentnormal",]

	# Sample 20 samples from the tumor phenotypic data
	tumorIndices <- sample(which(pData(clinical)$sample_type == "tumor"), 50)
	tumorClinical <- clinical
	pData(tumorClinical) <- pData(clinical)[tumorIndices,]
	```

	Filtering
	---------

	For the sake of demonstration, we'll want to keep the dataset fairly small. So let's filter the data to the few dozen samples we selected above and only 8 genes.

	```{r}
	# Provide the HGNC symbols of 8 genes
	geneList <- c("EGFR", "KLF6", "FOXO1", "KRAS", "JAK2", "BRCA1", "BRCA2", "PPM1D")

	# Extract the relevant probeset to gene mappings
	featureList <- fData(TCGA_eset)
	featureList <- featureList[featureList$gene %in% geneList, ]
	featureList <- featureList[match(geneList, rownames(featureList)), ] #order
	featureList <- AnnotatedDataFrame(featureList)

	# Filter the tumor expression data to only relevant genes and samples
	tumor <- exprs(TCGA_eset[,tumorIndices])
	tumor <- tumor[match(geneList, rownames(tumor)),]
	tumor <- ExpressionSet(tumor, tumorClinical, featureList, experimentData(TCGA_eset))
	saveRDS(tumor, "tumorExpr.Rds")
	```

	And grab all of the normal tissue samples:

	```{r}
	# Filter the normal expression data to only relevant genes and samples
	normal <- exprs(TCGA_eset[,TCGA_eset@phenoData@data$sample_type == "adjacentnormal"])
	normal <- normal[match(geneList, rownames(normal)),]
	normal <- ExpressionSet(normal, normalClinical, featureList, experimentData(TCGA_eset))
	saveRDS(normal, "normalExpr.Rds")
	```

	Now we should be able to use this sampled data from our apps!
	library(shiny)

	geneList <- c( "KRAS", "EGFR", "KLF6", "FOXO1", "JAK2", "BRCA1", "BRCA2", "PPM1D")

	shinyUI(pageWithSidebar(
	# Application title
	headerPanel("Hello Shiny Bioconductor Survival!"),

	# Sidebar with a selector to choose a gene
	sidebarPanel(
	selectInput("gene", "Gene", geneList),
	plotOutput("densityPlot",
	clickId="densityClick",
	hoverId="densityHover",
	hoverDelay=250,
	hoverDelayType="throttle")
	),

	# Show a plot of the generated distribution
	mainPanel(
	plotOutput("kmPlot"),
	verbatimTextOutput("info")
	)
	))