Skip to content

Instantly share code, notes, and snippets.

@johandahlberg
Created June 27, 2013 16:35
Show Gist options
  • Save johandahlberg/5878012 to your computer and use it in GitHub Desktop.
Save johandahlberg/5878012 to your computer and use it in GitHub Desktop.
A R script to plot cumulative coverage based on GATK DepthOfCoverage output files. Not very pretty, but perhaps helpful to somebody.
library(ggplot2)
setwd("<my working directory>")
removeExtraLine <- function(x) {
x[,2:length(x)]
}
createInitialDataset <- function(x, sampleName) {
extraLineRemoved <- t(removeExtraLine(x))
data<-data.frame(extraLineRemoved, rep(sampleName))
colnames(data)<-c("coverage","proportion_coverge","sample")
data
}
combineDataSets <- function(data, x, sampleName) {
extraLineRemoved <- t(removeExtraLine(x))
x<-data.frame(extraLineRemoved, rep(sampleName))
colnames(x)<-c("coverage","proportion_coverge","sample")
data<-rbind(data, x)
data
}
# Load coverage data
sample_1 = read.delim("<sample_1>", header=F)
sample_2 = read.delim("<sample_2>", header=F)
sample_list<-list(sample_1, sample_2)
# Combining the data sets
data<-createInitialDataset(sample_list[[1]], sample_list[[1]][[2,1]])
for(i in 2:length(sample_list)) {
#print(sample_list[[i]][[2,1]])
data <- combineDataSets(data, sample_list[[i]], sample_list[[i]][[2,1]])
}
# Remove the "gte_" part from the coverage function
data$coverage<-sapply(data$coverage, function(x) sub("gte_", "",x))
# Convert to reasonable data types...
data$coverage<-as.numeric(data$coverage)
data$proportion_coverge<-as.numeric(as.character(data$proportion_coverge))
ggplot(data=data, aes(x=coverage/max(coverage), y=proportion_coverge, color=sample)) +
geom_line() +
labs(title = "Cumulative coverage compassion") +
scale_y_continuous(breaks=seq(0,1,0.1))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment