Skip to content

Instantly share code, notes, and snippets.

@dinovski
Last active September 7, 2018 19:59
Show Gist options
  • Save dinovski/9e969d005305e00f9f873cb0b4b7a8dc to your computer and use it in GitHub Desktop.
Save dinovski/9e969d005305e00f9f873cb0b4b7a8dc to your computer and use it in GitHub Desktop.
get unique min/max gene coordinates from gene bed
inDir="/bioinfo/users/dzielins/dbase/mouse/mm10/"
genome="mm10"
geneBed="gencode.vM13.annotation_gene.bed"
inBed <- read.table(paste0(inDir, geneBed), header = FALSE, stringsAsFactors = FALSE, sep = "\t")
colnames(inBed) <- c("chr", "start", "end", "gene", "score", "strand")
## Keep min start and max end for each gene
start <- aggregate(start ~ chr + gene + strand, inBed, min)
end <- aggregate(end ~ chr + gene + strand, inBed, max)
newBed <- merge(start, end, by = c("chr","gene","strand"))
newBed$score <- 0
newBed <- newBed[, c("chr","start","end","gene","score","strand")]
rmGenes <- unique(newBed$gene[duplicated(newBed$gene)])
cleanBed <- newBed[which(!newBed$gene %in% rmGenes),]
cleanSAF<-cleanBed[, c("gene", "chr","start","end","strand")]
write.table(cleanBed, file=paste0(inDir, genome, "_unique_genes.bed"),
quote = FALSE, row.names = FALSE, col.names = FALSE, sep = "\t")
write.table(cleanSAF, file=paste0(inDir, genome, "_unique_genes.saf"),
quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment