Skip to content

Instantly share code, notes, and snippets.

@jwaageSnippets
Created March 11, 2013 12:34
Show Gist options
  • Save jwaageSnippets/5133941 to your computer and use it in GitHub Desktop.
Save jwaageSnippets/5133941 to your computer and use it in GitHub Desktop.
Get refseq genes from UCSC, and extract longest transcripts for each gene symbol
library("rtracklayer")
session <- browserSession("UCSC")
genome(session)<-"mm9"
query <- ucscTableQuery(session, "refGene")
tableName(query) <- "refGene"
getTable(query) -> refseq
refseq[,c(2,3,4,5,6,7,8,13)] -> refseq
refseq$"width" <- refseq$"txEnd"-refseq$"txStart"
as.character(refseq[,1]) -> refseq[,1]
as.character(refseq[,8]) -> refseq[,8]
split(refseq, refseq$"name2") -> refseqSplit
getGene <- function(x)
{
x[which.max(x$"width"),]
}
lapply(refseqSplit, FUN=getGene) -> result
do.call("rbind", result) -> result
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment