@mtmorgan
Created September 29, 2014 02:02
Retrieve UCSC genomes (via rtracklayer::ucscGenomes()) and their Latin binomials by scraping UCSC web pages, then translate these to NCBI taxonomy IDs through Entrez E-utilities calls.
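The Entrez part of this pipeline comes down to two HTTP GET requests: an esearch on the taxonomy database that turns names into UIDs, then an efetch that returns one record per UID. The sketch below only builds those two URLs, so it runs offline. The helper names `esearchUrl()` and `efetchUrl()` are hypothetical (they are not in the gist), and the HTTPS base URL is assumed to be the current E-utilities endpoint. Note that the search term has to be URL-encoded, because scientific names contain spaces.

```r
## Hypothetical helpers (not part of the gist): build the two E-utilities
## URLs used to map scientific names to NCBI taxonomy IDs.
eutils <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"  # assumed HTTPS endpoint

esearchUrl <- function(organism) {
    uorganism <- unique(organism)
    ## OR the names together; URLencode() turns spaces into %20
    term <- paste(uorganism, collapse = " OR ")
    sprintf("%s/esearch.fcgi?db=taxonomy&term=%s&retmax=%d",
            eutils, utils::URLencode(term, reserved = TRUE),
            length(uorganism))
}

efetchUrl <- function(id) {
    ## comma-separated list of taxonomy UIDs returned by esearch
    sprintf("%s/efetch.fcgi?db=taxonomy&id=%s",
            eutils, paste(id, collapse = ","))
}

esearchUrl(c("Homo sapiens", "Mus musculus"))
efetchUrl(c("9606", "10090"))
```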
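loadNamespace("rtracklayer")
loadNamespace("XML")

.organismToTaxid <- function(organism=character())
{
    ## query NCBI for taxonomy IDs; E-utilities are now served over HTTPS
    ## only, which XML::xmlParse() cannot fetch directly, so download the
    ## response with readLines() and parse it as text
    .eutils <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
    .fetchXML <- function(url)
        XML::xmlParse(paste(readLines(url, warn=FALSE), collapse="\n"),
                      asText=TRUE)
    ## 1. ids: one esearch for all unique, non-missing, non-empty names
    uorganism <- unique(organism[!is.na(organism) & nzchar(organism)])
    scin <- taxid <- character()
    if (length(uorganism)) {
        ## URL-encode the term; scientific names contain spaces
        query <- utils::URLencode(paste(uorganism, collapse=" OR "),
                                  reserved=TRUE)
        url <- sprintf("%s/esearch.fcgi?db=taxonomy&term=%s&retmax=%d",
                       .eutils, query, length(uorganism))
        xml <- .fetchXML(url)
        id <- as.character(sapply(xml["//Id/text()"], XML::xmlValue))
        ## 2. records: scientific name and TaxId of each hit
        if (length(id)) {
            query <- paste(id, collapse=",")
            url <- sprintf("%s/efetch.fcgi?db=taxonomy&id=%s&retmax=%d",
                           .eutils, query, length(uorganism))
            xml <- .fetchXML(url)
            scin <- sapply(xml["/TaxaSet/Taxon/ScientificName"], XML::xmlValue)
            taxid <- sapply(xml["/TaxaSet/Taxon/TaxId/text()"], XML::xmlValue)
        }
    }
    ## 3. result: aligned to the input, NA where no record matched
    as.integer(taxid)[match(organism, scin)]
}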
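The last line of .organismToTaxid() uses match() to put the NCBI results back into the order of the input, because efetch returns one record per taxon in its own order. The vectors below are illustrative mock data, not real query output (the `xx1` and `yy1` database names are hypothetical). They show that repeated names reuse the same record, while missing or unmatched names become NA without any explicit loop.

```r
## Mock inputs standing in for the scraped names and the efetch results;
## xx1 and yy1 are hypothetical builds with a missing / unknown name.
organism <- c(hg19 = "Homo sapiens", mm10 = "Mus musculus",
              hg38 = "Homo sapiens", xx1 = NA, yy1 = "Unknown species")
## efetch returns one record per taxon, in its own order
scin  <- c("Mus musculus", "Homo sapiens")
taxid <- c("10090", "9606")

## match() gives each input name's position in scin (NA if absent), so
## indexing taxid by it re-orders and replicates the IDs to fit the input
result <- as.integer(taxid)[match(organism, scin)]
names(result) <- names(organism)
result
```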
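ucscAnnotatedGenomes <- function()
{
    .ucsc <- "https://genome.ucsc.edu/cgi-bin"
    ## evaluate an XPath query against one UCSC page; on failure warn and
    ## return NA so that one bad page does not abort the whole sapply()
    .tryQuery <- function(url, query)
        tryCatch({
            html <- paste(readLines(url, warn=FALSE), collapse="\n")
            XML::htmlParse(html, asText=TRUE)[[query]]
        }, error=function(err) {
            warning(conditionMessage(err))
            NA
        })
    ## UCSC genome builds, one row per database
    genomes <- rtracklayer::ucscGenomes()
    ## scrape each build's gateway page for the italicized scientific name
    urls <- sprintf("%s/hgGateway?db=%s", .ucsc, genomes$db)
    names(urls) <- genomes$db
    organism <- sapply(urls, .tryQuery, "string(//div[@id='sectTtl']/i)")
    ## translate scientific names to NCBI taxonomy IDs
    taxid <- .organismToTaxid(organism)
    cbind(genomes, organism, taxid)
}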
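The .tryQuery() helper converts an error for one URL into a warning plus NA, so sapply() over a couple of hundred genome pages still returns a full vector. The offline sketch below reproduces that pattern. `fakeFetch()` is a hypothetical stand-in for the network call that fails for any URL containing "bad", and the outer withCallingHandlers() is an optional extra, not part of the gist, that reports and muffles the per-URL warnings.

```r
## fakeFetch() is hypothetical: it stands in for downloading and parsing
## a page, and fails for any URL containing "bad".
fakeFetch <- function(url) {
    if (grepl("bad", url)) stop("cannot open URL '", url, "'")
    paste("title of", url)
}

## Same shape as .tryQuery(): on error, warn and return NA instead
tryQuery <- function(url) {
    tryCatch(fakeFetch(url), error = function(err) {
        warning(conditionMessage(err))
        NA_character_
    })
}

urls <- c(a = "good/1", b = "bad/2", c = "good/3")
## Report each warning as a message, then suppress the warning itself
res <- withCallingHandlers(
    sapply(urls, tryQuery),
    warning = function(w) {
        message("skipped: ", conditionMessage(w))
        invokeRestart("muffleWarning")
    })
res
```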