@johandahlberg
Created April 12, 2013 08:28
Playing with fetching page source from a URL to look up gene names for a list of UCSC gene ids (named refseqids in the code), and converting the results into R if-statements. Mostly an experiment in using Scala's parallel collections. NOTE: This should under no circumstances be used in production; for that, use a proper API instead. You have been warned.
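The scrape-and-convert step can be tried offline, without hitting the UCSC server. The following is a minimal, hypothetical sketch: `GeneNameSketch`, `extractGeneName`, `toRCode` and the HTML fragment are made up for illustration. It assumes the page puts the gene name in a link directly after the text "Genetic Association Database:", which is the layout the gist's regex targets and which has very likely changed since 2013.

```scala
// Hypothetical offline sketch: no network access, made-up HTML input.
object GeneNameSketch {
  // Assumed page layout: "Genetic Association Database: <A HREF=...>NAME<".
  // [^>]* keeps the match inside the first link tag on the line.
  private val GadLink = """Genetic Association Database:\s<A HREF=[^>]*>(\w+)<""".r

  // Returns None instead of throwing when the pattern is absent.
  def extractGeneName(html: String): Option[String] =
    GadLink.findFirstMatchIn(html).map(_.group(1))

  // Same R statement shape as the gist's convertToRCode.
  def toRCode(id: String, geneName: String): String =
    s"""if(refseqid == "$id") return("$geneName ($id)")"""

  def main(args: Array[String]): Unit = {
    val id = "uc002icp.3"
    // Made-up fragment standing in for the downloaded page.
    val html = """Genetic Association Database: <A HREF="http://example.org/gad?gene=BRCA1">BRCA1</A>"""
    extractGeneName(html) match {
      case Some(name) => println(toRCode(id, name))
      case None       => println(s"no gene name found for $id")
    }
  }
}
```

Running `main` prints `if(refseqid == "uc002icp.3") return("BRCA1 (uc002icp.3)")`. Returning an `Option` rather than calling `.get` lets the caller decide what to do when the page layout no longer matches.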
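import scala.io.Source

object PlayWithURLs extends App {
  // Some UCSC Genes (knownGene) ids for the BRCA1 gene; despite the variable name, these are not RefSeq accessions
  val refseqids = List("uc010whl.1", "uc002icp.3", "uc010whp.1")
  // Scrape the gene name for one id from the UCSC hgGene page (breaks as soon as the page layout changes)
  def queryRefSeqId(refSeqId: String): String = {
    def read(url: String): String = {
      val source = Source.fromURL(url)
      try source.mkString finally source.close() // release the connection when done
    }
    val document = read("http://genome.ucsc.edu/cgi-bin/hgGene?hgg_gene=" + refSeqId + "&org=human")
    // [^>]* stops at the end of the first <A ...> tag instead of the last '>' on the line
    val regexp = """Genetic Association Database:\s<A HREF=[^>]*>(\w+)<""".r
    // .get throws NoSuchElementException if the pattern is not found
    regexp.findFirstMatchIn(document).get.group(1)
  }
  // Builds e.g. if(refseqid == "uc002icp.3") return("BRCA1 (uc002icp.3)")
  def convertToRCode(refSeqId: String, geneName: String): String =
    s"""if(refseqid == "$refSeqId") return("$geneName ($refSeqId)")"""
  val startTime = System.nanoTime()
  // par.map fetches the pages concurrently but keeps input order, so the output order is stable.
  // On Scala 2.13+, .par needs the scala-parallel-collections module and
  // import scala.collection.parallel.CollectionConverters._
  val rCode = refseqids.par.map(id => convertToRCode(id, queryRefSeqId(id)))
  rCode.seq.foreach(println)
  val endTime = System.nanoTime()
  println("time: " + (endTime - startTime) / 1e9 + " s")
}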
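Parallel collections are not the only way to overlap the three downloads. As a hedged alternative, here is a minimal sketch using the standard library's `Future`, which on Scala 2.13 and later needs no extra module (unlike `.par`). `ConcurrentLookupSketch`, `fakeLookup` and `lookupAll` are hypothetical names; `fakeLookup` stands in for `queryRefSeqId` and sleeps instead of fetching a page, so the effect on wall-clock time can be seen without network access.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrentLookupSketch {
  // Hypothetical stand-in for queryRefSeqId: sleeps to mimic network latency
  // and returns a fixed gene name instead of scraping a page.
  def fakeLookup(id: String): String = {
    // blocking(...) lets the global pool add threads while this one waits
    blocking(Thread.sleep(200))
    "BRCA1"
  }

  // Start one Future per id; Future.traverse returns the results in input order.
  def lookupAll(ids: List[String]): List[(String, String)] = {
    val all = Future.traverse(ids)(id => Future((id, fakeLookup(id))))
    Await.result(all, 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val ids = List("uc010whl.1", "uc002icp.3", "uc010whp.1")
    val start = System.nanoTime()
    val results = lookupAll(ids)
    val elapsed = (System.nanoTime() - start) / 1e9
    results.foreach { case (id, name) => println(s"$id -> $name") }
    println(f"time: $elapsed%.2f s")
  }
}
```

The reported time should be close to a single 200 ms sleep rather than three, though the exact figure depends on the machine and JVM warm-up.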