When can we expect the last damn microarray?

microarray.R
library("plyr")
library("XML")
library("ggplot2")
 
#Concatenate SQL-style: returns NULL if any argument is NULL/empty or NA
concat<-function(...,sep="",collapse=NULL){
  strings<-list(...)
  #guard against NULL/empty and NA arguments
  if(
    all(unlist(llply(strings,length))>0)
    &&
    all(!is.na(unlist(strings)))
  ){
    do.call("paste", c(strings, list(sep = sep, collapse = collapse)))
  }else{
    NULL
  }
}
getCount<-function(term){function(year){
  nihUrl<-concat("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=",term,"+",year,"[pdat]")
  #cleanurl<-gsub('\\]','%5D',gsub('\\[','%5B',x=url))
  #http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=microarray%5btitle%5d+2003%5bpdat%5d
  xml<-xmlTreeParse(URLencode(nihUrl),isURL=TRUE)
  #Data Mashups in R, pg17
  as.numeric(xmlValue(xml$doc$children$eSearchResult$children$Count$children$text))
}}
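#Example usage: getCount returns a closure, so a single term/year can be
#spot-checked with getCount('microarray[title]')(2005); the count it returns
#depends on when the query is run.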
 
years<-1995:2011
df<-data.frame(type="obs",year=years,
  mic=sapply(years,function(x){do.call(getCount('microarray[title]'),list(x))}),
  ngs=sapply(years,function(x){do.call(getCount('"next generation sequencing"[title] OR "high-throughput sequencing"[title]'),list(x))})
)
 
#97 is a fair start
df<-subset(df,year>=1997)
mdf<-melt(df,id.vars=c("type","year"),variable_name="citation")
 
c<-ggplot(mdf,aes(x=year))
p<-c+geom_point(aes(y=value,color=citation),size=3) +
  ylab("papers") +
  stat_smooth(aes(y=value,color=citation),data=subset(mdf,citation=="mic"),method="loess") +
  scale_x_continuous(breaks=seq(from=1997,to=2011,by=2))
print(p)
 
#Return 0 for negative elements
# noNeg(c(3,2,1,0,-1,-2,2))
# [1] 3 2 1 0 0 0 2
noNeg<-function(v){sapply(v,function(x){max(x,0)})}
 
#Return up to the first negative/zero element inclusive
# toZeroNoNeg(c(3,2,1,0,-1,-2,2))
# [1] 3 2 1 0
toZeroNoNeg<-function(v){noNeg(v)[1:firstZero(noNeg(v))]}
 
#return index of first zero
firstZero<-function(v){which(noNeg(v)==0)[1]}
 
#let's peer into the future
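#surface="direct" is what lets predict() extrapolate beyond the observed years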
df.lo.mic<-loess(mic ~ year,df,control=loess.control(surface="direct"))
 
#when will it stop?
mic_predict<-as.integer(predict(df.lo.mic,data.frame(year=2012:2020),se=FALSE))
zero_year<-2011+firstZero(mic_predict)
cat(concat("LOESS projects ",sum(toZeroNoNeg(mic_predict))," more damn microarray papers."))
cat(concat("The last damn microarray paper is projected to be in ",(zero_year-1),"."))
 
#predict ngs growth
df.lo.ngs<-loess(ngs ~ year,df,control=loess.control(surface="direct"))
ngs_predict<-as.integer(predict(df.lo.ngs,data.frame(year=2012:zero_year),se=FALSE))
 
pred_df<-data.frame(type="pred",year=c(2012:zero_year),mic=toZeroNoNeg(mic_predict),ngs=ngs_predict)
df2<-rbind(df,pred_df)
 
mdf2<-melt(df2,id.vars=c("type","year"),variable_name="citation")
 
c2<-ggplot(mdf2,aes(x=year))
p2<-c2+geom_point(aes(y=value,color=citation,shape=type),size=3) +
  ylab("papers") +
  scale_y_continuous(breaks=seq(from=0,to=1600,by=200))+
  scale_x_continuous(breaks=seq(from=1997,to=zero_year,by=2))
print(p2)

Thank you for this. The comment on line 25 made me buy (and nearly work through) Data Mashups in R. I got the Philly foreclosures heat map working with the new Yahoo PlaceFinder API, which replaced the Geocode API and is in turn about to be replaced by something called BOSS, but the general recipe still worked. I couldn't have done it from scratch. Unfortunately, Chapter 2, where you draw some correlations between foreclosures and Census data from FactFinder, is too much work for the payoff: the new FactFinder2 doesn't offer a ready way to reproduce the old FactFinder-style downloads.

Anyway, microarray.R also takes some tweaking (I'm running R 2.15.2 on a Mac). First, you need to add library("reshape2") at the top, or else R will object to the melt() function. Second, melt() no longer honors the variable_name argument; it simply names the melted column "variable". I had to reset names(mdf) and names(mdf2) right after each call to melt(), as in

names(mdf) <- c("type","year","citation","value")
...
names(mdf2) <- c("type","year","citation","value")

With these two small changes, microarray.R works great.
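
For what it's worth, reshape2's melt() also accepts a variable.name argument (with value.name defaulting to "value"), so the renaming could probably be avoided (untested) with something like

mdf <- melt(df, id.vars=c("type","year"), variable.name="citation")
...
mdf2 <- melt(df2, id.vars=c("type","year"), variable.name="citation")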

Thanks for your tips.

I have created a repository with the Data Mashups in R code using the wretched Yahoo! BOSS API. It also includes the supplemental census data and shapefiles that have gone missing:
https://github.com/leipzig/datamashupsinr
