@danielecook
Last active August 29, 2015 14:04
Example of PubMed pairwise searching
library(RISmed)
library(parallel)
library(ggplot2)
# Given two lists of terms, let's see how 'hot' they are together
set1 <- c("ebola","autoimmune","Diabetes","HIV","Glioblastoma","Asthma","Schizophrenia")
set2 <- c("C. elegans","D. Melanogaster","C. japonica", "M. Musculus","S. Cerevisiae")
# Generate all possible pairs
pairs <- expand.grid(set1, set2, stringsAsFactors=FALSE)
# Search pubmed for each pair, and return the number of search results.
results <- mclapply(seq_len(nrow(pairs)), function(x) {
  # Build the query string, then pass type and db to EUtilsSummary.
  query <- sprintf("%s %s", pairs$Var1[x], pairs$Var2[x])
  res <- EUtilsSummary(query, type='esearch', db='pubmed')
  c(q1=pairs$Var1[x], q2=pairs$Var2[x], count=QueryCount(res))
})
# Do some data formatting on the results.
results <- as.data.frame(do.call("rbind", results), stringsAsFactors=FALSE)
# Turn the number of search results into numeric form.
results$count <- as.numeric(results$count)
# Plot the results using geom_tile
ggplot(results) +
  geom_tile(aes(x=q1, y=q2, fill=count)) +
  geom_text(aes(x=q1, y=q2, label=count), color = "white") +
  labs(title="Disease Publications by Organism", x="Disease", y="Organism")
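
The pairwise counting step can also be tried without network access. The following is a minimal sketch, not part of the original script: `count_pairs`, `count_fn`, `delay`, and `fake_count` are hypothetical names introduced here for illustration. It separates building the term pairs from counting the hits, so the counting function can be swapped for a stand-in while testing.

```r
# Hypothetical helper: count hits for every pair of terms.
# `count_fn` is an assumed argument that maps one query string to a number;
# with RISmed it could wrap EUtilsSummary() and QueryCount().
count_pairs <- function(set1, set2, count_fn, delay = 0) {
  # One row per (q1, q2) combination; q1 varies fastest.
  pairs <- expand.grid(q1 = set1, q2 = set2, stringsAsFactors = FALSE)
  pairs$count <- vapply(seq_len(nrow(pairs)), function(i) {
    Sys.sleep(delay)  # optional pause between requests
    as.numeric(count_fn(sprintf("%s %s", pairs$q1[i], pairs$q2[i])))
  }, numeric(1))
  pairs
}

# Offline stand-in for the PubMed call: returns the query length in characters.
fake_count <- function(query) nchar(query)

demo <- count_pairs(c("HIV", "Asthma"), c("C. elegans", "M. musculus"), fake_count)
print(demo)
```

With RISmed loaded, `count_fn` could be something like `function(q) QueryCount(EUtilsSummary(q, type='esearch', db='pubmed'))`. Running the requests sequentially with a small `delay` is one way to stay within NCBI's request-rate limits, at the cost of the parallelism that `mclapply` provides.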