@ozagordi
Last active April 12, 2016 14:07
Viral genomes deposited in NCBI per month

Query the NCBI nuccore database with the Entrez Direct (EDirect) command-line tools, then plot the figure with the included R script.

esearch -db nuccore -query "txid10239 [orgn] AND \"complete genome\" [Title] NOT txid131567 [orgn]" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element CreateDate > viral_dates.txt
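Before plotting, it can help to check the output of the pipeline above. xtract writes one CreateDate per line, typically in YYYY/MM/DD form (an assumption worth confirming on your own file). The base-R sketch below parses those lines with an explicit format, so a malformed line shows up as NA instead of stopping or being misread. The three sample lines are made up and stand in for readLines("viral_dates.txt").

```r
# Made-up sample of xtract output (one CreateDate per line).
# For the real file use: lines <- readLines("viral_dates.txt")
lines <- c("2001/07/19", "2015/03/12", "2015/03/30")

# Parse with an explicit format; anything that does not match becomes NA
dates <- as.Date(lines, format = "%Y/%m/%d")

# Stop early if any line failed to parse
stopifnot(!anyNA(dates))

cat("records:", length(dates), "\n")
cat("range:", format(range(dates)), "\n")
```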
library(ggplot2)
library(dplyr)
library(scales)
library(lubridate)
library(ggthemes)
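# Install NCBI Entrez Direct (EDirect) and run this first to create viral_dates.txt:
# esearch -db nuccore -query "txid10239 [orgn] AND \"complete genome\" [Title] NOT txid131567 [orgn]" | \
# efetch -format docsum | \
# xtract -pattern DocumentSummary -element CreateDate > viral_dates.txt
# one CreateDate per line; adjust the path to wherever viral_dates.txt was written
viral_dates = read.table("viral_dates.txt", quote="\"", stringsAsFactors = FALSE)
vd = data.frame(dates = as.Date(viral_dates$V1))
# round each date down to the first day of its month
vd$month = as.Date(cut(vd$dates, breaks = "month"))
# count records created in each month (months with no records do not appear)
dtp = group_by(vd, month) %>%
  summarise(total = n())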
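Because group_by() and summarise() only see months that actually occur in vd$month, months with no new genomes are left out of dtp rather than plotted as zero. A minimal base-R sketch that keeps them, using made-up sample dates in place of the contents of viral_dates.txt:

```r
# Made-up sample dates, not NCBI data
dates <- as.Date(c("2015-01-03", "2015-01-20", "2015-03-11"))

# cut(..., breaks = "month") returns a factor whose levels cover every month
# from the first date to the last, so table() also reports empty months as 0
counts <- table(cut(dates, breaks = "month"))

# Same column names as dtp: month (first day of the month) and total
monthly <- data.frame(
  month = as.Date(names(counts)),
  total = as.integer(counts)
)
print(monthly)
```

Applied to vd$dates instead of the sample vector, this gives a zero-filled alternative to dtp with the same month and total columns, so it could be passed to the ggplot() call unchanged.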
p = ggplot(data=dtp, aes(x=month, y=total))
p = p + geom_point()
p = p + theme_solarized() + scale_colour_solarized("blue")
p = p + xlab('') + ylab('sequences created per month')
p = p + ggtitle('NCBI complete viral genomes')
#p = p + theme(axis.title.y = element_text(size=16))
#p = p + theme(axis.text.x = element_text(size=14, angle=0))
#p = p + theme(axis.text.y = element_text(size=14))
p