

@sckott
Last active August 29, 2015 13:56

Install taxize

install.packages("taxize")

Load taxize

library(taxize)

You can skip this section on getting names; it is only here for demonstration. Use your own names if you have them handy.

Get some species names in the grass family

df <- theplantlist[theplantlist$family %in% 'Poaceae', ]
rows <- sample(1:nrow(df), size = 100)
spnames <- as.character(apply(df[rows,], 1, function(z) paste(z[c('genus','sp')], collapse = " ")))
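The name-building step above pastes the genus and species epithet together row by row with apply(). Because paste() is vectorised, the same names can also be built column-wise. The sketch below checks that both approaches agree on a small, made-up data frame whose `genus` and `sp` columns mirror the ones used above (the rows are hypothetical, not taken from theplantlist).

```r
# Hypothetical toy stand-in for the Poaceae rows of theplantlist;
# only the 'genus' and 'sp' columns matter here.
df <- data.frame(
  family = c("Poaceae", "Poaceae", "Poaceae"),
  genus  = c("Poa", "Bromus", "Festuca"),
  sp     = c("annua", "inermis", "rubra"),
  stringsAsFactors = FALSE
)

# Row-wise version, as in the code above (as.character() drops row names)
names_apply <- as.character(apply(df, 1, function(z) paste(z[c("genus", "sp")], collapse = " ")))

# Vectorised version: paste() works element-wise over whole columns,
# converting factor columns to their labels if needed
names_vec <- paste(df$genus, df$sp)

print(names_vec)
identical(names_apply, names_vec)
```

The vectorised form avoids converting the whole data frame to a character matrix, which is what apply() does internally.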

From the names above, I made a vector of the plant species that have gene data in GenBank (many did not):

species <- c('Imperata brasiliensis','Hylebates cordatus','Apocopis intermedius',
             'Paspalum subciliatum','Bromus nottowayanus','Chimonobambusa marmorea',
             'Panicum adenophorum','Otatea glauca','Himalayacalamus falconeri',
             'Briza lamarckiana','Trisetum turcicum','Brachiaria subulifolia',
             'Boissiera squarrosa','Arthrostylidium pubescens','Neyraudia reynaudiana',
             'Bromus gunckelii','Poa sudicola','Pentameris thuarii',
             'Calamagrostis inexpansa','Willkommia texana','Helictotrichon cantabricum',
             'Muhlenbergia tenuifolia','Sporobolus ioclados','Bambusa cerosissima',
             'Axonopus flabelliformis','Glyceria lithuanica','Pentaschistis malouinensis',
             'Perrierbambus madagascariensis','Hierochloe alpina','Hemarthria compressa',
             'Zizania latifolia','Festuca altaica','Gigantochloa wrayi','Festuca alpina',
             'Aegilops caudata','Elymus cognatus','Agrostis gracililaxa','Gymnopogon foliosus')

First, use ncbi_search() to see which gene sequences are available for a species:

ncbi_search(species[1])

Next, or straight away if you already know which gene you want, use ncbi_getbyname() to fetch a gene by name. Here we first check which genes are available for species[2] (Hylebates cordatus), then fetch its ndhF sequence:

ncbi_search(species[2])
ncbi_getbyname(species[2], gene = "ndhF")

Alternatively, use ncbi_getbyid() to fetch a sequence by its GenBank ID. Here we pass the third GI number returned by ncbi_search():

out <- ncbi_search(species[2])
genes <- ncbi_getbyid(out$gi_no[3])
genes$sequence # there's your sequence
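Once a sequence is in hand as a character string, base R is enough for a quick look at it. The sketch below uses a short, made-up sequence in place of `genes$sequence` (wrap the real one in as.character() in case it comes back as a factor). It counts each symbol and computes GC content over unambiguous bases only, since GenBank sequences can contain IUPAC ambiguity codes such as N (any base) and W (A or T).

```r
# A short, made-up sequence standing in for genes$sequence.
seq_str <- "ACGTNACGTW"

# Split the string into single characters and count each symbol.
bases <- strsplit(toupper(seq_str), "")[[1]]
counts <- table(bases)
print(counts)

# GC content over unambiguous bases only; ambiguity codes such as
# N and W are left out of the denominator.
acgt <- bases[bases %in% c("A", "C", "G", "T")]
gc <- mean(acgt %in% c("G", "C"))
print(gc)
```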

To get data for many species at once, collect the IDs you want, then pass them all to ncbi_getbyid():

out <- ncbi_search(species[1:3])
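# lapply() always returns a list, which do.call() requires; sapply() can
# simplify equal-length results to a matrix and make do.call() fail
myids <- do.call(c, lapply(out, function(x) x$gi_no))
tmp <- ncbi_getbyid(myids)
as.character(tmp$sequence)[1]  # the first sequence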
[1] "TCTAGGGGCGTCAAGGAACACTTCTATTGCCTTGCTCGGTGGAGCAGTCAGCCTGCCTTCCGCTCCCCACGCAGTGATGATATCTTAATCCACACGACTCTTGGCAATAGATATCTCAACTCTCACATCAATGGANGTAGCAAAATGCGATACCTGGTGTGAATTGNAAAATCCCGCGAACCATCGAGTTTTTGAACGCAAGTTGCGCTCGAGGCCTTCTGGTCGAGGGCATGTNTGCCTGGGCGTCATGCCAAAAGACACTCTCAACCCACCCTCGGGGAGGACGTGGTGTTTGGACCCCCACGCCGCAGGGCGCGGTATGCTGAAGTTGGGTCTGCCGGTGAACCATGTCGGGCANAGCACGAGGTGGGCGACATCAGTTGTTCTTGGTGCAGCGCCCCGGCGCGCGGCCAGCGTGTCGGCCCTAWGGACCCATCGAGCACCGCAG"
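A note on flattening the IDs: do.call() needs its second argument to be a list, but sapply() simplifies results of equal length into a matrix, so `do.call(c, sapply(...))` fails whenever every species returns the same number of hits. The base-R sketch below shows this on a made-up stand-in for the ncbi_search() output (a list of data frames with a `gi_no` column; the IDs and the absence of other columns are hypothetical) and shows that lapply() or unlist() flatten the IDs safely.

```r
# Hypothetical stand-in for the list returned by ncbi_search(species[1:3]):
# one data frame per species, each with a gi_no column (other columns omitted).
out <- list(
  data.frame(gi_no = c("111", "112"), stringsAsFactors = FALSE),
  data.frame(gi_no = c("221", "222"), stringsAsFactors = FALSE),
  data.frame(gi_no = c("331", "332"), stringsAsFactors = FALSE)
)

# sapply() simplifies equal-length results into a 2 x 3 matrix ...
simplified <- sapply(out, function(x) x$gi_no)
is.matrix(simplified)

# ... which do.call() rejects, because its second argument must be a list.
res <- tryCatch(do.call(c, simplified), error = function(e) "error")

# lapply() always returns a list, so do.call(c, ...) flattens it safely;
# unlist() is an equivalent one-step alternative.
myids  <- do.call(c, lapply(out, function(x) x$gi_no))
myids2 <- unlist(lapply(out, function(x) x$gi_no))
print(myids)
```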