Install taxize
install.packages("taxize")
Load taxize
library(taxize)
The next section on getting names is for demonstration purposes only. Skip it and use your own names if you have them handy.
Get some species names in the grass family
df <- theplantlist[theplantlist$family %in% 'Poaceae', ]
rows <- sample(seq_len(nrow(df)), size = 100)
spnames <- as.character(apply(df[rows,], 1, function(z) paste(z[c('genus','sp')], collapse = " ")))
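The apply() call above builds each name row by row. Since paste() is vectorized, the same result can probably be had straight from the columns. A minimal sketch, assuming a data frame with family, genus and sp columns as used above; the toy data frame is a hypothetical stand-in, not data taken from the package:

```r
# Hypothetical stand-in for theplantlist (column names assumed: family, genus, sp)
toy <- data.frame(
  family = c("Poaceae", "Poaceae", "Asteraceae"),
  genus  = c("Poa", "Festuca", "Aster"),
  sp     = c("annua", "alpina", "alpinus"),
  stringsAsFactors = FALSE
)

# Keep only the grass family, then paste genus and species column-wise
grasses <- toy[toy$family %in% "Poaceae", ]
toy_names <- paste(grasses$genus, grasses$sp)
toy_names
```

This skips the as.character() and apply() steps because paste() already returns a character vector with one element per row.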
From the names above, I made a vector of plant species that have gene data in GenBank (many did not)
species <- c('Imperata brasiliensis','Hylebates cordatus','Apocopis intermedius',
'Paspalum subciliatum','Bromus nottowayanus','Chimonobambusa marmorea',
'Panicum adenophorum','Otatea glauca','Himalayacalamus falconeri',
'Briza lamarckiana','Trisetum turcicum','Brachiaria subulifolia',
'Boissiera squarrosa','Arthrostylidium pubescens','Neyraudia reynaudiana',
'Bromus gunckelii','Poa sudicola','Pentameris thuarii',
'Calamagrostis inexpansa','Willkommia texana','Helictotrichon cantabricum',
'Muhlenbergia tenuifolia','Sporobolus ioclados','Bambusa cerosissima',
'Axonopus flabelliformis','Glyceria lithuanica','Pentaschistis malouinensis',
'Perrierbambus madagascariensis','Hierochloe alpina','Hemarthria compressa',
'Zizania latifolia','Festuca altaica','Gigantochloa wrayi','Festuca alpina',
'Aegilops caudata','Elymus cognatus','Agrostis gracililaxa','Gymnopogon foliosus')
First, you can use ncbi_search() to search for which genes are available
ncbi_search(species[1])
Then, or if you already know what genes you want, you can use ncbi_getbyname() to get them.
First we can see what genes are available for species[2] (Hylebates cordatus), then get the data
ncbi_search(species[2])
ncbi_getbyname(species[2], gene = "ndhF")
Or we could get a gene by passing its ID to ncbi_getbyid()
out <- ncbi_search(species[2])
genes <- ncbi_getbyid(out$gi_no[3])
genes$sequence # there's your sequence
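Once you have a sequence, a few base R checks can be handy before using it. A minimal sketch, assuming the sequence is available as a plain character string; toy_seq is a made-up stand-in, not real GenBank data. GenBank records can contain IUPAC ambiguity codes such as N, so the sketch counts those too:

```r
# Hypothetical short sequence standing in for a downloaded one
toy_seq <- "ACGTNACGWT"

# Sequence length in bases
seq_length <- nchar(toy_seq)

# Split into single characters and count each symbol
bases <- strsplit(toy_seq, "")[[1]]
base_counts <- table(bases)

# Anything outside A, C, G, T is treated here as an ambiguity code
n_ambiguous <- sum(!bases %in% c("A", "C", "G", "T"))

seq_length
base_counts
n_ambiguous
```

For a real sequence, replace toy_seq with as.character() of the sequence you retrieved.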
To get data for many species at once, collect the IDs you want, then pass them to ncbi_getbyid()
out <- ncbi_search(species[1:3])
myids <- unlist(lapply(out, function(x) x$gi_no), use.names = FALSE)
tmp <- ncbi_getbyid(myids)
as.character(tmp$sequence)[[1]]
[1] "TCTAGGGGCGTCAAGGAACACTTCTATTGCCTTGCTCGGTGGAGCAGTCAGCCTGCCTTCCGCTCCCCACGCAGTGATGATATCTTAATCCACACGACTCTTGGCAATAGATATCTCAACTCTCACATCAATGGANGTAGCAAAATGCGATACCTGGTGTGAATTGNAAAATCCCGCGAACCATCGAGTTTTTGAACGCAAGTTGCGCTCGAGGCCTTCTGGTCGAGGGCATGTNTGCCTGGGCGTCATGCCAAAAGACACTCTCAACCCACCCTCGGGGAGGACGTGGTGTTTGGACCCCCACGCCGCAGGGCGCGGTATGCTGAAGTTGGGTCTGCCGGTGAACCATGTCGGGCANAGCACGAGGTGGGCGACATCAGTTGTTCTTGGTGCAGCGCCCCGGCGCGCGGCCAGCGTGTCGGCCCTAWGGACCCATCGAGCACCGCAG"
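Collecting IDs across several species comes down to pulling one column out of each data frame in a list and flattening the result. A minimal offline sketch, assuming the multi-species search returns a list with one data frame per species, each with a gi_no column; toy_out and its IDs are made up for illustration:

```r
# Hypothetical stand-in for a multi-species search result:
# one data frame per species, each with a gi_no column (structure assumed)
toy_out <- list(
  "Species a" = data.frame(gi_no = c("101", "102"), stringsAsFactors = FALSE),
  "Species b" = data.frame(gi_no = c("201", "202"), stringsAsFactors = FALSE),
  "Species c" = data.frame(gi_no = "301", stringsAsFactors = FALSE)
)

# lapply() always returns a list, however many IDs each species has,
# so unlist() can flatten it into a single vector
toy_ids <- unlist(lapply(toy_out, function(x) x$gi_no), use.names = FALSE)
toy_ids
```

lapply() is used here rather than sapply() because sapply() simplifies its result to a vector or matrix when every species has the same number of IDs, which changes what the flattening step receives.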