Created May 11, 2017 10:12
Function to extract accepted names and taxonomy from GBIF Backbone Taxonomy
#' Get accepted canonical names and taxonomy for a given species name
#'
#' @param x a character string or vector of species names.
#' @param infraspecies logical. If TRUE, the given name is resolved to the infraspecies taxon, i.e. subspecies names will not be pooled (not working!)
#' @param fuzzy logical. If TRUE, the function tries fuzzy matching for species requests. May produce output if normal matching fails. (not working!)
#' @param verbose logical. If FALSE, warnings and messages are suppressed.
#'
#' @return a data.frame of accepted taxon names and higher taxa.
#' @import taxize
#' @import data.table
#' @export
#'
#' @examples
#'
#' get_gbif_taxonomy(c("Chorthippus albomarginatus", "Chorthippus apricarius",
#'   "Chorthippus biguttulus", "Chorthippus dorsatus", "Chorthippus montanus",
#'   "Chorthippus parallelus", "Chrysochraon dispar", "Conocephalus dorsalis",
#'   "Conocephalus fuscus", "Decticus verrucivorus", "Euthystira brachyptera",
#'   "Gomphocerippus rufus", "Gryllus campestris", "Metrioptera roeselii",
#'   "Omocestus viridulus", "Phaneroptera falcata", "Platycleis albopunctata",
#'   "Spec", "Stenobothrus lineatus", "Stenobothrus stigmaticus",
#'   "Stethophyma grossum", "Tetrix kraussi", "Tetrix subulata",
#'   "Tetrix tenuicornis", "Tetrix undulata", "Tettigonia cantans",
#'   "Tettigonia viridissima"))
#'
get_gbif_taxonomy <- function(x, infraspecies = FALSE, fuzzy = FALSE, verbose = TRUE) {
  if(length(x) > 1) { # recursive wrapping for vectorized input
    # pass the options on, so that each single-name call uses the same settings
    out <- lapply(x, get_gbif_taxonomy, infraspecies = infraspecies, fuzzy = fuzzy, verbose = verbose)
    out <- as.data.frame(data.table::rbindlist(out, fill = TRUE)) # combine into data.frame
  } else {
    # spellchecking: resolve names using data source 11 (GBIF Backbone Taxonomy)
    resolved <- taxize::gnr_resolve(x,
                                    preferred_data_sources = c(11),
                                    best_match_only = TRUE,
                                    canonical = TRUE)
    # return NA for unsuccessful matches
    if(is.null(resolved$matched_name2)) {
      out <- data.frame(user_supplied_name = x)
      attr(out, "warning") <- "No matching species name found!"
    } else {
      # get gbif ID and detailed information, e.g. synonym status
      temp <- taxize::get_gbifid_(resolved$matched_name2)[[1]]
      # switch for allowing for fuzzy matching
      if(!fuzzy) temp <- subset(temp, matchtype == "EXACT")
      # eliminate infraspecies (!! Needs improvement: this will make the function remove supraspecies taxa, too!)
      if(!infraspecies) temp <- subset(temp, rank == "species") # ask for rank according to name resolution provided
      # keep only accepted names and synonyms; other statuses (e.g. "DOUBTFUL") are treated as no match
      temp <- subset(temp, status %in% c("ACCEPTED", "SYNONYM"))
      if(nrow(temp) == 0) {
        out <- data.frame(user_supplied_name = x)
        attr(out, "warning") <- "No matching species name found!"
      } else if(all(temp$status == "SYNONYM")) {
        # if given name is a synonym, do a new request for the accepted species name
        accepted <- unique(temp$species)
        # if the synonym maps to several accepted species, take the match with the highest confidence
        if(length(accepted) > 1) accepted <- temp$species[which.max(temp$confidence)]
        out <- get_gbif_taxonomy(accepted, infraspecies = infraspecies, fuzzy = fuzzy, verbose = verbose)
        out$synonym <- TRUE
        out$user_supplied_name <- x
        if(verbose) warning("Synonym provided! Automatically set scientificName to accepted species name!")
      } else { # if given name is an accepted name, return result into 'out'
        temp <- subset(temp, status == "ACCEPTED")
        out <- cbind(user_supplied_name = x, synonym = FALSE,
                     scientificName = temp$species, fullname = temp$scientificname,
                     temp[, c("rank", "confidence", "kingdom", "phylum", "class", "order", "family", "genus")],
                     taxonomy = "GBIF Backbone Taxonomy", taxonID = temp$usagekey)
      }
    }
  }
  class(out) <- c("data.frame", "taxonomy")
  return(out)
}
Thanks for the suggestion. I will check it out. Could you provide the species name inputs that produce that error? I'll add it as a test case.
Oh, sorry! I just realized I should point out that this gist is an older version of the function shipped with the 'traitdataform' package (including full documentation)! That newer version should already deal with DOUBTFUL cases. You might want to try that instead. Just install from CRAN via install.packages('traitdataform'). The parameter options have changed a little, but the main input is still a vector of species names.
Package website: https://ecologicaltraitdata.github.io/traitdataform/
Thanks, I’ll check it out!
The doubtful synonym I tried was “Antechinus unicolour”, if I remember right.
Hi Florian,
I've come across what may be another 'bug' in the function. To be fair, I'm
using it on a pretty tricky list-- most of them are synonyms and many of
them are old.
The issue seems to be that when get_gbifid_() can't find a particular
synonym and reverts it to a HIGHERRANK match, then the resolve_synonyms
step, which takes place before the step that takes care of higher rank
matches, doesn't have a species column to rerun taxize on... it only has a
genus column. Looks like get_gbifid_() automatically throws away columns
that are pointless in its output. This causes the resolve synonyms step to
return an empty list, which subsequently causes the function to crash.
The current tricky name I'm getting an error on is 'Dromicia frontalis.'
It is supposed to be a synonym of Acrobates pygmaeus, but I would be happy
if the function simply threw out this synonym, because it is not actually
in the gbif taxonomy, and is just a case of someone mis-identifying an
existing species as a new one in the 1800s.
I will try to debug it on my end, but would appreciate any pointers!
Cheers,
Anikó
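
One possible way to avoid the crash described above is to check for a species column before trying to resolve synonyms. The following is only a sketch under stated assumptions: `accepted_candidates()` is a hypothetical helper that is not part of the gist or of 'traitdataform', and the two data frames are hand-made stand-ins for `get_gbifid_()` output with invented values.

```r
# Hypothetical helper (not in the gist): collect candidate accepted species names
# from a match table, returning character(0) when no 'species' column is present.
accepted_candidates <- function(temp) {
  if (is.null(temp) || !"species" %in% names(temp)) return(character(0))
  unique(temp$species[!is.na(temp$species)])
}

# Mock of a HIGHERRANK match: only a genus column, no species column (values invented).
higherrank <- data.frame(genus = "Dromicia", rank = "genus",
                         matchtype = "HIGHERRANK", stringsAsFactors = FALSE)

# Mock of a species-level synonym match (values invented).
species_level <- data.frame(species = c("Acrobates pygmaeus", "Acrobates pygmaeus"),
                            rank = "species", status = "SYNONYM",
                            stringsAsFactors = FALSE)

no_candidates  <- accepted_candidates(higherrank)     # nothing to re-query
one_candidate  <- accepted_candidates(species_level)  # a single accepted name
```

A caller could then skip the synonym step, or drop the name entirely, whenever the helper returns a zero-length vector instead of passing an empty result on to the next request.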
Very glad you made this wrapper function, looks like it will be very useful for my project.
Just a suggestion: could you include code to deal with the eventuality that temp$status is not "ACCEPTED" or "SYNONYM"? For example, the get_gbifid_ function can also return "DOUBTFUL". I just added an initial subset of temp after line 39 that excludes anything that is not accepted or a synonym. Otherwise the code throws an error saying that the object out does not exist.
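
The status filter suggested here could look roughly like the sketch below. It assumes a hand-made data frame in place of real `get_gbifid_()` output, so all values are invented for illustration, and `filter_status()` is a hypothetical helper name rather than anything defined in the gist.

```r
# Hypothetical helper: keep only rows whose status the function knows how to handle.
filter_status <- function(temp, keep = c("ACCEPTED", "SYNONYM")) {
  temp[temp$status %in% keep, , drop = FALSE]
}

# Mock stand-in for one element of get_gbifid_() output (values invented).
temp <- data.frame(
  species = c("Species a", "Species b", "Species c"),
  status  = c("DOUBTFUL", "ACCEPTED", "SYNONYM"),
  rank    = c("species", "species", "species"),
  stringsAsFactors = FALSE
)

filtered <- filter_status(temp)

# A table with only DOUBTFUL rows ends up empty, which the caller has to handle
# explicitly (e.g. by returning a "no match" row) before building 'out'.
only_doubtful <- filter_status(temp[temp$status == "DOUBTFUL", , drop = FALSE])
```

Filtering alone is not quite enough: when nothing survives the filter, the function still needs a branch that assigns `out`, otherwise the same "object not found" error appears.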