Snippet showing how to retrieve scientific names from the GBIF Backbone and add them to a data frame containing a list of non-unique vernacular names. Only the most likely match is returned. If a vernacular name cannot be matched, NA is returned.
#' load packages
library(rgbif)
library(tidyverse)
#' example input
vernacular_names_df <- tibble(
  id = c(1, 2, 3, 4, 5, 6, 7, 8),
  vernacular_name = c("Beenvissen",
                      "Bruine beer",
                      "Bont zandoogje",
                      "Bont zandoogje",
                      "Bruine beer",
                      "Beenvissen",
                      "Muscusrat > 400g",
                      "doodaars")
)
#' define get_vernacular_name() core function
#' input: a vernacular name
#' output: the best matched scientific name from the GBIF Backbone, NA_character_ if no match found
get_vernacular_name <- function(vn) {
  names <-
    name_lookup(vn,
                datasetKey = "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
                limit = 1)$data # this returns the most likely taxon
  if (nrow(names) > 0) {
    names$scientificName
  } else {
    NA_character_
  }
}
n_vernacular_names_df <-
  vernacular_names_df %>%
  # group by vernacular name and compact the data
  group_by(vernacular_name) %>%
  nest() %>%
  # find scientific name for each (distinct) vernacular name
  mutate(scientificName = map_chr(vernacular_name, get_vernacular_name)) %>%
  # ungroup result
  ungroup() %>%
  # remove unneeded columns
  select(-one_of("data")) %>%
  # add other columns from input df vernacular_names_df
  right_join(vernacular_names_df, by = "vernacular_name") %>%
  # set new column scientificName at the right side
  select(all_of(names(vernacular_names_df)), scientificName) %>%
  # reorder rows based on original order in input df
  right_join(vernacular_names_df,
             by = names(vernacular_names_df))
#' show results
n_vernacular_names_df
@damianooldoni
Author

Another interesting remark from @fredericpiesschaert:

One of the problems is that the lookup checks all vernacular names regardless of language, so for "hop", for example, you get a whole bunch of taxa that have nothing to do with what we are after. If the language could be passed as a parameter, I think that would already help a lot.
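
A rough sketch of what such language filtering could look like, applied after the lookup rather than inside it. Everything here is an assumption for illustration: `filter_by_language()` is a hypothetical helper, the candidate table is made up, and the column names (`scientificName`, `vernacularName`, `language`) and language codes (`"nld"`, `"eng"`) are assumed rather than taken from the actual `name_lookup()` output, so check them against what rgbif really returns before relying on this.

```r
# Hypothetical helper (not part of rgbif): return the first scientific name
# whose vernacular name equals `vn` (case-insensitive) in language `lang`.
# `candidates` is ASSUMED to be a data frame with the columns
# scientificName, vernacularName and language.
filter_by_language <- function(candidates, vn, lang) {
  # %in% treats NA as "no match", so missing values never slip through
  keep <- tolower(candidates$vernacularName) %in% tolower(vn) &
    candidates$language %in% lang
  matches <- unique(candidates$scientificName[keep])
  if (length(matches) > 0) matches[1] else NA_character_
}

# Made-up candidate table, for illustration only
candidates <- data.frame(
  scientificName = c("Upupa epops", "Humulus lupulus"),
  vernacularName = c("Hop", "Hop"),
  language = c("nld", "eng"),
  stringsAsFactors = FALSE
)

filter_by_language(candidates, "hop", "nld")
```

With more than one candidate per language, this still just takes the first one, so it narrows the problem rather than solving it.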
