@joelnitta
Created April 8, 2020 01:35
Check DOIs in bib file
library(bib2df)
library(rcrossref)
library(tidyverse)
# Double check to make sure DOIs in bibliography match the right papers
# Load clean bibliography, select just relevant columns
path_to_bib <- "ms/references.bib"
bib <- bib2df(path_to_bib) %>%
  janitor::clean_names() %>%
  select(title, journal, doi)
# Extract DOIs from bibliography
dois <- bib %>%
  filter(!is.na(doi)) %>%
  pull(doi)
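# Optional sketch (not part of the original script): DOIs are case-insensitive
# and .bib files sometimes store them as full resolver URLs, so an exact-match
# join on `doi` can miss otherwise valid entries. `normalize_doi()` is a
# hypothetical helper name; it uses base R only.
normalize_doi <- function(x) {
  x <- trimws(x)
  # Drop a leading "https://doi.org/", "http://dx.doi.org/" or "doi:" prefix
  x <- sub("^(https?://(dx\\.)?doi\\.org/|doi:[[:space:]]*)", "", x,
           ignore.case = TRUE)
  tolower(x)
}
# Possible use (an assumption about where it would fit in this script):
# apply it to the `doi` column of both `bib` and `cr_data` before joining them.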
# Look up data associated with DOIs in crossref
cr_data <-
  cr_works(dois) %>%
  magrittr::extract2("data") %>%
  dplyr::select(cr_title = title, cr_journal = container.title, doi) %>%
  unique()
# Combine the two datasets for comparison
combined <-
  bib %>%
  filter(!is.na(doi)) %>%
  inner_join(cr_data, by = "doi") %>%
  # Convert titles to lower case to avoid mismatches due to case only
  mutate(across(matches("title|journal"), str_to_lower))
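# Optional sketch (not part of the original script): titles read from a .bib
# file may still contain BibTeX protective braces (e.g. "{DNA} barcoding") or
# irregular spacing, which would show up as mismatches even after lower-casing.
# `normalize_title()` is a hypothetical helper name; it uses base R only.
normalize_title <- function(x) {
  x <- gsub("[{}]", "", x)           # drop BibTeX protective braces
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace
  tolower(trimws(x))
}
# Possible use (an assumption): apply it to the title and journal columns of
# `combined` in place of plain lower-casing.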
# Check entries that differ by title
combined %>%
  filter(title != cr_title) %>%
  View()
# Check entries that differ by journal
combined %>%
  filter(journal != cr_journal) %>%
  View()
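# Optional sketch (not part of the original script): inner_join() silently
# drops bibliography entries whose DOI was not returned by Crossref (for
# example, a mistyped DOI), so they never appear in the two checks above.
# `unmatched_dois()` is a hypothetical helper name; it uses base R only and
# compares DOIs case-insensitively.
unmatched_dois <- function(bib_dois, cr_dois) {
  bib_dois <- unique(bib_dois[!is.na(bib_dois)])  # ignore entries with no DOI
  setdiff(tolower(bib_dois), tolower(cr_dois))
}
# Possible use (an assumption): unmatched_dois(bib$doi, cr_data$doi)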