Created September 19, 2018 18:27
A small script that reads DOIs from a bibtex file, fetches abstracts from Crossref when they are available, and exports another bibtex file with that added info.
# install the packages once, then load them
# install.packages("bib2df")
# install.packages("rcrossref")
library(bib2df)
library(rcrossref)
# import the bibtex to a data frame
# biblio.bib is a file in the working directory
df <- bib2df("biblio.bib")
# fetch the bibtex file from url:
# url <- ""
# df <- bib2df(url)
# loop through the DOIs, allowing for failures
# on failure keep the error message as a string, so each result is a single character value
x <- lapply(df$DOI, function(z) tryCatch(cr_abstract(z), error = function(e) conditionMessage(e)))
# write the results to a new field called ABSTRACT as a character vector
df$ABSTRACT <- unlist(x)
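# a possible variant of the loop above (a sketch, not part of the original script):
# entries without a DOI are skipped instead of being sent to Crossref, and any
# failure becomes NA directly. safe_abstract and its fetch argument are
# hypothetical names introduced here; fetch defaults to rcrossref::cr_abstract.
safe_abstract <- function(doi, fetch = rcrossref::cr_abstract) {
  # missing or empty DOI: nothing to look up
  if (is.na(doi) || !nzchar(doi)) return(NA_character_)
  # any error from the lookup is turned into a missing value
  tryCatch(fetch(doi), error = function(e) NA_character_)
}
# usage, assuming df was read with bib2df as above:
# df$ABSTRACT <- vapply(df$DOI, safe_abstract, character(1), USE.NAMES = FALSE)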
# clean up the abstract field
# add any other regular expressions as you see fit
df$ABSTRACT[grepl("HTTP 404", df$ABSTRACT)] <- NA
df$ABSTRACT[grepl("no abstract found for", df$ABSTRACT)] <- NA
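# one way to keep the error patterns in a single place (a sketch; error_patterns
# and is_error are hypothetical names, not part of the original script).
# add further regular expressions to the vector as needed.
error_patterns <- c("HTTP 404", "no abstract found for")
is_error <- function(a, patterns = error_patterns) {
  # TRUE where the text matches any of the patterns; NA input gives FALSE
  grepl(paste(patterns, collapse = "|"), a)
}
# usage: df$ABSTRACT[is_error(df$ABSTRACT)] <- NA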
# strip the opening and closing p, strong, li, ul and em tags in one pass
df$ABSTRACT <- gsub("</?(p|strong|li|ul|em)>", "", df$ABSTRACT)
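# a more general alternative (a sketch, not part of the original script):
# Crossref abstracts may carry JATS markup such as <jats:p>, which the
# patterns above do not match. strip_tags is a hypothetical helper name.
# caveat: text that contains literal "<" and ">" characters (for example
# inequalities) would be damaged by this pattern.
strip_tags <- function(s) {
  s <- gsub("<[^>]+>", "", s)           # drop anything that looks like a tag
  trimws(gsub("[[:space:]]+", " ", s))  # collapse the whitespace left behind
}
# usage: df$ABSTRACT <- strip_tags(df$ABSTRACT)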
# write to bibtex file
# fields that are NA are not written, so entries without an abstract get no abstract field at all
# note: this overwrites the input file; use a different file name to keep the original
df2bib(df, file = "biblio.bib", append = FALSE)