@douglasrizzo
Last active June 27, 2022 00:19
Merging inconsistent author names in a bib file
@douglasrizzo
Author

@reox the ORCID is indeed the solution to this problem, as it would work as an ID or primary key for all authors. Unfortunately, it is very rare to find ORCID information in bib files. We would need to:

  • have the PDF files of all the papers whose authors' names we want to normalize;
  • assume the PDFs contain the authors' ORCIDs; and
  • find a way to scrape them from PDF files with different formatting.
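
The scraping step could look roughly like the sketch below. It assumes the PDF text has already been extracted by some other tool (not shown here), and the names `find_orcids` and `orcid_checksum_ok` are made up for this illustration. ORCID iDs follow a fixed 16-character pattern whose last character is an ISO 7064 MOD 11-2 check character, so a regex plus a checksum test filters out most false matches regardless of how the PDF is laid out.

```python
from __future__ import annotations

import re

# Four hyphen-separated groups of four characters; the last one may be "X".
ORCID_RE = re.compile(r"\b(\d{4}-\d{4}-\d{4}-\d{3}[\dX])\b")


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character against ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "")
    total = 0
    for ch in chars[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[-1] == expected


def find_orcids(text: str) -> list[str]:
    """Return the distinct, checksum-valid ORCID iDs found in text, in order."""
    found: list[str] = []
    for candidate in ORCID_RE.findall(text):
        if orcid_checksum_ok(candidate) and candidate not in found:
            found.append(candidate)
    return found
```

This only finds the iDs; matching each iD to the right author name on the page is a separate problem that depends on the PDF layout.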

Maybe if there were a service that returned author information given a paper's DOI or title, we would have an easier time. I know JabRef can fetch paper information, sometimes including full author names, from a paper's DOI.
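
One service that seems to fit this description is Crossref's public REST API, which returns a JSON record for a DOI at `https://api.crossref.org/works/<DOI>`. The sketch below is not part of the gist. It assumes the record lists authors under `message.author` with `given`, `family`, and an optional `ORCID` URL, and the helper names `fetch_crossref_work` and `authors_from_work` are hypothetical. Coverage is not guaranteed: many records carry no ORCID at all.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def fetch_crossref_work(doi: str) -> dict:
    """Download the Crossref record for a DOI (needs network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def authors_from_work(work: dict) -> list[dict]:
    """Pull given name, family name and bare ORCID iD (if any) per author."""
    authors = []
    for entry in work.get("author", []):
        orcid = entry.get("ORCID")
        if orcid:
            # Assumed to be stored as a URL; keep only the trailing iD.
            orcid = orcid.rsplit("/", 1)[-1]
        authors.append(
            {
                "given": entry.get("given"),
                "family": entry.get("family"),
                "orcid": orcid,
            }
        )
    return authors
```

With something like this, `authors_from_work(fetch_crossref_work(doi))` would give full names (and sometimes ORCIDs) that could serve as the canonical spelling when merging variants in the bib file.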

@mhoban

mhoban commented Jun 27, 2022

@douglasrizzo thanks for this! I adapted your code to use pyzotero, so it isn't necessary to go through a BibTeX intermediary. https://gist.github.com/mhoban/3564f789a934028f9898b0a316588dd1
