@daeh
Last active April 12, 2024 19:44
Import Papers 3 library into Zotero
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Script to facilitate the import of a Readcube Papers 3 library into Zotero
__Purpose of this script__
If you export your Readcube (Mekentosj) Papers3 library as a BibTeX file, the file paths to the PDFs are not formatted
correctly for Zotero to import them.
The specific issues include that:
* Papers3 does not export the file paths in a way that Zotero can understand.
* Papers3 does not export the paths to supplementary files, so only the primary PDF is imported into Zotero.
* Papers3 will export the primary PDF multiple times so you'll end up with multiple copies of the same PDF in Zotero.
* Papers3 includes superfluous supplementary files that you typically don't want to import into Zotero (e.g. *.html and
*.webarchive files).
This script will take the BibTeX file you exported from Papers3 and modify the file paths so that they can be imported into
Zotero.
__Usage__
This script takes as input a BibTeX library exported from readcube/mekentosj Papers3 and outputs a BibTeX library for Zotero
to import.
The script preserves your Papers citekeys, adds supplementary files from the Papers3 Library, removes duplicate links to
PDFs, and removes extraneous *.html and *.webarchive files that are often created by importing articles into Papers from
a web browser.
__Instructions__
* Make sure to have Better BibTeX pre-installed to Zotero if you want to preserve the Papers citekeys.
* Export your Papers3 library as a *.bib file.
Export > BibTeX Library
Make sure to set the "BibTex Record" option to "Complete". This will cause Papers to include the paths to the main PDF
(or whatever) file in the *.bib export.
* Run this script with python 3.7 or higher to generate the file, 'zotero_import.bib', in the same location as the BibTeX
library export.
* You can pass the script the paths to the Papers3 library and the BibTeX library export as command line arguments,
e.g.:
python Papers3_to_Zotero.py --papers "~/Documents/Library.papers3" --bibtex "~/Desktop/Library.bib"
* Or you can modify the script by updating the 'papers_lib_hardcoded' and 'bibtex_lib_hardcoded' variables with the
paths to your Papers3 library and the BibTeX library that you just exported. E.g.:
papers_lib_hardcoded = "~/Documents/User Library/Library.papers3" ### Path to Papers3 Library
bibtex_lib_hardcoded = "~/Desktop/full_library_export.bib" ### Path to Papers BibTeX library export
* Running the script will generate a new BibTeX file, 'zotero_import.bib', in the same location as the BibTeX library
export.
* Import the 'zotero_import.bib' file that gets generated with Zotero.
* Be sure to check the 'Import errors found:' file if Zotero generates one (if it exists, it will be in whatever folder you
imported the library to; sort by title to find it).
* Also check that special characters in titles and journal names were imported correctly. Sometimes '{\&}' in the
zotero_import.bib will be imported as '<span class="nocase">&</span>'. I'm not sure why or when this happens. You can
search for "</span>" to check.
__NOTE__
The Collections groupings are not preserved with this method. This is one way to manually get your Papers3 Collections into
Zotero after following the above instructions:
* Export each collection as a BibTex library ("Export" set to "Selected Collection" and "BibTex Record" set to "Standard").
This will prevent any file paths from being included in the *.bib file.
* Import that *.bib file directly to Zotero with the option to "Place imported collections and items into new collection"
selected.
* Then merge the duplicate records. That will give you a new collection with links to the right papers from your Zotero library.
* In this strategy, you have to do that for each one of your Papers3 Collections. Not ideal but maybe tolerable.
__Author__
Dae Houlihan
__Source__
https://gist.github.com/daeh/abc6d46d897b58a657699fa1a408573e
"""
import argparse
import re
import sys
from pathlib import Path
from warnings import warn


def main(papers=None, bibtex=None):
    ################################################
    ### Update these paths or pass via command line:
    ################################################

    ### Path to Papers3 Library ###
    papers_lib_hardcoded = "~/Documents/Library.papers3"

    ### Path to the BibTeX export of the Papers3 Library ###
    bibtex_lib_hardcoded = "~/Desktop/library.bib"

    ################################################

    papers_lib = papers_lib_hardcoded if papers is None else papers
    bibtex_lib = bibtex_lib_hardcoded if bibtex is None else bibtex

    papers_library = Path(papers_lib).expanduser()
    bibtex_library = Path(bibtex_lib).expanduser()

    ### Escape parentheses so the path can be spliced into a regex ###
    papers_library_string = str(papers_library).replace(r"(", r"\(").replace(r")", r"\)") + r"/"

    if papers_library_string[-9:] != ".papers3/":
        raise Exception(
            f"The variable 'papers_library' should end with '.papers3' but is rather: \n\t{str(papers_library)}"
        )

    if not papers_library.is_dir():
        raise Exception(
            f"The path you provided to the Papers3 library does not seem to exist or is not a directory: \n\t{str(papers_library)}"
        )

    if not (bibtex_library.is_file() and bibtex_library.suffix == ".bib"):
        raise Exception(
            f"The path you provided to the BibTeX Library file you exported from Papers3 does not seem to exist or is not a '.bib' file: \n\t{str(bibtex_library)}"
        )

    out, missing = list(), list()

    with open(bibtex_library, "r") as btlib:
        for line in btlib:
            if line.startswith("file = {"):
                ### Strip the doubled braces, then collapse a duplicated link to the same file ###
                templine = re.sub(r"^file = {{(.*?)}},?", r"file = {\1},", line, flags=re.M)
                newline = re.sub(r"^file = {(.*?);(\1)},?", r"file = {\1},", templine, flags=re.M)
                assert ";" not in newline  # assert that this line references only one file

                search_str = r"^file = {.*?:" + papers_library_string + r"(.*?\..*?):(.*?/.*?)},?"
                filepath_relative = re.search(search_str, newline)
                assert isinstance(
                    filepath_relative, re.Match
                ), f"Unable to match regex expression:: \n{search_str} \nwith entry from BibTex:: \n{newline}"

                primary_file_path = papers_library / filepath_relative.group(1)
                if not primary_file_path.is_file():
                    warn(f"The linked file was not found: {primary_file_path}", UserWarning)
                    missing.append(primary_file_path)

                ### Collect supplementary files stored next to the primary file ###
                supp_files = list()
                for dir_extra in ["Supplemental", "Media"]:
                    supp_dir = primary_file_path.parents[0] / dir_extra
                    if supp_dir.exists():
                        for x in supp_dir.iterdir():
                            if (
                                x.is_file()
                                and x.suffix not in [".html", ".webarchive"]
                                and str(x) != str(primary_file_path)
                            ):
                                supp_files.append(x)

                if len(supp_files) > 0:
                    search_str_supp = (
                        r"(^file = {.*?:" + papers_library_string + r".*?\..*?:application/.*?)},?"
                    )
                    primary_line = re.search(search_str_supp, newline)
                    assert isinstance(
                        primary_line, re.Match
                    ), f"Unable to match regex expression:: \n{search_str_supp} \nwith entry from BibTex:: \n{newline}"
                    newline = primary_line.group(1)
                    for x in supp_files:
                        print(f"adding supplementary file for {x.name}")
                        newline += f';{x.with_suffix("").name + " Supp" + x.suffix}:{x}:application/{x.suffix}'
                    newline += "},\n"

                out.append(newline)
            else:
                out.append(line)

    ### New BibTeX record to import into Zotero
    modified_lib = bibtex_library.parents[0] / "zotero_import.bib"
    with open(modified_lib, "w", encoding="utf-8") as outfile:
        for item in out:
            outfile.write(item)

    if missing:
        print("\n\nList of missing files::\n")
        for mf in missing:
            print(mf)
        print(
            f"\n\nScript completed but {len(missing)} files referenced in the BibTeX library were not located. They are listed above."
        )
    else:
        print(
            f"\n\nScript appears to have completed successfully. You can now import this file into Zotero (make sure Better BibTeX is already installed): \n\t{str(modified_lib)}"
        )

    return 0


def _cli():
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter, argument_default=argparse.SUPPRESS
    )
    parser.add_argument("-p", "--papers", help="Path to Papers3 Library")
    parser.add_argument("-b", "--bibtex", help="Path to the BibTeX export")
    args = parser.parse_args()
    return vars(args)


if __name__ == "__main__":
    sys.exit(main(**_cli()))
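
As a rough illustration of the two substitutions at the top of the loop, the sketch below applies them to a made-up `file` line in which the same PDF is listed twice. The helper name `normalize_file_field` and the sample path are hypothetical and not part of the script.

```python
import re


def normalize_file_field(line: str) -> str:
    """Strip the doubled braces and collapse a duplicated attachment (illustrative helper)."""
    # Papers3 wraps the value in double braces: file = {{...}},
    single_braced = re.sub(r"^file = {{(.*?)}},?", r"file = {\1},", line, flags=re.M)
    # If the same attachment is listed twice ("A;A"), keep a single copy.
    return re.sub(r"^file = {(.*?);(\1)},?", r"file = {\1},", single_braced, flags=re.M)


# Hypothetical entry: the same PDF exported twice, separated by ";".
attachment = "Paper.pdf:/Users/me/Library.papers3/Articles/Paper.pdf:application/pdf"
sample = "file = {{" + attachment + ";" + attachment + "}},\n"

print(normalize_file_field(sample), end="")
```

A line that lists the attachment only once passes through the second substitution unchanged, because that pattern requires a `;` followed by a repeat of the first group.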
@noclew

noclew commented Oct 3, 2019

Hi,
I appreciate you sharing this script; it is what I was looking for!
However, when I tried what you recommended, I got an error at line 66.

Traceback (most recent call last):
  File "Papers3_to_Zotero.py", line 66, in <module>
    primary_file_path = Path(papers_library_string) / Path(result.group(1)).with_suffix('.'+result.group(2))
AttributeError: 'NoneType' object has no attribute 'group'

Here is the file that I used for the test:
https://www.dropbox.com/s/hz58zeipunbi5mn/papersToZotero.bib?dl=0

Could you please give me some insight into why I am getting this error?

Thanks!

@daeh
Author

daeh commented Oct 10, 2019

Hi @noclew
The issue came from having parentheses in the path to your Papers library. I added a line so regex will escape the () in your path. Let me know if that doesn't resolve it.
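
To illustrate why parentheses matter here: the library path is spliced into a regular expression, so an unescaped `(old)` is read as a capture group instead of literal characters. The sketch below uses a made-up path and a hypothetical helper, `find_relative_path`, built around the script's search pattern. The script escapes `(` and `)` by hand. `re.escape` is shown only as a standard-library alternative, not as what the script does.

```python
import re

# Hypothetical library path containing parentheses.
library = "/Users/me/Documents (old)/Library.papers3/"
line = "file = {Paper.pdf:" + library + "Articles/Paper.pdf:application/pdf},"


def find_relative_path(library_pattern: str, text: str):
    """Return the path relative to the library, or None if the pattern does not match."""
    pattern = r"^file = {.*?:" + library_pattern + r"(.*?\..*?):(.*?/.*?)},?"
    match = re.search(pattern, text)
    return None if match is None else match.group(1)


# Unescaped: "(old)" becomes a group, so the literal parentheses never match.
unescaped = find_relative_path(library, line)

# Escaped by hand, as the script does.
by_hand = find_relative_path(library.replace("(", r"\(").replace(")", r"\)"), line)

# re.escape escapes every regex metacharacter, not only parentheses.
escaped = find_relative_path(re.escape(library), line)

print(unescaped, by_hand, escaped)
```

The unescaped search finds nothing, which is what produces the `'NoneType' object has no attribute 'group'` error when `.group(1)` is called on the result.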

@ssp3nc3r

Thanks!

@daeh
Author

daeh commented Oct 27, 2019

Hi @rderoock — You're using the old script. Grab the latest version and you should be set. (Make sure the result = re.search() line reads re.search(search_str, newline) not re.search(r"^file = {.*?:" + papers_library_string + r"(.*?)\.(.*?):(.*?/.*?)},?", newline).)

@thethirdghost

Hi,
Thank you for sharing this wonderful script.

However, when I tried this as you recommended, I got this error:

Traceback (most recent call last):
  File "/Users/shoujuwang/Downloads/abc6d46d897b58a657699fa1a408573e-460e7e939172773fb60fb18e47b4291f75bfb70a/Papers3_to_Zotero.py", line 67, in <module>
    assert isinstance(result, re.Match), f"Unable to match regex expression >> \n{search_str} \nwith entry from BibTex >> \n{newline}"
AttributeError: module 're' has no attribute 'Match'

Here is the file I used
https://www.jianguoyun.com/p/DZ50VFgQyJfQBxij9NEC

Could you give me a hint to solve this error?

Many thanks.

@daeh
Author

daeh commented Feb 2, 2020

module 're' has no attribute 'Match'

Hi @thethirdghost — What python version are you running?
From the same folder as the script file, what happens if you run this?

import re
print('this re module ' + re.match(r'.*(works).*', 'this is string works').group(1))
isinstance(re.match(r'.*(works).*', 'this is string works'), re.Match)

It should return

this re module works
True

It's possible this error has nothing to do with the script or your bibtex file. Something about how you're importing the regex module might be awry. You could dig around on google/SO. You might start by making sure there's not a file named re.py somewhere on the module search path as that can prevent the import of the real re module. You could also try simply moving the Papers3_to_Zotero.py file to a different location.

@thethirdghost

thethirdghost commented Feb 3, 2020 via email

@daeh
Author

daeh commented Feb 3, 2020

Good to know. I didn't go back and check compatibility with older pythons when I made those changes. I'll update the version requirement in the header.
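
The exchange above suggests the `re.Match` error came from running an older Python (the script header asks for Python 3.7 or higher). As a sketch of a check that does not depend on `re.Match` at all, one could compare the result with `None`, since `re.search` returns `None` when nothing matches. The helper name `require_match` is hypothetical and is not part of the script.

```python
import re


def require_match(pattern: str, text: str):
    """Search `text` and fail loudly if nothing matches (hypothetical helper)."""
    match = re.search(pattern, text)
    # re.search returns None on failure, so no reference to re.Match is needed.
    if match is None:
        raise ValueError(f"Unable to match regex:\n{pattern}\nwith entry from BibTeX:\n{text}")
    return match


result = require_match(r".*(works).*", "this string works")
print(result.group(1))
```

Raising an exception instead of using `assert` also keeps the check active when Python runs with `-O`, which strips assertions.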

@BlagoyBla

Hi, I get the following error when I try to run the script:

File "/Users/Blagoy/Downloads/Papers to Zotero/Papers3_to_Zotero.py", line 68, in <module>
    assert isinstance(result, re.Match), f"Unable to match regex expression >> \n{search_str} \nwith entry from BibTex >> \n{newline}"
AssertionError: Unable to match regex expression >> 
^file = {.*?:/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/(.*?)\.(.*?):(.*?/.*?)},? 
with entry from BibTex >> 
file = {GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:/Users/Blagoy/Dropbox/Papers Library/Papers Library/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:application/pdf}

Any idea how to resolve it?

@daeh
Author

daeh commented Apr 3, 2020

Hi @BlagoyBla — What the error says is: The script is expecting the pdf files to be in /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/, whereas your path is /Users/Blagoy/Dropbox/Papers Library/Papers Library/. I'm not sure what could cause the .papers3 extension to be present in the search path but absent in your .bib file. I'd like to first make sure that the path in your bibtex library matches where the files are actually located on your hard drive. Can you share your Papers BibTeX library file (or just the lines around GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf)? Also, what do you have for bibtex_library = and papers_library = ?

@BlagoyBla


Hi, my library file is here: https://www.dropbox.com/s/qzl29srf4w3jqcp/Untitled.bib?dl=0

The lines in the script read like this:

bibtex_library = Path("~/Untitled.bib").expanduser() ### Path to Papers BibTeX library export
papers_library = Path("/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3").expanduser() ### Path to Papers3 Library

My Papers Library is stored in Dropbox. To be honest, if I go to the folder in Finder, there is no .papers3 file there, so I added the extension myself. But this does not seem to work. I am not sure where the .papers3 file is located.

@daeh
Author

daeh commented Apr 4, 2020

@BlagoyBla — yeah, every path in your Untitled.bib is /Users/Blagoy/Dropbox/Papers Library/Papers Library/ whereas your papers_library path is /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/. Any idea how that's possible? Did you rename your library after exporting the bibtex or something?

I'd try things in this order:

First, verify that your library is really at /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3 (I presume it is because I have an assert statement to test for it). As a sanity check, you can run these two commands in your terminal:

open "/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/GOLDEN" should open a folder.

open "/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf" should open the pdf file.

If either of those doesn't work, your papers library isn't where we think it is.

If those commands work, open Papers and export the bibtex library again. If the file path on line 21 of the new .bib export reads file = {{GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:application/pdf}},, you're in good shape, go ahead and run the script with that bibtex file.

If it reads file = {{GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:/Users/Blagoy/Dropbox/Papers Library/Papers Library/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:application/pdf}},, there's something weird going on, but it shouldn't prevent the script from working.
Just batch replace /Users/Blagoy/Dropbox/Papers Library/Papers Library/ with /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/ and run the script again (I went ahead and did this for you, so feel free to use the attached .bib file if that's easier).
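
The batch replace described above could be sketched as follows. The helper name `rewrite_library_prefix` is hypothetical, and the file name in the sample entry is shortened for illustration. This is a plain string replacement, so it assumes the old prefix does not appear anywhere else in the .bib file where it should be left alone.

```python
def rewrite_library_prefix(bibtex_text: str, old_prefix: str, new_prefix: str) -> str:
    """Replace every occurrence of old_prefix with new_prefix (hypothetical helper)."""
    return bibtex_text.replace(old_prefix, new_prefix)


old = "/Users/Blagoy/Dropbox/Papers Library/Papers Library/"
new = "/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/"

# Shortened, made-up entry in the double-brace form that Papers3 exports.
entry = "file = {{GOLDEN-2001.pdf:" + old + "GOLDEN/GOLDEN-2001.pdf:application/pdf}},\n"

fixed = rewrite_library_prefix(entry, old, new)
print(fixed, end="")
```

To apply it to a whole export, one option is to read the file with `Path.read_text()`, pass the text through the helper, and write the result to a new file with `Path.write_text()`, so the original export stays untouched.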

@Pentastarch

Daeh, thank you very much for the script - it works perfectly

@wuxmax

wuxmax commented Sep 15, 2020

@BlagoyBla @daeh - I think the problem here is simply that @BlagoyBla's library path does not end with .papers3, so he added the suffix manually, which made the path invalid. It seems that the Papers library path does not always contain the suffix (mine did not either). If you just comment out lines 49 and 50, the script should work alright. Anyway, thanks for the great work! Much appreciated 👍

@bri2020

bri2020 commented May 5, 2022

Hi @daeh,
Does this import also include the keywords I had saved?

@Article{Williams:1957vt,
author = {Williams, G C},
title = {{Pleiotropy, Natural-Selection, and the Evolution of Senescence}},
journal = {Evolution; international journal of organic evolution},
year = {1957},
volume = {11},
number = {4},
pages = {398--411},
keywords = {paper1, senescence},
read = {Yes},
rating = {0},
date-added = {2021-07-21T09:34:22GMT},
date-modified = {2021-08-10T11:06:55GMT},
url = {http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=mekentosj&SrcApp=Papers&DestLinkType=FullRecord&DestApp=WOS&KeyUT=A1957XE15100002},
local-url = {file://localhost/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution%201957%20Williams.pdf},
file = {{Evolution 1957 Williams.pdf:/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution 1957 Williams.pdf:application/pdf;Evolution 1957 Williams.pdf:/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution 1957 Williams.pdf:application/pdf}},
uri = {\url{papers3://publication/uuid/84C6B981-3CD0-4E97-B90A-4C12F971A965}}
}
The keywords are included in the .bib file, but does Zotero pick them up?

I have most of my papers organised in collections, and above you describe how to load in collections, but I am sure that some (perhaps many) papers do not belong to any collection.

  • So my idea is to export all collections, but what about the rest?
  • How can I get these in?

Thanks a lot!
Bribri

@daeh
Author

daeh commented May 5, 2022

@bri2020 if the keywords appear in the exported .bib, Zotero should load them. It seems to work fine for the example entry you posted (see screenshot)

For the items not in collections, just export your full library (Export > BibTeX Library). If you follow the instructions, there shouldn't be any problem with having the same items appear in multiple .bib files. It won't produce duplicates as long as you export the collections as "Standard" so that the file path isn't included.
(See the instructions: first load in your whole library with the paths, then export each collection as a BibTex library, with "Export" set to "Selected Collection" and "BibTex Record" set to "Standard". This will prevent any file paths from being included in the *.bib file.)

[Screenshot: CleanShot 2022-05-05 at 11 47 29@2x]

@cl-bu

cl-bu commented Oct 6, 2022

Hello @daeh,

many thanks for uploading this Python script! Using Python for the very first time today (and knowing nothing about programming), I tried to run this script with the appropriate changes, by entering the file paths for papers_lib and bibtex_lib:

if papers_lib is None:
    papers_lib = "/Users/myusername/Materialien/Library.papers3/"
if bibtex_lib is None:
    bibtex_lib = "/Users/myusername/desktop/Library.bib/"

### Path to Papers3 Library ###
papers_library = Path("/Users/myusername/Materialien/Library.papers3/").expanduser()
### Path to the BibTeX export of the Papers3 Library ###
bibtex_library = Path("/Users/myusername/desktop/Library.bib/").expanduser()

When I run this thus-altered script in the Python 3.10.7 IDLE app for macOS (10.15.7; I am using a 2012 MacBook Air on which Papers 3.4.25 still runs fairly stably), the following error message occurs:

SyntaxError: multiple statements found while compiling a single statement

I tried to google what the problem might be, but I have to admit that I do not understand what I am supposed to do or change in order to prevent this error message.

I would appreciate it if you were willing to help guide me, as a total amateur, through the process!

@daeh
Author

daeh commented Oct 6, 2022

Hi @cl-bu — It has been a while since I wrote this so it's possible something about Papers3 has changed. I can take a quick look and see if there's a simple fix if you upload the Library.bib file. If you're up for it, you can also send me a zip of Library.papers3 via a filesharing service: daeda [at] mit.edu (depending on where the error is coming from I might not need the library, but it could help me figure out if there's a simple solution).

@cl-bu

cl-bu commented Oct 6, 2022

Hi @daeh,
thank you - that is very kind! I will send you the .bib file in a couple of minutes, and am currently uploading my papers library to my university's file sharing service. The .zip file is roughly 60 gigabyte, so it will take some time before I can send you a link.

I hope that your script can help me solve my biggest headache regarding the transfer of my library from Papers 3 to Zotero. I have a rather large library (ca. 6,300 entries) with supplemental files attached to many of these entries. When I export my library to a bibtex file (set to complete record), two problems arise: (1) All supplemental files are listed as .pdf files (even though they are usually .docx files), and (2) the respective file path to these supplemental files omits the folder in which those files are actually stored (e.g., the correct file path for a supplemental file ought to be: /Users/myusername/Materialien/Library.papers3/authorname/supplemental/, but it always omits the "/supplemental" part of the path).

I am working in Chinese studies, therefore many library entries are in Chinese - not sure if this could cause problems when running the script.

Again, many thanks!

@daeh
Author

daeh commented Oct 7, 2022

@cl-bu I updated the script to be more robust to your file names.

One of the reasons I wrote this script was that Papers3 was not exporting the paths to supplementary files, only the primary file. This script scans your Papers3 database and adds all the supplementary files to the zotero_import.bib file that the script generates.

Are you saying that Papers3 is exporting the supplementary files path but is changing the filenames of the supplementary files? If so, that would be a new Papers3 behavior that I haven't seen. But I suspect that what you're seeing is the original behavior, and the script should work fine now that it can handle your filenames. If you get the current script off the gist, and then run

python Papers3_to_Zotero.py --papers "~/Materialien/Library.papers3" --bibtex "~/Desktop/Library.bib"

the zotero_import.bib file should include all of your supplementary files, including the .docx documents. Let me know if that doesn’t work.
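
As a minimal sketch of the scan described above, the helper below lists the files in the `Supplemental` and `Media` folders that sit next to a primary file and skips `.html` and `.webarchive` captures, mirroring the loop in the script. The helper name `find_supplementary_files` and the folder layout built in the temporary directory are made up for illustration.

```python
import tempfile
from pathlib import Path

SKIPPED_SUFFIXES = {".html", ".webarchive"}


def find_supplementary_files(primary_file: Path) -> list:
    """List files in sibling Supplemental/ and Media/ folders, skipping web captures."""
    found = []
    for folder_name in ("Supplemental", "Media"):
        folder = primary_file.parent / folder_name
        if not folder.is_dir():
            continue
        # Sort so the result does not depend on directory listing order.
        for candidate in sorted(folder.iterdir()):
            if candidate.is_file() and candidate.suffix not in SKIPPED_SUFFIXES:
                found.append(candidate)
    return found


# Build a throwaway folder layout to exercise the helper (all names are made up).
with tempfile.TemporaryDirectory() as tmp:
    article_dir = Path(tmp) / "Library.papers3" / "Author"
    (article_dir / "Supplemental").mkdir(parents=True)
    primary = article_dir / "paper.pdf"
    primary.touch()
    (article_dir / "Supplemental" / "notes.docx").touch()
    (article_dir / "Supplemental" / "page.webarchive").touch()

    names = [p.name for p in find_supplementary_files(primary)]
    print(names)
```

Because the lookup is relative to the primary file's folder, every entry stored in the same folder shares the same `Supplemental` directory. That is consistent with the later report in this thread that a catch-all "Unknown" folder mixed up the assignment of supplementary files.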

@cl-bu

cl-bu commented Oct 7, 2022

@daeh,
thank you for the updated script! I will let you know how it works.

In Papers3, if I select Export > BibTex Library from the "File" drop-down menu, the resulting document does indeed contain only .pdf filenames (irrespective of their actual format, usually .docx, but on occasion also .txt or .djvu). The exported file path to supplemental files is invariably wrong, always lacking the last "/supplemental/" folder part.

Unfortunately, I am using the last and most recent version of Papers3 (3.4.25, Readcube does not support this app anymore; besides, you also cannot import .docx and other non-pdf files to their own Readcube Papers 4 app (I have been in contact with Readcube's support team)). I am not sure if I could find an older version of Papers3 online.

Within the package contents of the Papers3 app, there is a file called MTExporterBibTeX.nib (the file path is /Applications/Papers\ 3\ (Legacy).app/Contents/Resources/English.lproj/MTExporterBibTeX.nib), which I guess might be responsible for producing a .bib file of my Papers Library, but I neither know the file format nor do I know how to make changes to it.

@daeh
Author

daeh commented Oct 7, 2022

@cl-bu One of the reasons I wrote this script was that Papers3 was not exporting the paths to supplementary files, only the primary file. This script scans your Papers3 database and adds all the supplementary files to the zotero_import.bib file that the script generates.

I suspect this is what you're seeing, and the script should work fine now that it can handle your filenames. If you get the current script off the gist, and then run

python Papers3_to_Zotero.py --papers "~/Materialien/Library.papers3" --bibtex "~/Desktop/Library.bib"

the zotero_import.bib file should include all of your supplementary files, including the .docx documents. Let me know if that doesn’t work.

@cl-bu

cl-bu commented Oct 8, 2022

@daeh, now everything worked perfectly - many thanks for updating the script! I had to tinker with the folder organization of my Papers3 library. I used to organize my files on the basis of author name only, so too many files without clear authorship (when no author was given, etc.) were collected in a folder called "Unknown", which messed up the Python script's assignment of supplementary files. After implementing a more specific folder organization scheme (organized on the basis of authorship, editorship, source, and year), I am getting very good results. Thanks to you I am now finally able to migrate my Papers3 library to Zotero in a way that it remains usable for me.
Next, I will use your above-described method for exporting my Papers 3 collections (unfortunately quite a lot...).

@chandrachekuri

It is 2024, and I still managed to export from my old Papers3 database/application and import into Zotero. I just wanted to add my note of thanks.

@daeh
Author

daeh commented Apr 12, 2024

Wow. 5 years is a much longer lifespan for this than I expected. Glad it was useful!
