#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Script to facilitate the import of a Readcube Papers 3 library into Zotero

__Purpose of this script__

If you export your Readcube (Mekentosj) Papers3 library as a BibTeX file, the file paths to the PDFs are not formatted
correctly for Zotero to import them.
The specific issues include that:
    * Papers3 does not export the file paths in a way that Zotero can understand.
    * Papers3 does not export the paths to supplementary files, so only the primary PDF is imported into Zotero.
    * Papers3 will export the primary PDF multiple times so you'll end up with multiple copies of the same PDF in Zotero.
    * Papers3 includes superfluous supplementary files that you typically don't want to import into Zotero (e.g. *.html and
    *.webarchive files).

This script will take the BibTeX file you exported from Papers3 and modify the file paths so that they can be imported into
Zotero.

__Usage__

This script takes as input a BibTeX library exported from readcube/mekentosj Papers3 and outputs a BibTeX library for Zotero
to import.
The script preserves your Papers citekeys, adds supplementary files from the Papers3 Library, removes duplicate links to
PDFs, and removes extraneous *.html and *.webarchive files that are often created by importing articles into Papers from
a web browser.

__Instructions__

* Make sure to have Better BibTeX pre-installed to Zotero if you want to preserve the Papers citekeys.
* Export your Papers3 library as a *.bib file.
    Export > BibTeX Library
    Make sure to set the "BibTex Record" option to "Complete". This will cause Papers to include the paths to the main PDF
    (or whatever) file in the *.bib export.
* Run this script with python 3.7 or higher to generate the file, 'zotero_import.bib', in the same location as the BibTeX
library export.
    * You can pass the script the paths to the Papers3 library and the BibTeX library export as command line arguments,
    e.g.:
        python Papers3_to_Zotero.py --papers "~/Documents/Library.papers3" --bibtex "~/Desktop/Library.bib"
    * Or you can modify the script by updating the 'papers_lib_hardcoded' and 'bibtex_lib_hardcoded' variables with the
    paths to your Papers3 library and the BibTeX library that you just exported. E.g.:
        papers_lib_hardcoded = "~/Documents/User Library/Library.papers3"  ### Path to Papers3 Library
        bibtex_lib_hardcoded = "~/Desktop/full_library_export.bib"  ### Path to Papers BibTeX library export
* Running the script will generate a new BibTeX file, 'zotero_import.bib', in the same location as the BibTeX library
export.
* Import the 'zotero_import.bib' file that gets generated with Zotero.
* Be sure to check the 'Import errors found:' file if Zotero generates one (if it exists, it will be in whatever folder you
imported the library to; sort by title to find it).
* Also check that special characters in titles and journal names were imported correctly. Sometimes '{\&}' in the
zotero_import.bib will be imported as '<span class="nocase">&</span>'. I'm not sure why or when this happens. You can
search for "</span>" to check.

__NOTE__

The Collections groupings are not preserved with this method. This is one way to manually get your Papers3 Collections into
Zotero after following the above instructions:
* Export each collection as a BibTex library ("Export" set to "Selected Collection" and "BibTex Record" set to "Standard").
This will prevent any file paths from being included in the *.bib file.
* Import that *.bib file directly to Zotero with the option to "Place imported collections and items into new collection"
selected.
* Then merge the duplicate records. That will give you a new collection with links to the right papers from your Zotero library.
* In this strategy, you have to do that for each one of your Papers3 Collections. Not ideal but maybe tolerable.

__Author__

Dae Houlihan

__Source__

https://gist.github.com/daeh/abc6d46d897b58a657699fa1a408573e
"""
import argparse
import re
import sys
from pathlib import Path
from warnings import warn


def main(papers=None, bibtex=None):
    ################################################
    ### Update these paths or pass via command line:
    ################################################

    ### Path to Papers3 Library ###
    papers_lib_hardcoded = "~/Documents/Library.papers3"

    ### Path to the BibTeX export of the Papers3 Library ###
    bibtex_lib_hardcoded = "~/Desktop/library.bib"

    ################################################

    papers_lib = papers_lib_hardcoded if papers is None else papers
    bibtex_lib = bibtex_lib_hardcoded if bibtex is None else bibtex

    papers_library = Path(papers_lib).expanduser()
    bibtex_library = Path(bibtex_lib).expanduser()

    # Escape parentheses so the library path can be embedded in a regex
    papers_library_string = str(papers_library).replace(r"(", r"\(").replace(r")", r"\)") + r"/"

    if papers_library_string[-9:] != ".papers3/":
        raise Exception(
            f"The variable 'papers_library' should end with '.papers3' but is rather: \n\t{str(papers_library)}"
        )

    if not papers_library.is_dir():
        raise Exception(
            f"The path you provided to the Papers3 library does not seem to exist or is not a directory: \n\t{str(papers_library)}"
        )

    if not (bibtex_library.is_file() and bibtex_library.suffix == ".bib"):
        raise Exception(
            f"The path you provided to the BibTeX Library file you exported from Papers3 does not seem to exist or is not '.bib' file: \n\t{str(bibtex_library)}"
        )

    out, missing = list(), list()

    with open(bibtex_library, "r") as btlib:
        for line in btlib:
            if line.startswith("file = {"):
                # Unwrap the double braces, then collapse a duplicated link to the same file
                templine = re.sub(r"^file = {{(.*?)}},?", r"file = {\1},", line, flags=re.M)
                newline = re.sub(r"^file = {(.*?);(\1)},?", r"file = {\1},", templine, flags=re.M)
                assert ";" not in newline  # assert that this line references only one file

                search_str = r"^file = {.*?:" + papers_library_string + r"(.*?\..*?):(.*?/.*?)},?"
                filepath_relative = re.search(search_str, newline)
                assert isinstance(
                    filepath_relative, re.Match
                ), f"Unable to match regex expression:: \n{search_str} \nwith entry from BibTex:: \n{newline}"

                primary_file_path = papers_library / filepath_relative.group(1)
                if not primary_file_path.is_file():
                    warn(f"The linked file was not found: {primary_file_path}", UserWarning)
                    missing.append(primary_file_path)

                # Collect supplementary files stored next to the primary file
                supp_files = list()
                for dir_extra in ["Supplemental", "Media"]:
                    supp_dir = primary_file_path.parents[0] / dir_extra
                    if supp_dir.exists():
                        for x in supp_dir.iterdir():
                            if (
                                x.is_file()
                                and x.suffix not in [".html", ".webarchive"]
                                and str(x) != str(primary_file_path)
                            ):
                                supp_files.append(x)

                if len(supp_files) > 0:
                    search_str_supp = (
                        r"(^file = {.*?:" + papers_library_string + r".*?\..*?:application/.*?)},?"
                    )
                    primary_line = re.search(search_str_supp, newline)
                    assert isinstance(
                        primary_line, re.Match
                    ), f"Unable to match regex expression:: \n{search_str_supp} \nwith entry from BibTex:: \n{newline}"
                    newline = primary_line.group(1)
                    for x in supp_files:
                        print(f"adding supplementary file for {x.name}")
                        newline += f';{x.with_suffix("").name + " Supp" + x.suffix}:{x}:application/{x.suffix}'
                    newline += "},\n"

                out.append(newline)
            else:
                out.append(line)

    ### New BibTeX record to import into Zotero
    modified_lib = bibtex_library.parents[0] / "zotero_import.bib"
    with open(modified_lib, "w", encoding="utf-8") as outfile:
        for item in out:
            outfile.write(item)

    if missing:
        print("\n\nList of missing files::\n")
        for mf in missing:
            print(mf)
        print(
            f"\n\nScript completed but {len(missing)} files referenced in the BibTeX library were not located. They are listed above."
        )
    else:
        print(
            f"\n\nScript appears to have completed successfully. You can now import this file into Zotero (make sure Better BibTeX is already installed): \n\t{str(modified_lib)}"
        )

    return 0


def _cli():
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter, argument_default=argparse.SUPPRESS
    )
    parser.add_argument("-p", "--papers", help="Path to Papers3 Library")
    parser.add_argument("-b", "--bibtex", help="Path to the BibTeX export")
    args = parser.parse_args()
    return vars(args)


if __name__ == "__main__":
    sys.exit(main(**_cli()))
Hi @noclew —
The issue came from having parentheses in the path to your Papers library. I added a line so regex will escape the () in your path. Let me know if that doesn't resolve it.
Thanks!
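For illustration, a minimal sketch of why parentheses in the library path break the match. The path and file name here are made up, and re.escape() is used as a more general stand-in for the script's two .replace() calls, so this is not the script's exact code.

```python
import re

# Hypothetical library path containing parentheses (illustration only).
library = "/Users/me/Papers (old)/Library.papers3/"
line = "file = {a.pdf:/Users/me/Papers (old)/Library.papers3/Smith/a.pdf:application/pdf},"

# Unescaped, "(old)" is read as a capture group, so the literal parentheses never match.
unescaped = re.search(r"^file = {.*?:" + library + r"(.*?\..*?):(.*?/.*?)},?", line)

# re.escape() escapes every regex metacharacter in the path, not only parentheses.
escaped = re.search(r"^file = {.*?:" + re.escape(library) + r"(.*?\..*?):(.*?/.*?)},?", line)

print(unescaped)
print(escaped.group(1))
```

The first search finds nothing; the second captures the path relative to the library folder.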
Hi @rderoock, you're using the old script. Grab the latest version and you should be set. (Make sure the result = re.search() line reads re.search(search_str, newline), not re.search(r"^file = {.*?:" + papers_library_string + r"(.*?)\.(.*?):(.*?/.*?)},?", newline).)
Hi,
Thank you for sharing this wonderful script.
However, when I try this as you recommended, I got this error:
Traceback (most recent call last):
File "/Users/shoujuwang/Downloads/abc6d46d897b58a657699fa1a408573e-460e7e939172773fb60fb18e47b4291f75bfb70a/Papers3_to_Zotero.py", line 67, in
assert isinstance(result, re.Match), f"Unable to match regex expression >> \n{search_str} \nwith entry from BibTex >> \n{newline}"
AttributeError: module 're' has no attribute 'Match'
Here is the file I used
https://www.jianguoyun.com/p/DZ50VFgQyJfQBxij9NEC
Could you give me a hint to solve this error?
Many thanks.
module 're' has no attribute 'Match'
Hi @thethirdghost — What python version are you running?
From the same folder as the script file, what happens if you run this?
import re
print('this re module ' + re.match(r'.*(works).*', 'this is string works').group(1))
isinstance(re.match(r'.*(works).*', 'this is string works'), re.Match)
It should return
this re module works
True
It's possible this error has nothing to do with the script or your bibtex file. Something about how you're importing the regex module might be awry. You could dig around on google/SO. You might start by making sure there's not a file named re.py somewhere on the module search path, as that can prevent the import of the real re module. You could also try simply moving the Papers3_to_Zotero.py file to a different location.
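A short sketch of a check that does not depend on re.Match, which (as far as I know) only became a public name in Python 3.7. It tests for None instead, since re.search() returns None when nothing matches. This is an alternative for diagnosis, not what the script itself does.

```python
import re
import sys

# Report the interpreter version; older interpreters lack the public re.Match name.
print(sys.version_info[:3])

result = re.search(r"(works)", "this re module works")

# re.search() returns None on failure, so no re.Match attribute is needed here.
assert result is not None, "Unable to match regex expression"
print(result.group(1))

# Show which file was imported as `re`; a stray local re.py would show up here.
print(re.__file__)
```

If the printed path points somewhere other than the Python installation, a local file is shadowing the standard module.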
Good to know. I didn't go back and check compatibility with older pythons when I made those changes. I'll update the version requirement in the header.
Hi, I get the following error when I try to run the script:
File "/Users/Blagoy/Downloads/Papers to Zotero/Papers3_to_Zotero.py", line 68, in <module>
assert isinstance(result, re.Match), f"Unable to match regex expression >> \n{search_str} \nwith entry from BibTex >> \n{newline}"
AssertionError: Unable to match regex expression >>
^file = {.*?:/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/(.*?)\.(.*?):(.*?/.*?)},?
with entry from BibTex >>
file = {GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:/Users/Blagoy/Dropbox/Papers Library/Papers Library/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:application/pdf}
Any idea how to resolve it?
Hi @BlagoyBla, what the error says is: the script is expecting the pdf files to be in /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/, whereas your path is /Users/Blagoy/Dropbox/Papers Library/Papers Library/. I'm not sure what could cause the .papers3 extension to be present in the search path but absent in your .bib file. I'd like to first make sure that the path in your bibtex library matches where the files are actually located on your hard drive. Can you share your Papers BibTeX library file (or just the lines around GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf)? Also, what do you have for bibtex_library = and papers_library = ?
Hi, my library file is here: https://www.dropbox.com/s/qzl29srf4w3jqcp/Untitled.bib?dl=0
The lines in the script read like this:
bibtex_library = Path("~/Untitled.bib").expanduser() ### Path to Papers BibTeX library export
papers_library = Path("/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3").expanduser() ### Path to Papers3 Library
My Papers Library is stored in Dropbox. To be honest, if I go to the folder in Finder, there is no .papers3 file there, so I added the extension. But this does not seem to work. I am not sure where the .papers3 file is located.
@BlagoyBla, yeah, every path in your Untitled.bib is /Users/Blagoy/Dropbox/Papers Library/Papers Library/ whereas your papers_library path is /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/. Any idea how that's possible? Did you rename your library after exporting the bibtex or something?
I'd try things in this order:
First, verify that your library is really at /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3 (I presume it is because I have an assert statement to test for it). As a sanity check, you can run these two commands in your terminal:
open "/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/GOLDEN"
should open a folder.
open "/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf"
should open the pdf file.
If either of those doesn't work, your papers library isn't where we think it is.
If those commands work, open Papers and export the bibtex library again. If the file path on line 21 of the new .bib export reads
file = {{GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:application/pdf}},
you're in good shape, go ahead and run the script with that bibtex file.
If it reads
file = {{GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:/Users/Blagoy/Dropbox/Papers Library/Papers Library/GOLDEN/GOLDEN-2001-Flexible_Work_Schedules_Which_Workers_Get_Them.pdf:application/pdf}},
there's something weird going on, but it shouldn't prevent the script from working.
Just batch replace /Users/Blagoy/Dropbox/Papers Library/Papers Library/ with /Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/ and run the script again (I went ahead and did this for you, so feel free to use the attached .bib file if that's easier).
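The batch replace can also be done in Python. A minimal sketch, where repoint_library is a hypothetical helper name and the sample line is shortened for illustration:

```python
def repoint_library(bib_text, old_prefix, new_prefix):
    """Replace every occurrence of old_prefix with new_prefix (hypothetical helper)."""
    return bib_text.replace(old_prefix, new_prefix)


old = "/Users/Blagoy/Dropbox/Papers Library/Papers Library/"
new = "/Users/Blagoy/Dropbox/Papers Library/Papers Library.papers3/"

# Shortened stand-in for one line of the exported .bib file.
sample = "file = {{a.pdf:" + old + "GOLDEN/a.pdf:application/pdf}},\n"
fixed = repoint_library(sample, old, new)
print(fixed)

# To apply it to a real export (the file names are assumptions):
# from pathlib import Path
# src = Path("~/Untitled.bib").expanduser()
# dst = src.with_name("Untitled_fixed.bib")
# dst.write_text(repoint_library(src.read_text(encoding="utf-8"), old, new), encoding="utf-8")
```

Writing to a new file rather than overwriting the export keeps the original available if the replacement turns out to be wrong.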
Daeh, thank you very much for the script - it works perfectly
@BlagoyBla @daeh - I think the problem here is simply that @BlagoyBla's library path does not end with .papers3, so he added the suffix manually, which made the path invalid. It seems that the Papers library path does not always contain the suffix (mine did not either). If you just comment out lines 49 and 50, the script should work all right. Anyway, thanks for the great work! Much appreciated 👍
Hi @daeh,
Does this import also include the keywords I had saved?
@Article{Williams:1957vt,
author = {Williams, G C},
title = {{Pleiotropy, Natural-Selection, and the Evolution of Senescence}},
journal = {Evolution; international journal of organic evolution},
year = {1957},
volume = {11},
number = {4},
pages = {398--411},
keywords = {paper1, senescence},
read = {Yes},
rating = {0},
date-added = {2021-07-21T09:34:22GMT},
date-modified = {2021-08-10T11:06:55GMT},
url = {http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=mekentosj&SrcApp=Papers&DestLinkType=FullRecord&DestApp=WOS&KeyUT=A1957XE15100002},
local-url = {file://localhost/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution%201957%20Williams.pdf},
file = {{Evolution 1957 Williams.pdf:/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution 1957 Williams.pdf:application/pdf;Evolution 1957 Williams.pdf:/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution 1957 Williams.pdf:application/pdf}},
uri = {\url{papers3://publication/uuid/84C6B981-3CD0-4E97-B90A-4C12F971A965}}
}
The keywords are included in the .bib file, but does Zotero import them?
I have most of my papers organised in collections, and above you describe how to load collections, but I am sure that some or many papers do not belong to any collection.
- So my idea is to export all collections, but what about the rest?
- How can I get these in?
Thanks a lot!
Bribri
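The file field in the entry above lists the same PDF twice. A sketch of how the script's two substitutions collapse it to a single link, run on that field alone (the surrounding loop and path checks are omitted):

```python
import re

pdf = (
    "Evolution 1957 Williams.pdf:"
    "/Users/XXX/Documents/Library.papers3/Articles/1957/Williams/Evolution 1957 Williams.pdf:"
    "application/pdf"
)
line = "file = {{" + pdf + ";" + pdf + "}},\n"

# Step 1: unwrap the double braces around the field value.
templine = re.sub(r"^file = {{(.*?)}},?", r"file = {\1},", line, flags=re.M)

# Step 2: if the value is the same link repeated after a semicolon, keep one copy.
newline = re.sub(r"^file = {(.*?);(\1)},?", r"file = {\1},", templine, flags=re.M)

print(newline)
```

After both steps the field holds one link, which is what the script's assert on ";" expects.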
@bri2020 if the keywords appear in the exported .bib, Zotero should load them. It seems to work fine for the example entry you posted (see screenshot).
For the items not in collections, just export your full library (Export > BibTeX Library). If you follow the instructions, there shouldn't be any problem with having the same items appear in multiple .bib files. It won't produce duplicates as long as you export the collections as "Standard" so that the file path isn't included.
(See the instructions: First load in your whole library with the paths, then
Export each collection as a BibTex library ("Export" set to "Selected Collection" and "BibTex Record" set to "Standard"). This will prevent any file paths from being included in the *.bib file.
)
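To confirm that a collection export really carries no file paths before importing it, a small check along these lines may help. The helper name and both sample exports are made up for illustration.

```python
def lines_with_file_paths(bib_text):
    """Return the lines of a BibTeX export that start a 'file = {' field."""
    return [ln for ln in bib_text.splitlines() if ln.lstrip().startswith("file = {")]


# Made-up stand-ins for a "Standard" and a "Complete" export of the same entry.
standard_export = "@article{Williams:1957vt,\nauthor = {Williams, G C},\nyear = {1957}\n}\n"
complete_export = standard_export.replace(
    "year", "file = {{a.pdf:/tmp/a.pdf:application/pdf}},\nyear"
)

print(lines_with_file_paths(standard_export))
print(lines_with_file_paths(complete_export))
```

An empty list for a collection export means no attachments will be pulled in a second time.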
Hello @daeh,
many thanks for uploading this Python script! Using Python for the very first time today (and knowing nothing about programming), I tried to run this script with the appropriate changes, by entering the file paths for papers_lib and bibtex_lib:
if papers_lib is None:
papers_lib = "/Users/myusername/Materialien/Library.papers3/"
if bibtex_lib is None:
bibtex_lib = "/Users/myusername/desktop/Library.bib/"
### Path to Papers3 Library ###
papers_library = Path("/Users/myusername/Materialien/Library.papers3/").expanduser()
### Path to the BibTeX export of the Papers3 Library ###
bibtex_library = Path("/Users/myusername/desktop/Library.bib/").expanduser()
When I run this thus-altered script in the Python 3.10.7 IDLE app for macOS (10.15.7 - I am using an 2012 MacBook Air on which Papers 3.4.25 still runs (quite) stable), the following error message occurs:
SyntaxError: multiple statements found while compiling a single statement
I tried to google what the problem might be, but I have to admit that I do not understand what I am supposed to do or change in order to prevent this error message.
I would appreciate if you were willing to help guide me as a total amateur through the process!
Hi @cl-bu, it has been a while since I wrote this so it's possible something about Papers3 has changed. I can take a quick look and see if there's a simple fix if you upload the Library.bib file. If you're up for it, you can also send me a zip of Library.papers3 via a filesharing service: daeda [at] mit.edu (depending on where the error is coming from I might not need the library, but it could help me figure out if there's a simple solution).
Hi @daeh,
thank you - that is very kind! I will send you the .bib file in a couple of minutes, and am currently uploading my papers library to my university's file sharing service. The .zip file is roughly 60 gigabyte, so it will take some time before I can send you a link.
I hope that your script can help me solve my biggest headache regarding the transfer of my library from Papers 3 to Zotero. I have a rather large library (ca. 6,300 entries) with supplemental files attached to many of these entries. When I export my library to a bibtex file (set to complete record), two problems arise: (1) All supplemental files are listed as .pdf files (even though they are usually .docx files), and (2) the respective file path to these supplemental files omits the folder in which those files are actually stored (e.g., the correct file path for a supplemental file ought to be: /Users/myusername/Materialien/Library.papers3/authorname/supplemental/, but it always omits the "/supplemental" part of the path).
I am working in Chinese studies, therefore many library entries are in Chinese - not sure if this could cause problems when running the script.
Again, many thanks!
@cl-bu I updated the script to be more robust to your file names.
One of the reasons I wrote this script was that Papers3 was not exporting the paths to supplementary files, only the primary file. This script scans your Papers3 database and adds all the supplementary files to the zotero_import.bib file that the script generates.
Are you saying that Papers3 is exporting the supplementary files path but is changing the filenames of the supplementary files? If so, that would be a new Papers3 behavior that I haven't seen. But I suspect that what you're seeing is the original behavior, and the script should work fine now that it can handle your filenames. If you get the current script off the gist, and then run
python Papers3_to_Zotero.py --papers "~/Materialien/Library.papers3" --bibtex "~/Desktop/Library.bib"
the zotero_import.bib file should include all of your supplementary files, including the .docx documents. Let me know if that doesn't work.
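A standalone sketch of the scan described here, modelled on the script's loop over the "Supplemental" and "Media" folders next to the primary file. The function name find_supplements and the temporary folder layout are assumptions for the demo.

```python
import tempfile
from pathlib import Path

SKIP_SUFFIXES = {".html", ".webarchive"}


def find_supplements(primary_file):
    """List files in the 'Supplemental' and 'Media' folders next to primary_file."""
    found = []
    for folder in ("Supplemental", "Media"):
        supp_dir = primary_file.parent / folder
        if supp_dir.is_dir():
            for item in sorted(supp_dir.iterdir()):
                if item.is_file() and item.suffix not in SKIP_SUFFIXES:
                    found.append(item)
    return found


# Build a throwaway library layout to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    author_dir = Path(tmp) / "Library.papers3" / "Author"
    (author_dir / "Supplemental").mkdir(parents=True)
    primary = author_dir / "paper.pdf"
    primary.touch()
    (author_dir / "Supplemental" / "notes.docx").touch()
    (author_dir / "Supplemental" / "page.webarchive").touch()
    names = [p.name for p in find_supplements(primary)]

print(names)
```

The .docx file is picked up regardless of what the Papers3 export says about it, because the scan reads the folder on disk rather than the .bib file.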
@daeh,
thank you for the updated script! I will let you know how it works.
In Papers3, if I select the command Export > BibTex Library from the "File" drop-down menu, the resulting document does indeed only contain .pdf filenames (irrespective of their actual format, usually .docx, but on occasion also .txt or .djvu). The exported file path to supplemental files is invariably wrong, always lacking the last "/supplemental/" folder part.
Unfortunately, I am using the last and most recent version of Papers3 (3.4.25, Readcube does not support this app anymore; besides, you also cannot import .docx and other non-pdf files to their own Readcube Papers 4 app (I have been in contact with Readcube's support team)). I am not sure if I could find an older version of Papers3 online.
Within the package contents of the Papers3 app, there is a file called MTExporterBibTeX.nib (the file path is /Applications/Papers\ 3\ (Legacy).app/Contents/Resources/English.lproj/MTExporterBibTeX.nib), which I guess might be responsible for producing a .bib file of my Papers Library, but I neither know the file format nor do I know how to make changes to it.
@cl-bu One of the reasons I wrote this script was that Papers3 was not exporting the paths to supplementary files, only the primary file. This script scans your Papers3 database and adds all the supplementary files to the zotero_import.bib file that the script generates.
I suspect this is what you're seeing, and the script should work fine now that it can handle your filenames. If you get the current script off the gist, and then run
python Papers3_to_Zotero.py --papers "~/Materialien/Library.papers3" --bibtex "~/Desktop/Library.bib"
the zotero_import.bib file should include all of your supplementary files, including the .docx documents. Let me know if that doesn't work.
@daeh, now everything worked perfectly. Many thanks for updating the script! I had to tinker with the folder organization of my Papers3 library. I used to organize my files by author name only, so too many files without clear authorship (when no author was given, etc.) were collected in a folder called "Unknown", which messed up the Python script's assignment of supplementary files. After implementing a more specific folder organization scheme (based on authorship, editorship, source, and year), I am getting very good results. Thanks to you, I am now finally able to migrate my Papers3 library to Zotero in a way that keeps it usable for me.
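The script looks for supplementary files in folders next to each primary file, so entries whose primary files share one folder would each receive all of that folder's supplements. That appears to be what happened with the "Unknown" folder. A sketch for spotting such folders ahead of time, with a hypothetical helper and made-up paths:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def shared_folders(primary_paths):
    """Map each folder holding more than one primary file to the file names in it."""
    by_folder = defaultdict(list)
    for p in primary_paths:
        path = PurePosixPath(p)
        by_folder[str(path.parent)].append(path.name)
    return {folder: names for folder, names in by_folder.items() if len(names) > 1}


# Made-up primary file paths, as they might appear in the file fields of an export.
paths = [
    "/lib/Library.papers3/Unknown/report_a.pdf",
    "/lib/Library.papers3/Unknown/report_b.pdf",
    "/lib/Library.papers3/Smith/2001/paper.pdf",
]
crowded = shared_folders(paths)
print(crowded)
```

Folders reported here are candidates for reorganizing before the export, as described above.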
Next, I will use your above-described method for exporting my Papers 3 collections (unfortunately quite a lot...).
This is 2024 but I still managed to export from my old Papers3 database/application and import to Zotero. I just wanted to add my note of thanks.
Wow. 5 years is a much longer lifespan for this than I expected. Glad it was useful!
Hi,
I appreciate that you share the script that I was looking for!
However, when I tried what you recommended, I got an error at line 66.
Here is the file that I used for the test:
https://www.dropbox.com/s/hz58zeipunbi5mn/papersToZotero.bib?dl=0
Could you please give me some insights why I am having an error?
Thanks!