#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Supporting Python 3
import sys, os, re
import urllib.request
try:
    with open(sys.argv[1], encoding='utf-8') as infile:
        bibtexdb = infile.read()
except (IndexError, OSError):
    sys.exit("Error: specify the file to be processed!")
if not os.path.isfile('journalList.txt'):
    urllib.request.urlretrieve("https://raw.githubusercontent.com/JabRef/jabref/master/src/main/resources/journals/journalList.txt",
            filename="journalList.txt")
rulesfile = open('journalList.txt', encoding='utf-8')  ## explicit UTF-8, the OS default may be e.g. 'gbk' on Windows
for rule in rulesfile.readlines()[::-1]:  ## reversed alphabetical order matches extended journal names first
    if rule.count(" = ") != 1:
        continue  ## skip blank or malformed lines
    pattern1, pattern2 = rule.strip().split(" = ")
    if pattern1 != pattern1.upper() and (' ' in pattern1):  ## avoid mere abbreviations
        #bibtexdb = bibtexdb.replace(pattern1.strip(), pattern2.strip())  ## problem - this is case sensitive
        repl = re.compile(re.escape(pattern1), re.IGNORECASE)  ## this is more robust, although ca. 10x slower
        (bibtexdb, num_subs) = repl.subn(lambda m: pattern2, bibtexdb)  ## a function avoids backslash processing of pattern2
        if num_subs > 0:
            print("Replacing '%s' with '%s'" % (pattern1, pattern2))
rulesfile.close()
with open('abbreviated.bib', 'w', encoding='utf-8') as outfile:
    outfile.write(bibtexdb)
print("Bibtex database with abbreviated journal names saved into 'abbreviated.bib'")
You need to add: import re
Also, to run under Python 3, your print statements need parentheses ().
Thank you all.
Lines 23 and 27 still use print as a statement, but should use it as a function, with parentheses.
Also have a look at this awesome resource:
https://pyformat.info/
I highly recommend the "new" format syntax.
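For comparison, here is a progress message like the script's "Replacing..." line written in the old %-style, in the "new" str.format syntax, and as an f-string (Python 3.6+). The two journal strings are placeholders chosen for illustration.

```python
pattern1 = "Physical Review Letters"   # placeholder full name
pattern2 = "Phys. Rev. Lett."          # placeholder abbreviation

# Old %-style formatting
old = "Replacing '%s' with '%s'" % (pattern1, pattern2)
# "New" str.format syntax
new = "Replacing '{}' with '{}'".format(pattern1, pattern2)
# f-string, available since Python 3.6
fstr = f"Replacing '{pattern1}' with '{pattern2}'"

print(new)
```

All three produce the same string; the format-based forms avoid the tuple-packing pitfalls of `%` when an argument is itself a tuple.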
When I run this script on my Windows laptop, it gives an error like this:
for rule in rulesfile.readlines()[::-1]: ## reversed alphabetical order matches extended journal names first
UnicodeDecodeError: 'gbk' codec can't decode byte 0x99 in position 7474: illegal multibyte sequence
What is going on? Thanks.
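The traceback suggests that `open()` fell back to the system's default encoding ('gbk' on a Chinese-locale Windows) while the journal list contains UTF-8 bytes. A minimal sketch of the usual fix is to pass the encoding explicitly; the helper name `read_rules` and the temporary sample file are hypothetical, used only for the demonstration.

```python
import os
import tempfile

def read_rules(path):
    """Read the abbreviation list as UTF-8 regardless of the OS locale."""
    # Without encoding=..., open() uses the locale's preferred encoding
    # (unless Python's UTF-8 mode is on), e.g. 'gbk' on Chinese Windows.
    with open(path, encoding="utf-8") as f:
        return f.readlines()

# Create a small UTF-8 sample file containing a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "journalList.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Zeitschrift für Physik = Z. Phys.\n")

rules = read_rules(path)
print(rules[0].strip())
```

The same `encoding="utf-8"` argument belongs on the call that writes `abbreviated.bib`, otherwise non-ASCII characters may fail on output instead.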
It seems there is no https://raw.githubusercontent.com/JabRef/jabref/master/src/main/resources/journals/journalList.txt any more. I get:
6b6b>./Abbreviate\ Journal\ Names\ in\ Bibtex\ Database.py bib.bib
Traceback (most recent call last):
File "./Abbreviate Journal Names in Bibtex Database.py", line 17, in <module>
pattern1, pattern2 = rule.strip().split(" = ")
ValueError: need more than 1 value to unpack
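The error message ("need more than 1 value to unpack") comes from Python 2, whose `urllib.urlretrieve` does not raise on HTTP errors. So journalList.txt most likely contains a saved error page rather than `Full Name = Abbrev` lines, and `split(" = ")` returns a single item. A minimal sketch of defensive parsing that skips such lines instead of crashing; `parse_rules` and the sample lines are hypothetical.

```python
def parse_rules(lines):
    """Return (full_name, abbreviation) pairs from 'Full Name = Abbrev' lines.

    Lines without exactly one ' = ' separator (blank lines, saved error
    pages, malformed entries) are skipped instead of raising ValueError.
    """
    rules = []
    for line in lines:
        parts = line.strip().split(" = ")
        if len(parts) != 2:
            continue
        rules.append((parts[0], parts[1]))
    return rules

sample = [
    "404: Not Found\n",                               # what a saved error page may look like
    "Physical Review Letters = Phys. Rev. Lett.\n",
    "\n",
]
print(parse_rules(sample))
```

Skipping silently hides a completely broken download, so it is worth checking that the returned list is not empty before processing the .bib file.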
@bergerrjf, you are right, the file has disappeared. I can look for a cached version on my computer, though.
That would be cool!
Is this the missing file?
https://github.com/JabRef/jabref/blob/master/src/main/resources/journals/journalList.csv
Yes, it seems so! Though it's in CSV, not TXT.
A CSV file is a text file. Nothing to worry about.
However, the format changes make the script unusable as-is. I had to change ";" to " = ", remove the ";;" at the end of each line, and remove some odd lines with more than one "=" symbol. Besides that, the script works fine except for one thing: if a journal name also appears in a paper title, it gets abbreviated there too (What is systems biology? -> What is Syst. Biol.?)
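Instead of converting the file by hand, the semicolon-separated rows can be read directly with Python's `csv` module. This is a sketch that assumes the layout described above (full name, then abbreviation, separated by ";" and possibly followed by empty trailing fields); the function name `read_jabref_csv` and the sample text are hypothetical.

```python
import csv
import io

def read_jabref_csv(text):
    """Parse 'Full Name;Abbreviation;...' rows into (full, abbrev) pairs.

    Assumes ';' as the delimiter; any extra (possibly empty) trailing
    fields are ignored. Rows without a non-empty name and abbreviation
    are skipped. Quoted fields are handled by the csv module.
    """
    pairs = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs

sample = "Physical Review Letters;Phys. Rev. Lett.;;\nNature;Nature;;\n"
print(read_jabref_csv(sample))
```

Using a real CSV parser also copes with names that contain ";" inside quotes, which a plain `split(";")` would break apart.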
I thank you, @FilipDominec for your wonderful work.
For anybody having trouble with the new script referenced by @glucksfall, I attach the good old journalList.txt: https://gist.github.com/FilipDominec/6df14b3424e335c4a47a96640f7f0df9
This was of great help, thanks @FilipDominec !
I updated it to process the JabRef CSV directly and to solve the issue mentioned by @glucksfall, as well as another one I had, where part of a journal name matched another journal's full name, so only that part was abbreviated ("Journal of Medicine" being alphabetically before "New England Journal of Medicine").
I basically updated the regex to find and replace Journal = {pattern}.
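A minimal sketch of that idea, not @trevismd's actual code: match the whole value of a `journal = {...}` field and look it up case-insensitively, so paper titles and partial names are left alone. The function name `abbreviate_journal_fields` and the sample rules are hypothetical, and it assumes brace-delimited values without nested braces (quoted `journal = "..."` values are not handled).

```python
import re

def abbreviate_journal_fields(bibtex, rules):
    """Abbreviate journal names only inside 'journal = {...}' fields.

    `rules` maps full journal names to abbreviations. A field is replaced
    only when its entire value equals a full name (ignoring case and
    surrounding spaces), so titles and partial names are not touched.
    Assumes brace-delimited values without nested braces.
    """
    lookup = {full.lower(): abbrev for full, abbrev in rules.items()}
    field = re.compile(r"(\bjournal\s*=\s*\{)([^{}]*)(\})", re.IGNORECASE)

    def substitute(match):
        abbrev = lookup.get(match.group(2).strip().lower())
        if abbrev is None:
            return match.group(0)  # unknown journal: leave the field unchanged
        return match.group(1) + abbrev + match.group(3)

    return field.sub(substitute, bibtex)

# Sample rules, for illustration only.
rules = {
    "Systems Biology": "Syst. Biol.",
    "Journal of Medicine": "J. Med.",
    "New England Journal of Medicine": "N. Engl. J. Med.",
}
entry = """@article{key,
  title = {What is systems biology?},
  journal = {New England Journal of Medicine},
}"""
print(abbreviate_journal_fields(entry, rules))
```

Because the whole field value is looked up at once, the order of the rules no longer matters, and "Journal of Medicine" cannot clobber part of a longer name.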
I am happy that you find this little script useful. Perhaps I should somehow sync all the improvements from at least 4 modified forks?
I found @trevismd's Python 3 version of the code after I posted mine, so I deleted mine.
Anyway, providing Python 3 code explicitly will be helpful for people who come across this gist.
By the way, what is journalList.txt based on? A journal I am submitting to requires following the ISSN List of Title Word Abbreviations, so I would like to know.
It seems the list is based on JabRef.
@FilipDominec, feel free to reuse any part of my contribution.
Check out my latest revision: https://gist.github.com/peci1/4e67f3d0521ce014fc952bcca664b37d/revisions
The text "Replacing..." is only printed when there is at least one successful match for the pattern.