Created
June 26, 2018 17:10
DOI python regex
import re


def main():
    tests = ["10.1021/cb3006787",
             "10.1021/acschembio.5b00308",
             "10.7164/antibiotics.45.1853",
             "10.1016/S0953-7562(09)80401-2"]
    # Group 1 is the DOI prefix, group 2 the suffix.
    # Raw string with an escaped dot, so '10.' only matches a literal period.
    regexp = re.compile(r'^(10\.\d{4,9})/([-._;()/:A-Za-z0-9]+)$')
    for t in tests:
        res = regexp.match(t)
        if res is None:
            # Avoid an AttributeError when a string is not a DOI.
            print('No match: %s\n' % t)
            continue
        print('Overall match: %s' % res.group(0))
        print('Group 1 match: %s' % res.group(1))
        print('Group 2 match: %s\n' % res.group(2))


if __name__ == '__main__':
    main()
Adapted from CrossRef.
Should capture ~99% of DOIs.
In a pinch,
r'^(10\.\d{4,9})/(\S+)$'
should be fairly robust, but it may allow false positives, since it accepts any non-whitespace characters in the suffix.
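As a rough sketch of how the strict pattern and the looser fallback could be used together, the snippet below tries the strict pattern first and only then falls back to the permissive one, reporting which of the two matched. The helper name `split_doi`, the constant names, and the test string containing `<` and `>` are made up for illustration and are not part of the gist.

```python
import re

# Strict pattern from the gist (dot escaped, raw string).
STRICT_DOI = re.compile(r'^(10\.\d{4,9})/([-._;()/:A-Za-z0-9]+)$')
# Loose fallback: any run of non-whitespace characters after the prefix.
LOOSE_DOI = re.compile(r'^(10\.\d{4,9})/(\S+)$')


def split_doi(text):
    """Return (prefix, suffix, pattern_name), or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    for name, pattern in (("strict", STRICT_DOI), ("loose", LOOSE_DOI)):
        match = pattern.match(text)
        if match:
            return match.group(1), match.group(2), name
    return None


if __name__ == '__main__':
    candidates = [
        "10.1021/cb3006787",   # from the gist's test list
        "10.1234/example<1>",  # made-up string: '<' and '>' fail the strict class
        "not-a-doi",
    ]
    for candidate in candidates:
        print(candidate, '->', split_doi(candidate))
```

Knowing which pattern matched lets a caller treat loose matches as lower-confidence hits and, for instance, queue them for manual review.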