@jvansan
Created June 26, 2018 17:10
DOI python regex
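import re


def main():
    tests = ["10.1021/cb3006787",
             "10.1021/acschembio.5b00308",
             "10.7164/antibiotics.45.1853",
             "10.1016/S0953-7562(09)80401-2"
             ]
    # Group 1 is the DOI prefix ("10." plus a 4-9 digit registrant code);
    # group 2 is the suffix. The dot is escaped so it matches only a literal ".".
    regexp = re.compile(r'^(10\.\d{4,9})/([-._;()/:A-Za-z0-9]+)$')
    for t in tests:
        res = regexp.match(t)
        if res is None:
            # Guard against non-matching input instead of failing on res.group().
            print('No match: %s\n' % t)
            continue
        print('Overall match: %s' % res.group(0))
        print('Group 1 match: %s' % res.group(1))
        print('Group 2 match: %s\n' % res.group(2))


if __name__ == '__main__':
    main()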
jvansan commented Jun 26, 2018

Adapted from CrossRef.
Should capture ~99% of DOIs.
In a pinch, r'^(10\.\d{4,9})/([^\s]+)$' should be fairly robust, since it accepts any run of non-whitespace characters as the suffix, but it may allow false positives.
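To make the trade-off between the two patterns concrete, here is a minimal sketch that runs both on the same inputs. The names `STRICT`, `LOOSE`, and `split_doi` are hypothetical helpers, not part of the gist, and the SICI-style DOI is a made-up string used only to illustrate a suffix containing `<` and `>`.

```python
import re

# Strict pattern from the gist, with the dot after "10" escaped.
STRICT = re.compile(r'^(10\.\d{4,9})/([-._;()/:A-Za-z0-9]+)$')
# Permissive fallback: any run of non-whitespace characters as the suffix.
LOOSE = re.compile(r'^(10\.\d{4,9})/(\S+)$')


def split_doi(doi, pattern=STRICT):
    """Return (prefix, suffix) if doi matches pattern, else None."""
    m = pattern.match(doi)
    return (m.group(1), m.group(2)) if m else None


if __name__ == '__main__':
    samples = [
        "10.1016/S0953-7562(09)80401-2",  # from the gist's test list
        # Made-up SICI-style DOI (illustration only): contains "<" and ">".
        "10.1002/(SICI)1234-5678(199901)1:1<1::AID-X1>3.0.CO;2-A",
        "10.1234/<script>",  # junk that only the loose pattern accepts
    ]
    for s in samples:
        print(repr(s))
        print('  strict:', split_doi(s))
        print('  loose: ', split_doi(s, LOOSE))
```

The strict pattern rejects the second and third samples, while the loose one accepts both. That is the false-positive risk mentioned above, and also why the loose form can be useful for older DOIs with unusual characters.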
