@codersquid
Created September 8, 2014 18:39
Scan a file and print the DOIs it contains
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re


def match_doi(query):
    """Print and return the first DOI found in query, or None if there is none."""
    # pattern is from a very helpful SO question. I <3 SO
    # http://stackoverflow.com/questions/27910/finding-a-doi-in-a-document-or-page
    match = re.search(r'\b(10[.][0-9]{4,}(?:[.][0-9]+)*/(?:(?!["&\'<>])\S)+)\b', query)
    if match is None:
        return None
    result = match.group(0)
    print(result)
    return result


if __name__ == '__main__':
    import fileinput
    # fileinput reads the files named on the command line, or stdin if none are given
    for line in fileinput.input():
        match_doi(line)
@codersquid

If you have a PDF, run pdftotext <filename>.pdf, which writes the text to <filename>.txt, then run ./matchdoi.py <filename>.txt, which prints the DOIs to stdout. I pasted them into the Zotero add-by-id GUI because I was lazy.
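
The two steps can also be chained in one script. What follows is only a rough sketch, not part of the original gist: the names `find_dois` and `dois_from_pdf` are made up for illustration, it assumes Python 3.7+ and that `pdftotext` is installed and on your PATH, and it reuses the gist's regex unchanged. Unlike `match_doi`, it collects every DOI on a line rather than only the first, and it drops repeats.

```python
import re
import subprocess

# Same pattern as in the gist's match_doi function.
DOI_PATTERN = re.compile(r'\b(10[.][0-9]{4,}(?:[.][0-9]+)*/(?:(?!["&\'<>])\S)+)\b')


def find_dois(text):
    """Return every distinct DOI in text, in order of first appearance."""
    seen = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0)
        if doi not in seen:
            seen.append(doi)
    return seen


def dois_from_pdf(pdf_path):
    """Convert a PDF with pdftotext and return the DOIs found in its text.

    Assumes the pdftotext command is installed and on PATH.
    """
    # "-" as the output file tells pdftotext to write the text to stdout.
    completed = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return find_dois(completed.stdout)


if __name__ == '__main__':
    import sys
    # Usage: ./pdfdois.py paper1.pdf paper2.pdf  (script name is hypothetical)
    for path in sys.argv[1:]:
        for doi in dois_from_pdf(path):
            print(doi)
```

The output is still one DOI per line, so it can be pasted into Zotero the same way.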
