Searching PubMed with Biopython
# This code uses Biopython to retrieve lists of articles from pubmed
# you need to install Biopython first.
# If you use Anaconda:
# conda install biopython
# If you use pip/venv:
# pip install biopython
# Full discussion:
# https://marcobonzanini.wordpress.com/2015/01/12/searching-pubmed-with-python/
from Bio import Entrez


def search(query):
    Entrez.email = 'your.email@example.com'
    handle = Entrez.esearch(db='pubmed',
                            sort='relevance',
                            retmax='20',
                            retmode='xml',
                            term=query)
    results = Entrez.read(handle)
    return results


def fetch_details(id_list):
    ids = ','.join(id_list)
    Entrez.email = 'your.email@example.com'
    handle = Entrez.efetch(db='pubmed',
                           retmode='xml',
                           id=ids)
    results = Entrez.read(handle)
    return results


if __name__ == '__main__':
    results = search('fever')
    id_list = results['IdList']
    papers = fetch_details(id_list)
    for i, paper in enumerate(papers['PubmedArticle']):
        print("{}) {}".format(i+1, paper['MedlineCitation']['Article']['ArticleTitle']))

@hongtao510

hongtao510 commented May 4, 2017

Have you encountered the issue where the server cuts you off after querying ~2000 IDs?

@vyvy3n

vyvy3n commented Jun 20, 2017

Should lines 32 and 33 be changed to:
for i, paper in enumerate(papers['PubmedArticle']):
    print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))

@nmagee

nmagee commented Jun 24, 2017

@ThitherShore is correct - I can verify that the suggested fix makes this gist functional.

@lemonsoftltd

lemonsoftltd commented Aug 21, 2017

@hongtao510 we download 100 articles at a time in an infinite loop. We have seen no blocking for a week.
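
A minimal sketch of the batching approach mentioned above, assuming the cut-off is related to how many IDs go into one request. The helper names `chunked` and `fetch_in_batches`, the batch size of 100, and the one-second pause are illustrative assumptions, not part of the gist. `Entrez.email` should be set beforehand, as in the gist.

```python
import time


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(id_list, batch_size=100, pause=1.0, fetch=None):
    """Fetch records batch by batch and collect all 'PubmedArticle' entries.

    `fetch` is a callable taking a list of IDs and returning the parsed
    result; by default it wraps Entrez.efetch the same way the gist does.
    """
    if fetch is None:
        # Imported lazily so that chunked() can be used without Biopython.
        from Bio import Entrez

        def fetch(ids):
            handle = Entrez.efetch(db='pubmed', retmode='xml', id=','.join(ids))
            try:
                return Entrez.read(handle)
            finally:
                handle.close()

    articles = []
    for batch in chunked(list(id_list), batch_size):
        articles.extend(fetch(batch)['PubmedArticle'])
        time.sleep(pause)  # assumed courtesy delay between requests
    return articles
```

With the gist's `search` function, this could be called as `fetch_in_batches(search('fever')['IdList'])`.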

@ReddSpark1

ReddSpark1 commented Sep 17, 2017

Please update code with ThitherShore's comment.

@reiaoki

reiaoki commented Feb 23, 2018

Whenever there is a "+" sign in the content (for example in ArticleTitle or AbstractText), it returns only the part of the string from the "+" sign onwards. Does anyone have a way to get around this?
e.g. The title "The Clinicopathological and Prognostic Implications of FoxP3+ Regulatory T Cells in Patients with Colorectal Cancer: A Meta-Analysis." will return "+ Regulatory T Cells in Patients with Colorectal Cancer: A Meta-Analysis"
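
One possible cause, assuming the affected titles contain inline markup (for example a `<sup>` element around the "+"), is that only the text following the nested tag is kept. The sketch below is not Biopython's own method: it parses a small hand-written XML fragment (hypothetical, not a real efetch response) with the standard library and joins every text piece using `itertext()`. The name `full_text` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only, not actual PubMed output.
SAMPLE_XML = (
    "<Article><ArticleTitle>Implications of FoxP3<sup>+</sup> Regulatory T Cells"
    "</ArticleTitle></Article>"
)


def full_text(element):
    """Join the element's own text with the text of all nested elements."""
    return "".join(element.itertext())


title = ET.fromstring(SAMPLE_XML).find("ArticleTitle")
print(full_text(title))  # Implications of FoxP3+ Regulatory T Cells
```

If the assumption holds, the same function might be applied to the ArticleTitle and AbstractText elements of the raw XML returned by efetch instead of the parsed result.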

@TomazAlexandre

TomazAlexandre commented Apr 2, 2018

It looks like the format returned by the efetch method is slightly different now.

If you replace papers with papers['PubmedArticle'] you should get the list of papers.

@makerspaze

makerspaze commented Jun 4, 2018

@ThitherShore is correct: we should use enumerate(papers['PubmedArticle']) instead of enumerate(papers)

@gunnarklee

gunnarklee commented Dec 16, 2019

for line 36

print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))
instead of:
print(json.dumps(papers[0], indent=2, separators=(',', ':')))

@prashantkum7

prashantkum7 commented Mar 20, 2020

I get abstracts and not the full text. Any reason why?

@canthonyscott

canthonyscott commented Jul 14, 2020

> I get abstracts and not the full text. Any reason why?

PubMed does not contain the full text of papers, only abstracts.

@MLHafizur

MLHafizur commented Oct 22, 2020

How can I save the results as CSV, with Title and Abstract columns?
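
A minimal sketch of one way to do this with the standard `csv` module. It assumes each record has the layout used in the gist (`paper['MedlineCitation']['Article']`), that the abstract, when present, sits under `['Abstract']['AbstractText']` as a list of text sections, and that the parsed records behave like plain dictionaries. The names `paper_to_row` and `save_csv` are made up for illustration.

```python
import csv


def paper_to_row(paper):
    """Return (title, abstract) for one record; abstract is '' if missing."""
    article = paper['MedlineCitation']['Article']
    title = str(article.get('ArticleTitle', ''))
    # Some records have no abstract; structured abstracts have several parts.
    parts = article.get('Abstract', {}).get('AbstractText', [])
    abstract = ' '.join(str(part) for part in parts)
    return title, abstract


def save_csv(papers, path):
    """Write one row per article with Title and Abstract columns."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Title', 'Abstract'])
        for paper in papers['PubmedArticle']:
            writer.writerow(paper_to_row(paper))
```

With the gist's functions, `save_csv(fetch_details(id_list), 'papers.csv')` would write the file; the file name is only an example.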

@sidewinder02139

sidewinder02139 commented Dec 3, 2020

Quick question: is there an extra ")" (or a missing "(") in line 39?
print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))

#edited for formatting

@sidewinder02139

sidewinder02139 commented Dec 3, 2020

> How may I save the result in CSV, with Title and Abstract columns?

@MLZTazim - I'm in the same boat: learning how to use Python to turn the JSON into the result. Good fun!
A hint: https://www.geeksforgeeks.org/json-dumps-in-python/

@bonzanini
Author

bonzanini commented Dec 4, 2020

> Quick question: is there an extra ")" (or a missing "(") in line 39?
> print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))
>
> #edited for formatting

@sidewinder02139 the syntax is correct: note the first ")" on that line is part of the output string

@sidewinder02139

sidewinder02139 commented Dec 4, 2020

> > Quick question: is there an extra ")" (or a missing "(") in line 39?
> > print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))
> > #edited for formatting
>
> @sidewinder02139 the syntax is correct: note the first ")" on that line is part of the output string

DOH! LOL
Have a brilliant weekend and stay healthy!
btw, I love the code. Well done!

@jajkelle

jajkelle commented Jan 21, 2021

ThitherShore is correct. Your code won't work until you enumerate papers['PubmedArticle'].
Thank you for the example, but please change this soon so as not to confuse others.
I spent a while trying to figure out what was wrong.

While we're at it, the last line doesn't work either, for the same reason. It should be:
import json
print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))

@bonzanini
Author

bonzanini commented Jan 25, 2021

> ThitherShore is correct. Your code won't work until you enumerate papers['PubmedArticle'].
> Thank you for the example, but please change this soon so as not to confuse others.
> I spent a while trying to figure out what was wrong.
>
> While we're at it, the last line doesn't work either, for the same reason. It should be:
> import json
> print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))

@jajkelle Updated (better late than never). Thank you all for pointing it out.
