Searching PubMed with Biopython
# This code uses Biopython to retrieve lists of articles from pubmed
# you need to install Biopython first.
# If you use Anaconda:
# conda install biopython
# If you use pip/venv:
# pip install biopython
# Full discussion:
# https://marcobonzanini.wordpress.com/2015/01/12/searching-pubmed-with-python/
from Bio import Entrez


def search(query):
    Entrez.email = 'your.email@example.com'
    handle = Entrez.esearch(db='pubmed',
                            sort='relevance',
                            retmax='20',
                            retmode='xml',
                            term=query)
    results = Entrez.read(handle)
    return results


def fetch_details(id_list):
    ids = ','.join(id_list)
    Entrez.email = 'your.email@example.com'
    handle = Entrez.efetch(db='pubmed',
                           retmode='xml',
                           id=ids)
    results = Entrez.read(handle)
    return results


if __name__ == '__main__':
    results = search('fever')
    id_list = results['IdList']
    papers = fetch_details(id_list)
    for i, paper in enumerate(papers['PubmedArticle']):
        print("{}) {}".format(i+1, paper['MedlineCitation']['Article']['ArticleTitle']))

hongtao510 commented May 4, 2017

Have you encountered the issue that the server will cut you off after querying ~2000 IDs?
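
One possible workaround for large ID lists, sketched here under assumptions rather than confirmed against the server's actual limits, is to call efetch in smaller batches and merge the results. The helper names `chunked` and `fetch_details_in_batches` and the batch size of 200 are hypothetical choices for illustration; `Entrez.efetch` and `Entrez.read` are used exactly as in the gist.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_details_in_batches(id_list, batch_size=200):
    """Fetch PubMed records batch by batch and return one combined list.

    batch_size=200 is an assumed value, not a documented server limit.
    """
    # Imported here so chunked() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = 'your.email@example.com'
    articles = []
    for batch in chunked(list(id_list), batch_size):
        handle = Entrez.efetch(db='pubmed',
                               retmode='xml',
                               id=','.join(batch))
        results = Entrez.read(handle)
        handle.close()
        # Same structure as in the gist: articles live under 'PubmedArticle'.
        articles.extend(results['PubmedArticle'])
    return articles
```

With this sketch, `fetch_details_in_batches(id_list)` returns a plain list of articles, so the printing loop would iterate over it directly instead of over `papers['PubmedArticle']`.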


vyvy3n commented Jun 20, 2017

Should lines 32 and 33 be changed to:
for i, paper in enumerate(papers['PubmedArticle']):
    print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))


nmagee commented Jun 24, 2017

@ThitherShore is correct. I can verify that the suggested fix makes this gist functional.


lemonysoft commented Aug 21, 2017

@hongtao510 We download 100 articles in an infinite loop, and we have seen no blocking for a week.


ReddSpark1 commented Sep 17, 2017

Please update code with ThitherShore's comment.


reiaoki commented Feb 23, 2018

Whenever there is a "+" sign in the content (for example in ArticleTitle or AbstractText), the returned string contains only the text from the "+" sign onward. Does anyone have a way to get around this?
e.g. The title "The Clinicopathological and Prognostic Implications of FoxP3+ Regulatory T Cells in Patients with Colorectal Cancer: A Meta-Analysis." will return "+ Regulatory T Cells in Patients with Colorectal Cancer: A Meta-Analysis"


TomazAlexandre commented Apr 2, 2018

It looks like the format returned by the efetch method is slightly different now.

If you replace papers with papers['PubmedArticle'], you should get the list of papers.


makerspaze commented Jun 4, 2018

@ThitherShore is correct we should use enumerate(papers['PubmedArticle']) instead of enumerate(papers)


gunnarklee commented Dec 16, 2019

For line 36, use:

print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))
instead of:
print(json.dumps(papers[0], indent=2, separators=(',', ':')))


prashantkum7 commented Mar 20, 2020

I get abstracts and not the full text. Any reason why?


canthonyscott commented Jul 14, 2020

> I get abstracts and not the full text. Any reason why?

PubMed does not contain the full text of papers, only abstracts.


MLHafizur commented Oct 22, 2020

How may I save the result in CSV, with Title and Abstract columns?


sidewinder02139 commented Dec 3, 2020

Quick question: is there an extra ")" (or a missing "(") in line 39?
print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))

#edited for formatting


sidewinder02139 commented Dec 3, 2020

> How may I save the result in CSV, with Title and Abstract columns?

@MLZTazim - I'm in the same boat: learning how to use Python to drive JSON to the result. Good fun!
A hint: https://www.geeksforgeeks.org/json-dumps-in-python/
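
For the CSV question, here is a minimal sketch using Python's standard `csv` module. It assumes the parsed efetch result has the shape used in the gist (`papers['PubmedArticle']`) and that abstracts sit under `Abstract` -> `AbstractText` as a list of sections, which may be missing for some records. The function name `papers_to_csv` and the sample record are hypothetical, made up for illustration.

```python
import csv
import io


def papers_to_csv(papers, fileobj):
    """Write one row per article with Title and Abstract columns."""
    writer = csv.writer(fileobj)
    writer.writerow(['Title', 'Abstract'])
    for paper in papers['PubmedArticle']:
        article = paper['MedlineCitation']['Article']
        title = str(article.get('ArticleTitle', ''))
        # Assumption: AbstractText is a list of sections; a record without
        # an abstract simply gets an empty Abstract column.
        sections = article.get('Abstract', {}).get('AbstractText', [])
        abstract = ' '.join(str(section) for section in sections)
        writer.writerow([title, abstract])


# Hand-made stand-in for a parsed efetch result (not real PubMed data).
sample = {
    'PubmedArticle': [
        {'MedlineCitation': {'Article': {
            'ArticleTitle': 'An example title',
            'Abstract': {'AbstractText': ['Background text.', 'Results text.']},
        }}},
    ]
}

buffer = io.StringIO()
papers_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, something like `open('papers.csv', 'w', newline='', encoding='utf-8')` can be passed as `fileobj`, with `papers` coming from `fetch_details(id_list)`.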

Author

bonzanini commented Dec 4, 2020

> Quick question: is there an extra ")" (or a missing "(") in line 39?
> print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))
>
> #edited for formatting

@sidewinder02139 the syntax is correct: note the first ")" on that line is part of the output string


sidewinder02139 commented Dec 4, 2020

> > Quick question: is there an extra ")" (or a missing "(") in line 39?
> > print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))
> > #edited for formatting
>
> @sidewinder02139 the syntax is correct: note the first ")" on that line is part of the output string

DOH! LOL
Have a brilliant weekend and stay healthy!
btw, I love the code. Well done!


jajkelle commented Jan 21, 2021

ThitherShore is correct. Your code won't work until you enumerate papers['PubmedArticle'].
Thank you for the example, but please change this soon so as not to confuse others.
I spent a while trying to figure out what was wrong.

While we're at it, the last line doesn't work either, for the same reason. It should be:
import json
print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))

Author

bonzanini commented Jan 25, 2021

> ThitherShore is correct. Your code won't work until you enumerate papers['PubmedArticle'].
> Thank you for the example, but please change this soon so as not to confuse others.
> I spent a while trying to figure out what was wrong.
>
> While we're at it, the last line doesn't work either, for the same reason. It should be:
> import json
> print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))

@jajkelle Updated (better late than never), thank you all for pointing it out


echorule commented Dec 6, 2022

This is awesome, thanks, and it works pretty well. I'm stuck on trying to get the details of the authors in a succinct way. Can anybody help with how to do that? paper['MedlineCitation']['Article']['AuthorList'] isn't right...
Thanks in advance!
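
A possible starting point for the author question, offered as a sketch rather than a confirmed answer: it assumes `AuthorList` is a list of dict-like entries mirroring the PubMed XML fields `ForeName`, `LastName`, and `CollectiveName` (used for group authors), and that the list can be absent. The function name `format_authors` and the sample record are hypothetical.

```python
def format_authors(paper):
    """Return the article's authors as one comma-separated string."""
    article = paper['MedlineCitation']['Article']
    names = []
    # Assumption: AuthorList may be missing, so fall back to an empty list.
    for author in article.get('AuthorList', []):
        if 'CollectiveName' in author:
            # Group authors carry a single collective name.
            names.append(str(author['CollectiveName']))
        else:
            parts = [author.get('ForeName', ''), author.get('LastName', '')]
            names.append(' '.join(str(part) for part in parts if part))
    return ', '.join(names)


# Hand-made stand-in for one parsed record (not real PubMed data).
sample_paper = {'MedlineCitation': {'Article': {'AuthorList': [
    {'ForeName': 'Ada', 'LastName': 'Lovelace'},
    {'CollectiveName': 'Example Study Group'},
]}}}

print(format_authors(sample_paper))
```

In the gist's loop, this could be called as `format_authors(paper)` next to the title lookup.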
