

@danielecook
Created May 14, 2013 19:59
"""
Daniel E. Cook 2013
(danielecook.com)
This script takes a CSV file containing authors and the PubMed identifiers (PMIDs) of their publications and writes formatted HTML for each author's publications (one file per author in the pubs_formatted directory).
The first row of the CSV should contain the author names, one per column; the cells below each name list that author's publications as PMIDs. Any cell that is not a PMID is written to the output unchanged,
so you can add publications that are not in PubMed or that you want to display in a particular way.
This script might be useful for anyone who maintains publication lists for researchers at a university, for instance.
Requires Biopython:
    pip install biopython
The way the publications are displayed can be customized using CSS. For example, the following CSS can be used:
/* pubs */
.pub_title {
    font-weight: bold;
    font-size: 13px;
    margin: 0px;
}
.pub_authors {
    color: #929292;
    font-size: 11px;
    margin: 0px;
}
.pub_info {
    font-size: 11px;
}
.pub_info a {
    padding-left: 3px;
    padding-right: 3px;
}
"""
from Bio import Entrez
from Bio import Medline
import csv
import os

# Set your own email here; NCBI asks Entrez users to provide a contact address.
email = "Danielecook@gmail.com"


def f7(seq):
    """ Removes duplicate items while preserving order (adapted from Stack Overflow). """
    seen = set()
    seen_add = seen.add
    # seen_add(x) returns None, so "not seen_add(x)" is always True and just records x.
    return [x for x in seq if x not in seen and not seen_add(x)]


def csv_dict_array(f):
    """ Convert the CSV into a list of publications for each author (one column per author). """
    # Decode as UTF-8 rather than the OS default encoding ('utf-8-sig' also drops an Excel BOM).
    # newline='' is how the csv module expects files to be opened.
    with open(f, newline='', encoding='utf-8-sig') as fh:
        reader = csv.DictReader(fh, dialect='excel')
        # Generate per-author dictionary
        auth_dict = {}
        for row in reader:
            for auth in row.keys():
                # Skip empty cells; create the list on first use, else append.
                if row[auth]:
                    auth_dict.setdefault(auth, []).append(row[auth])
    # Remove duplicates
    for i in auth_dict:
        auth_dict[i] = f7(auth_dict[i])
    return auth_dict


def fetch_pub(pmids):
    """ Fetches PubMed data for each PMID and returns a list of HTML snippets. """
    Entrez.email = email
    recs = []
    for v in pmids:
        print(v)
        try:
            handle = Entrez.efetch(db="pubmed", id=int(v), retmode="text", rettype="medline")
            for p in Medline.parse(handle):
                pubmed_link = "<a class='pub_link' href='http://www.ncbi.nlm.nih.gov/pubmed/%s'>%s</a>" % (p['PMID'], p['PMID'])
                if 'PMC' in p:
                    pubmed_link += " ( <a class='pmc_link' href='http://www.ncbi.nlm.nih.gov/pmc/articles/%s/'>Full Text</a> )" % (p['PMC'])
                # The HTML lines sit at column 0 so no indentation ends up in the output.
                formatted = """
<div class='pub'>
<div class='pub_title'>%s</div>
<div class='pub_authors'>%s</div>
<div class='pub_date'>%s</div>
<div class='pub_journal_pages'>%s</div>
%s
</div>""" % (p['TI'], ', '.join(p['AU']), p['DP'], p['SO'], pubmed_link)
                recs.append(formatted.strip())
            handle.close()
        except Exception:
            # Not a PMID (or the lookup failed): output the cell as-is.
            recs.append(v)
    return recs


pubs = csv_dict_array("pubs.csv")
os.makedirs("pubs_formatted", exist_ok=True)
for auth, pub_list in pubs.items():
    # 'w' overwrites any previous output for this author.
    with open(os.path.join("pubs_formatted", auth + ".txt"), 'w', encoding='utf-8') as f:
        f.write('\n'.join(fetch_pub(pub_list)))
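To make the expected input layout concrete, here is a small offline sketch that mimics what `csv_dict_array` does, run on an in-memory CSV so no Biopython or network access is needed. `csv.DictReader` uses the first row as keys, so each column header (an author) maps to the cells below it. The helper name `authors_to_pmids`, the author names, and the PMIDs are all hypothetical placeholders; the sketch also swaps `f7` for `dict.fromkeys`, which de-duplicates while keeping first-seen order.

```python
import csv
import io


def authors_to_pmids(text):
    """Map each author (a column header) to the non-empty cells below it.

    Hypothetical helper mirroring csv_dict_array: blank cells are skipped
    and duplicates are removed while keeping first-seen order.
    """
    reader = csv.DictReader(io.StringIO(text), dialect='excel')
    result = {}
    for row in reader:
        for author, cell in row.items():
            if cell:  # skips '' and the None that short rows produce
                result.setdefault(author, []).append(cell)
    # dict.fromkeys keeps insertion order, so this de-duplicates like f7
    return {author: list(dict.fromkeys(cells)) for author, cells in result.items()}


# Placeholder data: one column per author, one publication per cell.
sample = (
    "Jane Doe,John Roe\n"
    "12345678,23456789\n"
    "12345678,Roe J. Unpublished talk (2012)\n"
    "34567890,\n"
)

for author, items in authors_to_pmids(sample).items():
    print(author, items)
```

Running it prints `Jane Doe ['12345678', '34567890']` and `John Roe ['23456789', 'Roe J. Unpublished talk (2012)']`: the repeated PMID and the empty cell are dropped, and the non-PMID text is kept so the script would output it verbatim.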
@mabar1

mabar1 commented Oct 14, 2022

Hi Daniel,
I need to set up a publication list with over 150 entries on a homepage, so I'm looking for a way to convert a .csv file downloaded from PubMed into HTML that I can simply paste into the homepage's text editor. I think this is exactly what your script does? However, I can't get it to run in JupyterLab. At the point where it should read in the CSV, I get the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 6265: character maps to <undefined>
I think I cleaned the CSV of unusual characters such as öäüèéàá, but I'm afraid I just don't understand the code. Any help would be greatly appreciated!
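A `'charmap' codec` error like this usually means the file was opened without an explicit `encoding`, so Python 3 fell back to the operating system's default (cp1252 on many Windows setups). Byte 0x8d has no character in cp1252, but it does occur inside UTF-8 sequences such as `č` (bytes `C4 8D`), so the CSV is most likely UTF-8. The sketch below assumes that; `read_csv_rows` is a hypothetical helper, not part of the script, and the Latin-1 fallback is a swapped-in choice that accepts any byte but may garble accents.

```python
import csv
import io
import os
import tempfile


def read_csv_rows(path):
    """Read a CSV into a list of dicts, decoding explicitly instead of
    relying on the operating system's default encoding.

    Hypothetical helper: tries UTF-8 first ('utf-8-sig' also drops the
    byte-order mark Excel may write) and falls back to Latin-1, which
    accepts every byte but can mis-render accented characters.
    """
    with open(path, 'rb') as fh:
        raw = fh.read()
    try:
        text = raw.decode('utf-8-sig')
    except UnicodeDecodeError:
        text = raw.decode('latin-1')
    return list(csv.DictReader(io.StringIO(text, newline='')))


if __name__ == '__main__':
    # 'č' is stored in UTF-8 as the bytes C4 8D; cp1252 has no character for 0x8D.
    data = "Dočkal J.\n12345678\n".encode('utf-8')
    try:
        data.decode('cp1252')
    except UnicodeDecodeError as err:
        print('cp1252:', err.reason)

    # Write the same bytes to a temporary file and read them back explicitly.
    with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
        tmp.write(data)
    try:
        print(read_csv_rows(tmp.name))
    finally:
        os.remove(tmp.name)
```

In the script itself, the same idea amounts to passing `encoding='utf-8'` (or `'utf-8-sig'`) to the `open()` call in `csv_dict_array`, which avoids having to strip accented characters by hand.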
