@tomsing1
Created November 22, 2018 19:04
Notes on using NCBI eutils

All requests start with this URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/

  • Use lowercase characters for all parameters except &WebEnv.
  • Special characters, such as quotation marks (") or the # symbol used in referring to a query key on the History server, should be represented by their URL encodings (%22 for "; %23 for #).
  • Do not put spaces in a URL. Drop them where they are not needed (for example in ID lists); where a space is required, use a plus sign (+) instead:
Incorrect: &id=352, 25125, 234
Correct:   &id=352,25125,234

Incorrect: &term=biomol mrna[properties] AND mouse[organism]
Correct:   &term=biomol+mrna[properties]+AND+mouse[organism]
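
The encoding rules above can be applied mechanically before a search term is placed in a URL. The following is a minimal sketch rather than part of the original notes: `encode_term` is a hypothetical helper name, and it only handles the three cases listed above (double quotes, #, and spaces), not full percent-encoding.

```shell
# encode_term (hypothetical helper): apply the three substitutions described
# in the notes: " -> %22, # -> %23, space -> +
encode_term() {
  printf '%s\n' "$1" | sed -e 's/"/%22/g' -e 's/#/%23/g' -e 's/ /+/g'
}

# Print the encoded form of two example terms
encode_term 'biomol mrna[properties] AND mouse[organism]'
encode_term '"published last 3 months"[Filter]'
```

Square brackets are left untouched here. When such a URL is passed to curl, either add `-g` (`--globoff`) or encode the brackets as %5B and %5D, because curl otherwise treats them as a glob range.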

Tools

EInfo (database statistics)

eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

ESearch (text searches)

eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi

EPost (UID uploads)

eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi

ESummary (document summary downloads)

eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi

EFetch (data record downloads)

eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi

ELink (Entrez links)

eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi

EGQuery (global query)

eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi

ESpell (spelling suggestions)

eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi

ECitMatch (batch citation searching in PubMed)

eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi
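
Since every tool shares the same base URL, a small wrapper can assemble the endpoint from the tool name. This is a sketch under assumptions: `eutils_url` and `EUTILS_BASE` are hypothetical names introduced here, and the only special case handled is the one visible in the list above, namely that ECitMatch ends in `.cgi` while the other tools end in `.fcgi`.

```shell
# Base URL shared by all E-utilities (taken from the notes above)
EUTILS_BASE='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# eutils_url (hypothetical helper): $1 = tool name in lowercase,
# $2 = already-encoded query string
eutils_url() {
  case $1 in
    ecitmatch) ext=cgi ;;   # the one endpoint listed with a .cgi suffix
    *)         ext=fcgi ;;
  esac
  printf '%s/%s.%s?%s\n' "$EUTILS_BASE" "$1" "$ext" "$2"
}

# Print the URL only; pass it to curl to run the request, e.g.
#   curl "$(eutils_url einfo 'db=sra&retmode=json')"
eutils_url einfo 'db=sra&retmode=json'
```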

Database identifiers

| Entrez Database | UID common name | E-utility Database Name |
|---|---|---|
| BioProject | BioProject ID | bioproject |
| BioSample | BioSample ID | biosample |
| Biosystems | BSID | biosystems |
| Books | Book ID | books |
| Conserved Domains | PSSM-ID | cdd |
| dbGaP | dbGaP ID | gap |
| dbVar | dbVar ID | dbvar |
| Epigenomics | Epigenomics ID | epigenomics |
| EST | GI number | nucest |
| Gene | Gene ID | gene |
| Genome | Genome ID | genome |
| GEO Datasets | GDS ID | gds |
| GEO Profiles | GEO ID | geoprofiles |
| GSS | GI number | nucgss |
| HomoloGene | HomoloGene ID | homologene |
| MeSH | MeSH ID | mesh |
| NCBI C++ Toolkit | Toolkit ID | toolkit |
| NCBI Web Site | Web Site ID | ncbisearch |
| NLM Catalog | NLM Catalog ID | nlmcatalog |
| Nucleotide | GI number | nuccore |
| OMIA | OMIA ID | omia |
| PopSet | PopSet ID | popset |
| Probe | Probe ID | probe |
| Protein | GI number | protein |
| Protein Clusters | Protein Cluster ID | proteinclusters |
| PubChem BioAssay | AID | pcassay |
| PubChem Compound | CID | pccompound |
| PubChem Substance | SID | pcsubstance |
| PubMed | PMID | pubmed |
| PubMed Central | PMCID | pmc |
| SNP | rs number | snp |
| SRA | SRA ID | sra |
| Structure | MMDB-ID | structure |
| Taxonomy | TaxID | taxonomy |
| UniGene | UniGene Cluster ID | unigene |
| UniSTS | STS ID | unists |

Examples

  • Information about a database in XML (the default) or JSON format
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=sra'
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=sra&retmode=json'
  • Search for all database identifiers associated with an SRA project
# get all SRA experiments associated with a Project
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?retmax=10&db=sra&field=accn&term=SRP130961&retmode=json'
# get database identifier(s) for Biosamples (in the idlist field)
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?retmax=10&db=biosample&field=accn&term=SAMN04969787&retmode=json'
# GEO: look up a sample (GSM) and a series (GSE) by accession
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?retmax=10&db=gds&field=accession&term=GSM2934661&retmode=json&etype=etype'
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?retmax=1000&db=gds&term=GSE109171&field=accn&retmode=json'
  • Retrieve summary for an identifier
# SRA experiments
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?retmax=10&db=sra&id=4969778&retmode=json'
# biosample id retrieved above (query the biosample database, not sra)
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?retmax=10&db=biosample&id=4969787&retmode=json'
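# search GEO for series (GSE) published in the last 3 months and keep the result on the History server
# (quotes are encoded as %22; -g stops curl from interpreting the square brackets as a glob range)
curl -g 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gds&term=GSE[ETYP]+AND+%22published+last+3+months%22[Filter]&retmax=5000&usehistory=y'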
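
With `usehistory=y`, the search result is stored on the History server and the response carries a WebEnv string and a query key. A follow-up call can then refer to the stored result through the `query_key` and `WebEnv` parameters instead of an ID list. The sketch below only builds such a URL; `history_url` is a hypothetical helper name, and the WebEnv value in the example call is a placeholder, not a real session string.

```shell
# history_url (hypothetical helper): build a follow-up request that reads
# a stored result from the History server.
# $1 = tool (e.g. esummary), $2 = database, $3 = query key, $4 = WebEnv string
history_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/%s.fcgi?db=%s&query_key=%s&WebEnv=%s\n' \
    "$1" "$2" "$3" "$4"
}

# WEBENV_PLACEHOLDER stands in for the WebEnv value returned by esearch
history_url esummary gds 1 WEBENV_PLACEHOLDER
```

Note that WebEnv is the one parameter name that keeps its mixed case.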
  • Retrieve full record for an identifier (XML only)
# SRA experiment
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmax=10&db=sra&id=4969778' | \
  xmllint --format -
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmax=10&db=sra&id=4969787' | \
  xmllint --format -
# GEO summaries (via esummary) for the series GSE109171, platform GPL17021 and sample GSM2934661
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?retmax=20&db=gds&id=200109171,100017021,302934661' | \
  xmllint --format -
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?retmax=20&db=biosample&id=8362257' | \
  xmllint --format -
  • Links between databases (see this table for all available links)
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmax=1000&db=biosample&dbfrom=gds&id=302934661&linkname=gds_biosample'
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmax=1000&db=gds&dbfrom=biosample&id=200109171&linkname=biosample_gds'
curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?retmax=1000&db=biosample&dbfrom=sra&id=4969778&linkname=sra_biosample'
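
The three link requests above follow one pattern: a source database (`dbfrom`), a target database (`db`), a UID from the source database, and a link name of the form `dbfrom_db`. The sketch below generates such URLs. `elink_url` is a hypothetical helper name, and the `dbfrom_db` naming is an assumption based on the examples above; some database pairs have additional or differently named links.

```shell
# elink_url (hypothetical helper): build an ELink request between two databases.
# $1 = source database (dbfrom), $2 = target database (db),
# $3 = UID in the source database
elink_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=%s&db=%s&id=%s&linkname=%s_%s\n' \
    "$1" "$2" "$3" "$1" "$2"
}

# Print the URL for the GEO sample -> BioSample link used above
elink_url gds biosample 302934661
```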
