Last active Apr 27, 2016
Get ENSEMBL IDs for a given KEGG pathway


For a given KEGG pathway, we want to get a list of all the genes. Ensembl IDs are convenient here.

KEGG provides a REST API for some tasks, but it is far from complete. For example, it can map KEGG IDs to NCBI IDs, but not to Ensembl IDs.

The implementation performs the following steps:

  1. GET the pathway mapping, which returns lines such as:

         path:hsa04115	hsa:1017
         path:hsa04115	hsa:1019
         path:hsa04115	hsa:1021

  2. For each gene ID, GET the gene entry, which contains a block such as:

         DBLINKS     NCBI-ProteinID: NP_001777
                     NCBI-GeneID: 983
                     OMIM: 116940
                     HGNC: 1722
                     HPRD: 00302
                     Ensembl: ENSG00000170312
                     Vega: OTTHUMG00000018290
                     UniProt: P06493 I6L9I5
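
The first step can be illustrated offline. The following is a minimal sketch, assuming the mapping arrives as tab-separated text in the shape shown above; `parse_gene_ids` and `SAMPLE_LINK_RESPONSE` are hypothetical names used only for this illustration, and the sample text is copied from the example lines rather than fetched from KEGG.

```python
# Sample response copied from the example lines above (tab-separated columns).
SAMPLE_LINK_RESPONSE = (
    "path:hsa04115\thsa:1017\n"
    "path:hsa04115\thsa:1019\n"
    "path:hsa04115\thsa:1021\n"
)


def parse_gene_ids(link_text):
    """Return the gene IDs (second column) of a pathway-to-gene mapping."""
    gene_ids = []
    for line in link_text.splitlines():
        columns = line.split("\t")
        # Skip blank or malformed lines rather than failing on them.
        if len(columns) == 2:
            gene_ids.append(columns[1].strip())
    return gene_ids


print(parse_gene_ids(SAMPLE_LINK_RESPONSE))  # ['hsa:1017', 'hsa:1019', 'hsa:1021']
```

Splitting on the tab is an alternative to the regular expression used in the implementation; both yield the same IDs for input of this shape.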

For lack of a better API, the data is extracted with regular expressions.
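
To make the regular-expression step concrete, here is a small sketch that runs against the `DBLINKS` example above. It is a variation on the pattern in the implementation, not the same code: it anchors the database name at the start of the field and splits space-separated IDs (as in the `UniProt` line) into separate items. The names `extract_external_ids` and `SAMPLE_GENE_ENTRY` are assumptions made for this example.

```python
import re

# DBLINKS block copied from the example gene entry above.
SAMPLE_GENE_ENTRY = """\
DBLINKS     NCBI-ProteinID: NP_001777
            NCBI-GeneID: 983
            OMIM: 116940
            HGNC: 1722
            HPRD: 00302
            Ensembl: ENSG00000170312
            Vega: OTTHUMG00000018290
            UniProt: P06493 I6L9I5
"""


def extract_external_ids(entry_text, external_db="Ensembl"):
    """Return every ID listed for external_db in a KEGG flat-file entry."""
    # Require the database name to start the field, so that a partial name
    # such as "GeneID" does not match the "NCBI-GeneID" line.
    pattern = r"^\s*(?:DBLINKS\s+)?" + re.escape(external_db) + r": (.*)$"
    ids = []
    for match in re.finditer(pattern, entry_text, flags=re.MULTILINE):
        # One line may carry several space-separated IDs.
        ids.extend(match.group(1).split())
    return tuple(ids)


print(extract_external_ids(SAMPLE_GENE_ENTRY))             # ('ENSG00000170312',)
print(extract_external_ids(SAMPLE_GENE_ENTRY, "UniProt"))  # ('P06493', 'I6L9I5')
```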

from requests import get
from tqdm import tqdm
import re


def external_ids_for_kegg_pathway(pathway, organism="hsa", external_db="Ensembl", verbose=True):
    """Map each gene of a KEGG pathway to its IDs in external_db."""
    # Base URL of the KEGG REST API; empty in this listing, so set it before use.
    header = ""
    id_mapping = {}

    # Get the list of genes
    gene_list_request = get(header + "/link/genes/" + organism + pathway)
    if gene_list_request.ok:
        gene_list = re.findall("(" + organism + ":.*)", gene_list_request.text)
        if verbose:
            print("Will retrieve", len(gene_list), "entries...")
        iterator = tqdm(gene_list) if verbose else gene_list
        for gene_id in iterator:
            # Get record for this gene_id
            gene_entry_request = get(header + "/get/" + gene_id)
            if gene_entry_request.ok:
                # Capture the rest of every "<external_db>: ..." line
                external_id = re.findall(external_db + ": (.*)", gene_entry_request.text)
                id_mapping[gene_id] = tuple(external_id)
    return id_mapping


def test_ensembl_p53():
    # Calls the live KEGG service for pathway hsa04115
    ensembl_ids = external_ids_for_kegg_pathway("04115")
    assert len(ensembl_ids.keys()) == 69
    assert ("ENSG00000141510",) in ensembl_ids.values()
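
The test above depends on the live KEGG service. As a hedged sketch of how the same two-step lookup could be exercised offline, the variant below takes a `fetch` callable instead of calling `requests.get` directly. `ids_for_pathway`, `fetch`, and `FAKE_RESPONSES` are hypothetical names introduced here, and the canned responses are assembled from the example lines in the text rather than recorded from the server.

```python
import re


def ids_for_pathway(pathway, fetch, organism="hsa", external_db="Ensembl"):
    """Two-step lookup where fetch(path) returns response text, or None on failure."""
    id_mapping = {}
    link_text = fetch("/link/genes/" + organism + pathway)
    if link_text is None:
        return id_mapping
    for gene_id in re.findall("(" + organism + ":.*)", link_text):
        entry_text = fetch("/get/" + gene_id)
        if entry_text is not None:
            found = re.findall(external_db + ": (.*)", entry_text)
            id_mapping[gene_id] = tuple(found)
    return id_mapping


# Canned responses standing in for the server (illustrative fixture only).
FAKE_RESPONSES = {
    "/link/genes/hsa04115": "path:hsa04115\thsa:983\npath:hsa04115\thsa:1017\n",
    "/get/hsa:983": "DBLINKS     NCBI-GeneID: 983\n            Ensembl: ENSG00000170312\n",
    # "/get/hsa:1017" is deliberately absent to simulate a failed request.
}

mapping = ids_for_pathway("04115", FAKE_RESPONSES.get)
print(mapping)  # {'hsa:983': ('ENSG00000170312',)}
```

Passing `dict.get` as the fetcher keeps the example free of network access; a real fetcher would wrap `requests.get` and return `None` when the response is not OK.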