For a given KEGG pathway, we want to get a list of all the genes. Ensembl IDs are convenient here.
KEGG provides a REST API for some tasks, but it is far from complete. For example, it can map KEGG gene IDs to NCBI IDs, but not to Ensembl IDs.
The implementation performs the following steps:
- GET the pathway-to-gene mapping (e.g. http://rest.kegg.jp/link/genes/hsa04115):
    path:hsa04115 hsa:1017
    path:hsa04115 hsa:1019
    path:hsa04115 hsa:1021
    ...
- For each gene ID, GET the gene entry (e.g. http://rest.kegg.jp/get/hsa:983):
    ...
    DBLINKS     NCBI-ProteinID: NP_001777
                NCBI-GeneID: 983
                OMIM: 116940
                HGNC: 1722
                HPRD: 00302
                Ensembl: ENSG00000170312
                Vega: OTTHUMG00000018290
                UniProt: P06493 I6L9I5
    ...
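The first step can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name `parse_pathway_genes` is hypothetical, and it assumes the response body has already been downloaded as text with one whitespace-separated pathway/gene pair per line, as in the sample above.

```python
def parse_pathway_genes(link_text):
    """Return the KEGG gene IDs listed in a /link/genes/<pathway> response body.

    Hypothetical helper: assumes each data line holds two whitespace-separated
    columns, the pathway ID and the gene ID.
    """
    genes = []
    for line in link_text.splitlines():
        fields = line.split()
        # Skip blank or unexpected lines instead of failing on them.
        if len(fields) == 2 and fields[0].startswith("path:"):
            genes.append(fields[1])
    return genes


sample = (
    "path:hsa04115\thsa:1017\n"
    "path:hsa04115\thsa:1019\n"
    "path:hsa04115\thsa:1021\n"
)
print(parse_pathway_genes(sample))  # ['hsa:1017', 'hsa:1019', 'hsa:1021']
```

Each returned ID can then be appended to http://rest.kegg.jp/get/ to request the corresponding gene entry.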
For lack of a better API, the Ensembl IDs are extracted from the text of each gene entry with regular expressions.
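A rough sketch of such an extraction is shown below. The names `ENSEMBL_LINE`, `extract_ensembl_ids` and `fetch_gene_entry` are hypothetical, and the pattern assumes the `Ensembl:` label appears either on the `DBLINKS` line itself or on an indented continuation line, with one or more space-separated IDs after it. The actual expressions used by the implementation may differ.

```python
import re
import urllib.request

# Assumed layout: optional indentation, an optional "DBLINKS" keyword,
# then "Ensembl:" followed by one or more space-separated IDs.
ENSEMBL_LINE = re.compile(
    r"^[ \t]*(?:DBLINKS[ \t]+)?Ensembl:[ \t]*(.+)$", re.MULTILINE
)


def extract_ensembl_ids(entry_text):
    """Return all Ensembl IDs found in the text of a KEGG gene entry."""
    ids = []
    for match in ENSEMBL_LINE.finditer(entry_text):
        ids.extend(match.group(1).split())
    return ids


def fetch_gene_entry(gene_id):
    """Download the flat-file entry for a KEGG gene ID such as 'hsa:983'."""
    url = "http://rest.kegg.jp/get/" + gene_id
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


entry = """DBLINKS     NCBI-ProteinID: NP_001777
            NCBI-GeneID: 983
            Ensembl: ENSG00000170312
            UniProt: P06493 I6L9I5
"""
print(extract_ensembl_ids(entry))  # ['ENSG00000170312']
```

An entry without an `Ensembl:` line yields an empty list, so callers can decide whether to skip such genes or report them.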