

@inodb
Last active December 22, 2016 17:09
Gene lengths in Jupyter notebook

Gene lengths from the Ensembl REST API

Use inside a Jupyter notebook (relies on IPython's ! shell syntax and needs curl, jq and xargs installed):

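import json
import pandas as pd

def get_gene_length(gene):
    """Genomic span in bp of a gene symbol on GRCh37, via the Ensembl REST API."""
    # Symbol -> Ensembl ID (first hit of /xrefs/symbol), then ID -> /lookup/id record
    gene_info = \
        !curl -s "http://grch37.rest.ensembl.org/xrefs/symbol/homo_sapiens/"{gene} -H 'Content-type:application/json' | \
            jq '.[0].id' | tr -d '"' | \
            xargs -I GENEID curl -s "http://grch37.rest.ensembl.org/lookup/id/"GENEID -H 'Content-type:application/json'
    # `!` captures the output as a list of lines; join them before parsing
    d = json.loads("\n".join(gene_info))
    # Ensembl coordinates are 1-based and inclusive, and start <= end on either strand
    return d['end'] - d['start'] + 1

# df: an existing DataFrame with a GENE column (or index level)
gene_lengths = pd.Series({gene: get_gene_length(gene) for gene in df.reset_index().GENE.unique()})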
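Outside a notebook, or without curl/jq available, the same two-step lookup (symbol to Ensembl ID via /xrefs/symbol, then ID to coordinates via /lookup/id) can be sketched with only the standard library plus pandas. This is a hedged sketch, not the gist's method: the helper names ensembl_get, gene_span and gene_lengths_for are hypothetical, the https form of the GRCh37 server URL is assumed, and it assumes, as the jq filter does, that the first xref hit is the gene you want. Each unique gene is queried once, and genes that fail to resolve become NaN instead of aborting the whole Series.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Assumed: the https form of the GRCh37 server used in the gist
SERVER = "https://grch37.rest.ensembl.org"
HEADERS = {"Content-Type": "application/json"}


def ensembl_get(path):
    """GET an Ensembl REST endpoint and decode its JSON body (hypothetical helper)."""
    req = urllib.request.Request(SERVER + path, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def gene_span(record):
    """Span in bp of a /lookup/id record; coordinates are 1-based and inclusive."""
    return record["end"] - record["start"] + 1


def get_gene_length(symbol):
    """Symbol -> first xref ID -> lookup record -> span, mirroring the curl | jq pipeline."""
    hits = ensembl_get("/xrefs/symbol/homo_sapiens/" + urllib.parse.quote(symbol))
    if not hits:
        raise KeyError(f"no Ensembl xref for {symbol!r}")
    # Like jq '.[0].id': assume the first hit is the gene
    record = ensembl_get("/lookup/id/" + hits[0]["id"])
    return gene_span(record)


def gene_lengths_for(genes, length_fn=get_gene_length):
    """Query each unique gene once (order kept); failures become NaN."""
    lengths = {}
    for gene in dict.fromkeys(genes):
        try:
            lengths[gene] = length_fn(gene)
        except (KeyError, OSError, ValueError):
            # OSError covers urllib HTTP/network errors; ValueError covers bad JSON
            lengths[gene] = float("nan")
    return pd.Series(lengths, dtype="float64")
```

With the gist's DataFrame, usage would be gene_lengths = gene_lengths_for(df.reset_index().GENE). Passing length_fn makes it easy to swap in a cached or offline lookup.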