
@Graeme-Smith
Last active January 25, 2024 23:39
Tutorial on Panel App API

Panel App API

Panel App has a RESTful API which allows programmatic access to its data. The best place to start exploring the API is the documentation at https://panelapp.genomicsengland.co.uk/api/docs/, which uses Swagger. Swagger is an open-source software framework that helps developers design, build, and document RESTful web services.

Swagger Documentation

The documentation is split into two parts:

  1. The first section describes the API endpoints for the different queries.
  2. The second section describes the Models returned for each endpoint, i.e. how the returned JSON is structured.

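The cool thing about Swagger documentation is that it is interactive. Select the endpoint you want to investigate, click the "Try it out" button, enter the query you want to test, and then click "Execute". Swagger then shows the equivalent curl command, the request URL, and the server response, including the response code, response body and response headers.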

Endpoints

Only GET endpoints are available to the user via the Panel App API: you can read data but not amend it. The basic structure of all queries is:

  1. GET /items: This endpoint is used to fetch a list of all 'items'. The 'GET' method is used to read information.

  2. GET /items/{id}: This endpoint is used to fetch a specific 'item' based on its unique ID.

Here 'items' would be something like 'panels', 'genes' or 'regions'.
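As a rough illustration of the list/detail pattern, the sketch below builds both kinds of URL. It assumes the v1 base URL used throughout this tutorial and that detail URLs follow the `/{items}/{id}/` shape with a trailing slash; `list_url` and `detail_url` are hypothetical helper names, not part of any PanelApp client library, and the ID 245 is arbitrary.

```python
from urllib.parse import quote

# Assumed base URL for version 1 of the API (taken from the examples in this tutorial).
BASE_URL = "https://panelapp.genomicsengland.co.uk/api/v1"


def list_url(items: str) -> str:
    """URL for GET /items: a (possibly paginated) list of every item of this type."""
    return f"{BASE_URL}/{items}/"


def detail_url(items: str, item_id) -> str:
    """URL for GET /items/{id}: a single item identified by its unique ID.

    quote() percent-encodes the ID so unusual characters cannot break the path.
    """
    return f"{BASE_URL}/{items}/{quote(str(item_id), safe='')}/"


print(list_url("panels"))
print(detail_url("panels", 245))
```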

Let's look at an example endpoint which returns data on the panels in Panel App:

https://panelapp.genomicsengland.co.uk/api/v1/panels/?page=1

RESTful APIs have human-readable endpoints which are built in a logical way:

  • https://panelapp.genomicsengland.co.uk/api/ - the base URL to access the API.

  • /v1/ - the API version. It is best practice to version APIs; this ensures future changes to the API do not break legacy code.

  • /panels/ - the endpoint for the data you would like to access.

  • ?page=1 - a query parameter selecting the page of the returned data. Pagination in APIs is a technique for handling large sets of data by dividing the data into smaller, manageable chunks, or "pages".
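To see how these pieces fit together, Python's standard `urllib.parse` module can split the example URL back into its components. This only inspects the string; no request is sent.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://panelapp.genomicsengland.co.uk/api/v1/panels/?page=1"
parts = urlsplit(url)

print(parts.scheme)           # https
print(parts.netloc)           # panelapp.genomicsengland.co.uk
print(parts.path)             # /api/v1/panels/
print(parse_qs(parts.query))  # {'page': ['1']}

# Splitting the path shows the base ("api"), the version ("v1") and the endpoint ("panels").
base, version, endpoint = [segment for segment in parts.path.split("/") if segment]
print(base, version, endpoint)  # api v1 panels
```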

Common Pitfall

To return all data from a paginated API, you'll typically have to write a loop that makes a request for each page and combines the results. If you do not do this then you may only be accessing the first "chunk" of data, rather than the complete data set.
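A minimal sketch of such a loop in Python follows. It assumes, as the examples in this tutorial do, that each page contains a `results` list and a `next` URL that is null on the last page. Fetching is passed in as a function (`fetch_page`, a hypothetical name) so the paging logic can be read and tested without network access.

```python
from typing import Any, Callable, Dict, List, Optional

# A page is the decoded JSON of one response. Assumed keys: "results" (the items on
# this page) and "next" (URL of the following page, or None on the last page).
Page = Dict[str, Any]


def collect_all_results(fetch_page: Callable[[str], Page], first_url: str) -> List[Any]:
    """Follow "next" links from first_url and return the combined "results" of every page."""
    results: List[Any] = []
    url: Optional[str] = first_url
    seen = set()  # defensive guard in case a "next" link ever points back to a visited page
    while url is not None:
        if url in seen:
            raise RuntimeError(f"Pagination loop detected at {url}")
        seen.add(url)
        page = fetch_page(url)
        results.extend(page["results"])
        url = page.get("next")
    return results
```

With the requests library, `fetch_page` could be something like `lambda url: requests.get(url, timeout=30).json()`, ideally with a `raise_for_status()` check added. Comparing `len(results)` with the first page's `count` is a cheap way to confirm nothing was missed.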

Models

Models describe the nested structure of the JSON, the keys used to identify values, and the data type of each value. This information is useful when you need to serialize or parse (deserialize) the JSON.
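One way to put a model to work is to map the JSON onto a typed Python object. The sketch below models only the gene fields used elsewhere in this tutorial (`entity_name`, `confidence_level`, `panel.name`, `gene_data.gene_symbol`); `GeneEntry` and `parse_gene_page` are hypothetical names, and the Models section of the Swagger docs remains the authority on the full schema and exact types.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class GeneEntry:
    """A small, hypothetical subset of a gene record from the /genes/ endpoint."""
    entity_name: str
    confidence_level: str
    panel_name: str
    gene_symbol: str

    @classmethod
    def from_json(cls, record: Dict[str, Any]) -> "GeneEntry":
        # Nested objects ("panel", "gene_data") are flattened into plain attributes.
        return cls(
            entity_name=record["entity_name"],
            confidence_level=str(record["confidence_level"]),
            panel_name=record["panel"]["name"],
            gene_symbol=record["gene_data"]["gene_symbol"],
        )


def parse_gene_page(body: str) -> List[GeneEntry]:
    """Deserialize one page of JSON text into GeneEntry objects."""
    return [GeneEntry.from_json(record) for record in json.loads(body)["results"]]
```

A missing key raises a `KeyError` immediately, which is usually preferable to silently carrying `None` values into later analysis.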

Headers

The response headers carry various metadata about the request. The HTTP status code, returned with every HTTP response, gives servers a standardized way to tell clients whether their request succeeded. The codes to expect when using this API are:

  • 200 Request successful
  • 400 Bad Request
  • 404 Not Found
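A small sketch of turning these codes into readable errors is shown below. `check_status`, `PanelAppRequestError` and the hint texts are hypothetical, written for this tutorial rather than returned by the API; with requests you would call it as `check_status(r.status_code, r.url)`, or simply rely on `r.raise_for_status()`.

```python
from http import HTTPStatus

# Hypothetical hints for the status codes listed above.
HINTS = {
    400: "the request was malformed, e.g. an invalid query parameter",
    404: "the endpoint or item ID does not exist",
}


class PanelAppRequestError(Exception):
    """Hypothetical exception type for a non-successful PanelApp response."""


def check_status(status_code: int, url: str) -> None:
    """Return quietly for 2xx codes; otherwise raise with a readable message."""
    if 200 <= status_code < 300:
        return
    try:
        phrase = HTTPStatus(status_code).phrase  # e.g. "Not Found"
    except ValueError:  # code not known to the standard library
        phrase = "Unknown status"
    hint = HINTS.get(status_code, "unexpected response")
    raise PanelAppRequestError(f"{status_code} {phrase} for {url}: {hint}")
```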

Parsing JSON output

All data is returned as a JSON object. It is likely you will need to parse this JSON object to obtain the data you want.

bash

If you are parsing JSON on the command line, make sure you have jq installed. On Debian or Ubuntu you can install it with:

sudo apt-get update
sudo apt-get install jq

jq is a command-line utility for filtering and transforming JSON; the manual can be found at https://stedolan.github.io/jq/manual/

We can use curl to access the endpoint:

curl -X GET "https://panelapp.genomicsengland.co.uk/api/v1/genes/"

We will use jq to filter the JSON object returned from the gene endpoint. The filter "." passes all of the data through unchanged, but jq pretty-prints and colours it to make the output more readable:

curl -X GET "https://panelapp.genomicsengland.co.uk/api/v1/genes/" | jq "."
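If you are working in Python instead, the standard-library json module gives similar indented output (without colour). `pretty` is a hypothetical helper name, not part of any library.

```python
import json


def pretty(body: str) -> str:
    """Re-indent JSON text for reading, roughly what `jq "."` does (minus the colour)."""
    return json.dumps(json.loads(body), indent=2, ensure_ascii=False)


print(pretty('{"count": 2, "results": []}'))
```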

In the following examples, we'll use jq to filter and transform data from the API endpoint. Note that many of these queries only fetch the first page of the data:

  1. Extract gene names and confidence levels for each gene:
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.results[] | .entity_name + ": " + .confidence_level'
  2. Extract gene names and associated panel names:
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.results[] | .entity_name + ": " + .panel.name'
  3. Filter genes with a specific confidence level (in this example "3", the value PanelApp uses for green, high-evidence genes):
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.results[] | select(.confidence_level == "3") | .entity_name'
  4. Extract the total count of genes (this is the total across all pages, not just the first page):
curl -s 'https://panelapp.genomicsengland.co.uk/api/v1/genes/' | jq -r '.count'
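For comparison, the four jq filters above can be written as plain Python over one decoded page (for example `response.json()`). This sketch assumes the field names used in the jq examples (`entity_name`, `confidence_level`, `panel.name`, `count`); the function names are hypothetical.

```python
from typing import Any, Dict, List

# Decoded JSON of one page from /api/v1/genes/.
Page = Dict[str, Any]


def names_and_confidence(page: Page) -> List[str]:
    # jq: .results[] | .entity_name + ": " + .confidence_level
    return [f"{g['entity_name']}: {g['confidence_level']}" for g in page["results"]]


def names_and_panels(page: Page) -> List[str]:
    # jq: .results[] | .entity_name + ": " + .panel.name
    return [f"{g['entity_name']}: {g['panel']['name']}" for g in page["results"]]


def names_with_confidence(page: Page, level: str) -> List[str]:
    # jq: .results[] | select(.confidence_level == LEVEL) | .entity_name
    return [g["entity_name"] for g in page["results"] if g["confidence_level"] == level]


def total_count(page: Page) -> int:
    # jq: .count (the total across all pages, not just this one)
    return page["count"]
```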

jq is a very powerful tool and can perform complex transformations and queries on JSON data.

To download paginated data you will need to use a while loop to iterate through the pages. This can be easier in Python, where you can parse the data into a dataframe or database and add error handling code. One way to do it in bash is the approach below:

url="https://panelapp.genomicsengland.co.uk/api/v1/genes/?page=1"
# jq -r prints the null "next" field of the last page as the string "null", which ends the loop
while [[ "$url" != null ]]; do
	response=$(curl -s "$url")
	echo "$response" | jq -r '.results[].entity_name'
	url=$(echo "$response" | jq -r '.next')
done
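The same loop in Python, with the kind of error handling mentioned above, might look like the sketch below. `iter_entity_names` is a hypothetical name, and the code assumes the `results`/`next` page layout used by the bash loop. The `session` argument defaults to a `requests.Session` (the requests library must be installed), but any object with a compatible `get()` works, which keeps the function easy to test.

```python
def iter_entity_names(start_url, session=None, timeout=30.0):
    """Yield .results[].entity_name from every page, following "next" like the bash loop."""
    if session is None:
        import requests  # third-party; only needed when no session is supplied

        session = requests.Session()
    url = start_url
    while url is not None:
        response = session.get(url, timeout=timeout)
        # Stop with an HTTPError on 4xx/5xx instead of trying to parse an error page.
        response.raise_for_status()
        page = response.json()
        for entry in page["results"]:
            yield entry["entity_name"]
        url = page.get("next")  # JSON null on the last page becomes None and ends the loop
```

Usage: `for name in iter_entity_names("https://panelapp.genomicsengland.co.uk/api/v1/genes/?page=1"): print(name)`. Because it is a generator, names are printed as each page arrives rather than after the whole download finishes.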

Python

Below are two simple functions that return all the data from two different endpoints: one that uses pagination, and one that does not.

import warnings

import requests
import pandas as pd

# Example function using a loop to load all pages for paginated endpoints

def get_panel_app_list():
    """
    Queries the Panel App API to return details on all signed off Panels

    :return: Pandas dataframe, columns: id, hash_id, name, disease_group, disease_sub_group, status, version, version_created, relevant_disorders, types, stats.number_of_genes, stats.number_of_strs, stats.number_of_regions
    :rtype: pandas dataframe
    """
    server = "https://panelapp.genomicsengland.co.uk"
    ext = "/api/v1/panels/signedoff/"
    headers = {"Accept": "application/json"}

    r = requests.get(server + ext, headers=headers, timeout=30)
    # raise_for_status() raises an informative HTTPError if a bad request (4xx/5xx) was returned
    r.raise_for_status()
    page = r.json()
    expected_panels = page["count"]

    # One dataframe per page; nested keys such as "stats" become "stats.number_of_genes" etc.
    frames = [pd.json_normalize(page, record_path=["results"])]

    # Iterate over the remaining pages of data
    while page["next"] is not None:
        r = requests.get(page["next"], headers=headers, timeout=30)
        r.raise_for_status()
        page = r.json()
        frames.append(pd.json_normalize(page, record_path=["results"]))

    # DataFrame.append() was removed in pandas 2.0, so concatenate all pages once instead
    GEL_panel_app_df = pd.concat(frames, ignore_index=True)

    # Sanity check that every page was retrieved
    if len(GEL_panel_app_df) != expected_panels:
        warnings.warn(f"Expected {expected_panels} panels but retrieved {len(GEL_panel_app_df)}")

    return GEL_panel_app_df

# Example function not using pagination

def get_panel_app_genes(panel_id, panel_version, genome_build):
    # NOTE: genome_build is not used in this request; it is kept so existing calls still work
    server = "https://panelapp.genomicsengland.co.uk"
    ext = f"/api/v1/panels/{panel_id}/genes/?version={panel_version}"
    print(f"{server}{ext}")  # show the URL being queried
    r = requests.get(server + ext, headers={"Accept": "application/json"}, timeout=30)
    # raise_for_status() raises an informative HTTPError if a bad request was returned
    r.raise_for_status()
    decoded = r.json()
    gene_list = []
    for entry in decoded.get("results", []):
        gene_list.append(entry.get("gene_data").get("gene_symbol"))

    return gene_list