@mattions
Last active July 26, 2022 15:43

FHIR2Metadata bridge

Retrieves metadata from the INCLUDE FHIR server, using the FHIR document reference stored on each file, and writes it to the files in a Cavatica project.

Quick help:

 python fhir_trisomy_bridge.py --help
usage: fhir_trisomy_bridge.py [-h] --cavatica_token CAVATICA_TOKEN --cavatica_project CAVATICA_PROJECT
                              --include_fhir_authentication_cookie INCLUDE_FHIR_AUTHENTICATION_COOKIE

Retrieve metadata from the INCLUDE Server and updates the metadata on the Cavatica Project

optional arguments:
  -h, --help            show this help message and exit
  --cavatica_token CAVATICA_TOKEN
                        You can find your developer token at https://cavatica.sbgenomics.com/developer/token
  --cavatica_project CAVATICA_PROJECT
                        The Cavatica project where the files are already imported from the INCLUDE Portal
  --include_fhir_authentication_cookie INCLUDE_FHIR_AUTHENTICATION_COOKIE
                        The Authorization cookie from the INCLUDE FHIR Server (https://include-api-fhir-
                        service.includedcc.org/) To obtain the cookie, open the Chrome or Firefox console, go to the
                        Application tab and copy the value contained in `AWSELBAuthSessionCookie-0`.
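
The cookie value copied from the browser is sent back to the FHIR server as an ordinary Cookie header. A minimal sketch using only the standard library, assuming a hypothetical helper name `authenticated_request` and a placeholder cookie value; the request is only built here, nothing is sent over the network:

```python
from urllib.request import Request

# Cookie name taken from the help text above.
COOKIE_NAME = "AWSELBAuthSessionCookie-0"


def authenticated_request(url: str, cookie_value: str) -> Request:
    """Build (but do not send) a GET request carrying the session cookie."""
    request = Request(url)
    request.add_header("Cookie", f"{COOKIE_NAME}={cookie_value}")
    return request


# Placeholder value: paste the real cookie copied from the browser instead.
req = authenticated_request(
    "https://include-api-fhir-service.includedcc.org/Patient/4941",
    "PASTE_COOKIE_VALUE_HERE",
)
print(req.get_header("Cookie"))
```

The script itself passes the same name/value pair through the `cookies` argument of `requests.get`.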

General plan

Cavatica project to work from: https://cavatica.sbgenomics.com/u/mmattioni/include-htp-meta

The main idea is:

For every file in the project:

  • extract the fhir_document_reference metadata field

This is what the document reference looks like:

// 20220714112305
// https://include-api-fhir-service.includedcc.org/DocumentReference?identifier=HTP.0012022d-c855-4087-9dde-c522f0632024.kallisto.abundance.tsv.gz&_format=json

{
  "resourceType": "Bundle",
  "id": "17a64462-2c75-49ff-9045-018f5876fccd",
  "meta": {
    "lastUpdated": "2022-07-14T09:22:59.484+00:00"
  },
  "type": "searchset",
  "total": 1,
  "link": [
    {
      "relation": "self",
      "url": "https://include-api-fhir-service.includedcc.org/DocumentReference?_format=json&identifier=HTP.0012022d-c855-4087-9dde-c522f0632024.kallisto.abundance.tsv.gz"
    }
  ],
  "entry": [
    {
      "fullUrl": "https://include-api-fhir-service.includedcc.org/DocumentReference/317218",
      "resource": {
        "resourceType": "DocumentReference",
        "id": "317218",
        "meta": {
          "versionId": "2",
          "lastUpdated": "2022-06-27T20:13:06.978+00:00",
          "source": "#09XK9onJ1OBmDYxw",
          "profile": [
            "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
          ],
          "tag": [
            {
              "system": "https://include.org/htp/fhir/researchstudy",
              "code": "HTP"
            }
          ]
        },
        "identifier": [
          {
            "use": "official",
            "system": "https://include.org/htp/fhir/documentreference",
            "value": "HTP.0012022d-c855-4087-9dde-c522f0632024.kallisto.abundance.tsv.gz"
          }
        ],
        "status": "current",
        "docStatus": "final",
        "type": {
          "coding": [
            {
              "system": "https://includedcc.org/fhir/code-systems/data_types",
              "version": "v1",
              "code": "Gene-Expression-Quantifications",
              "display": "Gene Expression Quantifications"
            }
          ],
          "text": "Gene Expression"
        },
        "category": [
          {
            "coding": [
              {
                "system": "https://includedcc.org/fhir/code-systems/experimental_strategies",
                "version": "v1",
                "code": "RNA-Seq",
                "display": "RNA-Seq"
              }
            ],
            "text": "RNA-Seq"
          },
          {
            "coding": [
              {
                "system": "https://includedcc.org/fhir/code-systems/data_categories",
                "version": "v1",
                "code": "Transcriptomic",
                "display": "Transcriptomic"
              }
            ],
            "text": "Transcriptomic"
          }
        ],
        "subject": {
          "reference": "Patient/4941"
        },
        "securityLabel": [
          {
            "coding": [
              {
                "system": "https://includedcc.org/fhir/code-systems/data_access_types",
                "version": "v1",
                "code": "registered",
                "display": "Registered"
              }
            ]
          },
          {
            "text": "*"
          }
        ],
        "content": [
          {
            "attachment": {
              "extension": [
                {
                  "url": "https://nih-ncpi.github.io/ncpi-fhir-ig/StructureDefinition/file-size",
                  "valueDecimal": 2743960
                },
                {
                  "url": "https://nih-ncpi.github.io/ncpi-fhir-ig/StructureDefinition/hashes",
                  "valueCodeableConcept": {
                    "coding": [
                      {
                        "display": "md5"
                      }
                    ],
                    "text": "69b98f149df72d8de23a6a1333a43515"
                  }
                }
              ],
              "url": "drs://data.kidsfirstdrc.org/733f0956-dac4-4ad9-b787-4e186c6530b0",
              "title": "0012022d-c855-4087-9dde-c522f0632024.kallisto.abundance.tsv.gz"
            },
            "format": {
              "display": "tsv"
            }
          }
        ],
        "context": {
          "related": [
            {
              "reference": "Specimen/297344"
            }
          ]
        }
      },
      "search": {
        "mode": "match"
      }
    }
  ]
}

entry[0].resource.subject.reference gives the patient (here, Patient/4941).
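
A minimal sketch of that lookup, assuming the bundle has already been parsed from JSON into a Python dict. The function name `patient_reference` is hypothetical, and the `example` bundle is a trimmed-down copy of the response above:

```python
from typing import Optional


def patient_reference(bundle: dict) -> Optional[str]:
    """Return the subject reference of the first entry, e.g. 'Patient/4941'.

    Returns None when the search set is empty or the entry has no subject.
    """
    entries = bundle.get("entry", [])
    if not entries:
        return None
    return entries[0].get("resource", {}).get("subject", {}).get("reference")


# Trimmed-down version of the DocumentReference bundle shown above.
example = {
    "resourceType": "Bundle",
    "total": 1,
    "entry": [
        {
            "resource": {
                "resourceType": "DocumentReference",
                "subject": {"reference": "Patient/4941"},
            }
        }
    ],
}
print(patient_reference(example))
```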

Reverse chaining could be an option, but it does not work nicely, so the idea is:

Search for a Condition with code MONDO:0008608, verification-status confirmed, and the Patient ID as subject:

https://include-api-fhir-service.includedcc.org/Condition?code=MONDO:0008608&verification-status=confirmed&subject=Patient/4927
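
The same query can be assembled with the URL escaping handled automatically. This is a sketch using the standard library's `urlencode`; the function name `condition_query` and the constant names are hypothetical:

```python
from urllib.parse import urlencode

INCLUDE_FHIR = "https://include-api-fhir-service.includedcc.org/"
DOWN_SYNDROME_CODE = "MONDO:0008608"


def condition_query(patient_ref: str) -> str:
    """Build the Condition search URL for a reference such as 'Patient/4927'."""
    params = {
        "code": DOWN_SYNDROME_CODE,
        "verification-status": "confirmed",
        "subject": patient_ref,
        "_format": "json",
    }
    # urlencode escapes ':' as %3A and '/' as %2F.
    return f"{INCLUDE_FHIR}Condition?{urlencode(params)}"


print(condition_query("Patient/4927"))
```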

If nothing is found, the search set has a total of zero:

// 20220714114620
// https://include-api-fhir-service.includedcc.org/Condition?code=MONDO%3A0008608&subject=Patient%2F4927&verification-status=confirmed&_format=json

{
  "resourceType": "Bundle",
  "id": "2811c99d-d2f8-488c-9ac9-3d605180e0c9",
  "meta": {
    "lastUpdated": "2022-07-14T09:46:20.427+00:00"
  },
  "type": "searchset",
  "total": 0,
  "link": [
    {
      "relation": "self",
      "url": "https://include-api-fhir-service.includedcc.org/Condition?_format=json&code=MONDO%3A0008608&subject=Patient%2F4927&verification-status=confirmed"
    }
  ]
}

This means we can check the total and, if it is 0, set the sample_type to D21.

If instead the patient has trisomy 21, this is what we obtain:

https://include-api-fhir-service.includedcc.org/Condition?code=MONDO:0008608&verification-status=confirmed&subject=Patient/4929

// 20220714115105
// https://include-api-fhir-service.includedcc.org/Condition?code=MONDO%3A0008608&subject=Patient%2F4929&verification-status=confirmed&_format=json

{
  "resourceType": "Bundle",
  "id": "c5ad06d5-fc1e-4133-9270-38c1af212d28",
  "meta": {
    "lastUpdated": "2022-07-14T09:50:50.668+00:00"
  },
  "type": "searchset",
  "total": 1,
  "link": [
    {
      "relation": "self",
      "url": "https://include-api-fhir-service.includedcc.org/Condition?_format=json&code=MONDO%3A0008608&subject=Patient%2F4929&verification-status=confirmed"
    }
  ],
  "entry": [
    {
      "fullUrl": "https://include-api-fhir-service.includedcc.org/Condition/9089",
      "resource": {
        "resourceType": "Condition",
        "id": "9089",
        "meta": {
          "versionId": "1",
          "lastUpdated": "2022-03-11T01:36:57.396+00:00",
          "source": "#DFc1zpVt8Cb0oeZd",
          "tag": [
            {
              "system": "https://include.org/htp/fhir/researchstudy",
              "code": "HTP"
            }
          ]
        },
        "identifier": [
          {
            "use": "official",
            "system": "https://include.org/htp/fhir/condition",
            "value": "HTP0005.MONDO:0008608"
          }
        ],
        "verificationStatus": {
          "coding": [
            {
              "system": "http://terminology.hl7.org/CodeSystem/condition-ver-status",
              "version": "v1",
              "code": "confirmed",
              "display": "Confirmed"
            }
          ]
        },
        "category": [
          {
            "coding": [
              {
                "system": "http://terminology.hl7.org/CodeSystem/condition-category",
                "code": "encounter-diagnosis",
                "display": "Encounter Diagnosis"
              }
            ]
          }
        ],
        "code": {
          "coding": [
            {
              "system": "http://purl.obolibrary.org/obo/mondo.owl",
              "version": "v1",
              "code": "MONDO:0008608",
              "display": "Down Syndrome"
            },
            {
              "system": "https://nih-ncpi.github.io/ncpi-fhir-ig/data-dictionary/HTP/ds_condition",
              "code": "OfficialDSDiagnosis",
              "display": "Typically self/parent report."
            }
          ],
          "text": "Complete trisomy 21"
        },
        "subject": {
          "reference": "Patient/4929"
        }
      },
      "search": {
        "mode": "match"
      }
    }
  ]
}

If the total is instead 1, we can set the sample_type to T21.
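
The decision rule can be sketched as a small function, assuming the Condition search bundle has been parsed into a dict. The name `sample_type_from_bundle` is hypothetical. The plan does not say what to do with totals other than 0 or 1, so returning an empty string for them is an assumption:

```python
def sample_type_from_bundle(bundle: dict) -> str:
    """Map the total of a Condition search set to a sample_type label."""
    total = bundle["total"]
    if total == 0:
        return "D21"  # no confirmed trisomy 21 Condition for this patient
    if total == 1:
        return "T21"  # exactly one confirmed trisomy 21 Condition
    return ""  # assumption: leave the label empty for any other total


print(sample_type_from_bundle({"resourceType": "Bundle", "total": 0}))
print(sample_type_from_bundle({"resourceType": "Bundle", "total": 1}))
```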

import argparse
import datetime

import requests
import sevenbridges as sbg
from sevenbridges.http.error_handlers import rate_limit_sleeper, maintenance_sleeper


class FHIR2Metadata:
    """Copy metadata from the INCLUDE FHIR server onto the files of a Cavatica project."""

    def __init__(self, cavatica_token, cavatica_project_id, fhir_auth_cookie):
        self.cavatica_token = cavatica_token
        self.cavatica_project = cavatica_project_id
        self.fhir_auth_cookie = fhir_auth_cookie
        self.CAVATICA_API_URL = "https://cavatica-api.sbgenomics.com/v2"
        self.INCLUDE_FHIR = "https://include-api-fhir-service.includedcc.org/"

    def main(self):
        """Retrieve the metadata from the INCLUDE FHIR Server"""
        api = sbg.Api(url=self.CAVATICA_API_URL, token=self.cavatica_token, advance_access=True,
                      error_handlers=[rate_limit_sleeper, maintenance_sleeper])
        print("Retrieving data from Cavatica")
        files = api.files.query(project=self.cavatica_project).all()
        start = datetime.datetime.now()
        print(f"Starting this at: {start}")
        for fh in files:
            print(f"Working on file: {fh}")
            # Each imported file carries the URL of its FHIR DocumentReference search.
            document_reference_url = fh.metadata['fhir_document_reference']
            fh.metadata['sample_type'] = self.get_trisomy_state(document_reference_url)
            fh.metadata['case_id'] = self.get_case_id(document_reference_url)
            fh.save()
        stop = datetime.datetime.now()
        delta = stop - start
        print(f"Stop time: {stop}. Time taken: {delta}")
        print("Metadata imported")

    def get_case_id(self, document_reference_url):
        """Return the official identifier of the patient the document refers to."""
        req = requests.get(document_reference_url, cookies={"AWSELBAuthSessionCookie-0": self.fhir_auth_cookie})
        req_j = req.json()
        # The reference has the form "Patient/<number>".
        patient_number = req_j['entry'][0]['resource']['subject']['reference']
        query = f"{self.INCLUDE_FHIR}{patient_number}"
        req = requests.get(query, cookies={"AWSELBAuthSessionCookie-0": self.fhir_auth_cookie})
        req_j = req.json()
        case_id = ""
        for identifier in req_j['identifier']:
            if identifier['use'] == "official":
                case_id = identifier['value']
                break
        return case_id

    def get_trisomy_state(self, document_reference_url):
        """Return "T21" if the patient has a confirmed trisomy 21 Condition, "D21" if none is found."""
        req = requests.get(document_reference_url, cookies={"AWSELBAuthSessionCookie-0": self.fhir_auth_cookie})
        req_j = req.json()
        patient_number = req_j['entry'][0]['resource']['subject']['reference']
        # Check if trisomy is present and confirmed
        # Condition --> `MONDO:0008608`
        # verification-status=confirmed
        # Patient/<number>
        #
        # all url escaped looks like:
        # Condition?code=MONDO%3A0008608&subject=Patient%2F4927&verification-status=confirmed&_format=json
        query = f"{self.INCLUDE_FHIR}Condition?code=MONDO:0008608&verification-status=confirmed&subject={patient_number}&_format=json"
        print(query)
        req = requests.get(query, cookies={"AWSELBAuthSessionCookie-0": self.fhir_auth_cookie})
        req_j = req.json()
        total = req_j['total']
        # Any total other than 0 or 1 leaves the state empty.
        trisomy_state = ""
        if total == 0:
            trisomy_state = "D21"
        elif total == 1:
            trisomy_state = "T21"
        return trisomy_state


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Retrieve metadata from the INCLUDE Server and updates the metadata on the Cavatica Project')
    parser.add_argument("--cavatica_token", required=True, help="You can find your developer token at https://cavatica.sbgenomics.com/developer/token")
    parser.add_argument("--cavatica_project", required=True, help="The Cavatica project where the files are already imported from the INCLUDE Portal")
    parser.add_argument("--include_fhir_authentication_cookie", required=True,
                        help="The Authorization cookie from the INCLUDE FHIR Server (https://include-api-fhir-service.includedcc.org/) "
                             "To obtain the cookie, open the Chrome or Firefox console, go to the Application tab and copy the value "
                             "contained in `AWSELBAuthSessionCookie-0`.")
    args = parser.parse_args()
    fhir2meta = FHIR2Metadata(args.cavatica_token, args.cavatica_project, args.include_fhir_authentication_cookie)
    fhir2meta.main()