joshgel/jq_fhir.md

## jq_fhir.md

      
Working with FHIR from the command line using jq

I've recently started working with FHIR data. FHIR (Fast Healthcare Interoperability Resources) is an HL7 standard for exchanging electronic health data, and JSON is one of its standard serializations.
jq is "a lightweight and flexible command-line JSON processor", and it is the best tool I've found for working with JSON data quickly. There are plenty of converters that turn FHIR JSON into other formats, but for answering quick questions about the data, I haven't found a better way to explore and understand FHIR.
Unfortunately, after some googling, I haven't found much that describes how to use jq with FHIR data. So this document will help me keep track of my findings and let others use what I have learned.
I will use this fake patient data so that you can replicate my results exactly: https://github.com/sync-for-science/discovery-FHIR-data/blob/master/DSTU3/data/1396-Ledner.json
This gist won't make you an expert in either jq or FHIR, but hopefully it provides some tools to help you answer questions about your FHIR data.
Update: I also recently found gron: https://github.com/tomnomnom/gron, which seems similarly powerful.
Update2: Great discussion on Hacker News about command line JSON tools, not specific to FHIR: https://news.ycombinator.com/item?id=25498364

The simplest use of jq is just to view the data:
cat 1396-Ledner.json | jq '.'
This pretty-prints the whole Bundle, which makes it much easier to scan.
If we want to count the resources provided in this file, we can do:
cat 1396-Ledner.json | jq '.entry | length'
I get 317.
Let's look at a single resource:
cat 1396-Ledner.json | jq '.entry[217]'
This is an ambulatory encounter for an initial prenatal visit.
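Picking an index like 217 is easier with a quick table of contents for the Bundle. The following is a minimal sketch rather than a definitive recipe: it uses a tiny inline Bundle with made-up ids (so it runs without the real file) and jq's `to_entries`, which on an array pairs each element with its position.

```shell
# A tiny made-up Bundle; the ids are invented for illustration.
bundle='{"entry":[
  {"resource":{"resourceType":"Patient","id":"p1"}},
  {"resource":{"resourceType":"Encounter","id":"e1"}},
  {"resource":{"resourceType":"Observation","id":"o1"}}
]}'

# to_entries turns the array into {key: index, value: element} pairs,
# so each line shows the index you would pass to .entry[N].
index=$(printf '%s\n' "$bundle" | jq -r '
  .entry | to_entries[] | "\(.key)\t\(.value.resource.resourceType)\t\(.value.resource.id)"')

printf '%s\n' "$index"
```

Against the real file, the same filter can be piped through grep to find, say, every Encounter and its index.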
If you want a list of all the resources in this document:
cat 1396-Ledner.json | jq '.entry[].resource.resourceType'
But this prints each of the 317 resource types on its own line, which isn't very helpful.
So, let's group and count:
cat 1396-Ledner.json | jq '.entry[].resource.resourceType' | sort | uniq -c | sort -nr
This gives me a list that looks like this:
 99 "Procedure"
 59 "Observation"
 48 "Claim"
 38 "ExplanationOfBenefit"
 38 "Encounter"
 10 "MedicationRequest"
 10 "Immunization"
  6 "Condition"
  2 "Practitioner"
  2 "Organization"
  2 "DiagnosticReport"
  2 "CarePlan"
  1 "Patient"
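The same tally can also be produced inside jq itself, which drops the JSON quotes and keeps everything in one tool. This is only a sketch under a few assumptions: the Bundle below is a small made-up stand-in for the real file, and the `type` and `n` field names are my own choices.

```shell
# A tiny made-up Bundle so the example runs without downloading anything.
bundle='{"entry":[
  {"resource":{"resourceType":"Observation"}},
  {"resource":{"resourceType":"Patient"}},
  {"resource":{"resourceType":"Observation"}},
  {"resource":{"resourceType":"Encounter"}},
  {"resource":{"resourceType":"Observation"}},
  {"resource":{"resourceType":"Encounter"}}
]}'

# Collect the types into an array, group identical values, then emit
# "count<TAB>type" lines sorted by descending count.
counts=$(printf '%s\n' "$bundle" | jq -r '
  [.entry[].resource.resourceType]
  | group_by(.)
  | map({type: .[0], n: length})
  | sort_by(-.n)
  | .[]
  | "\(.n)\t\(.type)"')

printf '%s\n' "$counts"
```

The `sort | uniq -c` pipeline is shorter to type; the jq version is handy when the counts need to feed into further jq processing.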

A quick side note: while I am looking at a single-patient FHIR Bundle here, this generalizes well to multiple patient Bundles. jq processes each JSON document in the concatenated input in turn, so from the directory that contains 1396-Ledner.json we can count all the resources in every file:
cat *.json | jq '.entry[].resource.resourceType' | sort | uniq -c | sort -nr
  4124 "Observation"
  1191 "Claim"
   998 "ExplanationOfBenefit"
   998 "Encounter"
   643 "Procedure"
   373 "Immunization"
   270 "DiagnosticReport"
   226 "Condition"
   201 "MedicationRequest"
    73 "CarePlan"
    54 "Practitioner"
    52 "Organization"
    29 "Patient"
    25 "Goal"
    20 "AllergyIntolerance"
     7 "MedicationDispense"
     6 "ImagingStudy"
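When working across a directory, it can also help to see how many resources each file contributes. A minimal sketch, assuming jq 1.5 or later for `input_filename`; the two files and their names are made up and written to a temporary directory so the example is self-contained.

```shell
# Two made-up Bundles in a temporary directory (hypothetical file names).
dir=$(mktemp -d)
printf '%s\n' '{"entry":[{"resource":{"resourceType":"Patient"}},{"resource":{"resourceType":"Observation"}}]}' > "$dir/a.json"
printf '%s\n' '{"entry":[{"resource":{"resourceType":"Patient"}}]}' > "$dir/b.json"

# Passing the files to jq directly (instead of via cat) lets
# input_filename report which file the current Bundle came from.
per_file=$(cd "$dir" && jq -r '"\(input_filename)\t\(.entry | length)"' *.json)

printf '%s\n' "$per_file"
rm -r "$dir"
```

Note that this only works when jq opens the files itself; with `cat *.json | jq`, the file names are lost before jq sees the data.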

Ok, lots of interesting and potentially useful information in there. Let's start to pull some of it out. Let's say that I'm interested in digging into the Observations in 1396-Ledner.json. What observations are available?
cat 1396-Ledner.json | jq '.entry[] | select(.resource.resourceType == "Observation") | .resource.code.text' | sort | uniq -c | sort -nr
Here we find an interesting list.
   6 "Tobacco smoking status NHIS"
   6 "Pain severity - 0-10 verbal numeric rating [Score] - Reported"
   6 "Body Weight"
   6 "Body Mass Index"
   6 "Body Height"
   6 "Blood Pressure"
   2 "Platelets [#/volume] in Blood by Automated count"
   2 "Platelet mean volume [Entitic volume] in Blood by Automated count"
   2 "Platelet distribution width [Entitic volume] in Blood by Automated count"
   2 "MCV [Entitic volume] by Automated count"
   2 "MCHC [Mass/volume] by Automated count"
   2 "MCH [Entitic mass] by Automated count"
   2 "Leukocytes [#/volume] in Blood by Automated count"
   2 "Hemoglobin [Mass/volume] in Blood"
   2 "Hematocrit [Volume Fraction] of Blood by Automated count"
   2 "Erythrocytes [#/volume] in Blood by Automated count"
   2 "Erythrocyte distribution width [Entitic volume] by Automated count"
   1 "Oral temperature"

We could also check the values and datetimes associated with each of these resources.
cat 1396-Ledner.json | jq '.entry[] | select(.resource.resourceType == "Observation") | {observation: .resource.code.text, value: .resource.valueQuantity.value, unit: .resource.valueQuantity.unit, date: .resource.effectiveDateTime}'
 {
   "observation": "Body Height",
   "value": 150.38799438456365,
   "unit": "cm",
   "date": "2009-04-07T21:38:49-07:00"
 }
 {
   "observation": "Pain severity - 0-10 verbal numeric rating [Score] - Reported",
   "value": 1.1614048676319295,
   "unit": "{score}",
   "date": "2009-04-07T21:38:49-07:00"
 }
 {
   "observation": "Body Weight",
   "value": 51.36044583837668,
   "unit": "kg",
   "date": "2009-04-07T21:38:49-07:00"
 }
 {
   "observation": "Body Mass Index",
   "value": 22.70923215015444,
   "unit": "kg/m2",
   "date": "2009-04-07T21:38:49-07:00"
 }
 {
   "observation": "Blood Pressure",
   "value": null,
   "unit": null,
   "date": "2009-04-07T21:38:49-07:00"
 }

 ... [this list continues at length]
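As an aside, the same four fields can be emitted as CSV rows for a spreadsheet or a stats package. This is a sketch under stated assumptions: the inline Bundle is made up, with whole-number values chosen for readability, and jq's `@csv` formatter does the quoting.

```shell
# A small made-up Bundle with two Observations and one non-Observation.
bundle='{"entry":[
  {"resource":{"resourceType":"Observation","code":{"text":"Body Height"},
    "valueQuantity":{"value":150,"unit":"cm"},"effectiveDateTime":"2009-04-07T21:38:49-07:00"}},
  {"resource":{"resourceType":"Patient"}},
  {"resource":{"resourceType":"Observation","code":{"text":"Body Weight"},
    "valueQuantity":{"value":51,"unit":"kg"},"effectiveDateTime":"2009-04-07T21:38:49-07:00"}}
]}'

# Build one array per Observation and let @csv quote the strings.
csv=$(printf '%s\n' "$bundle" | jq -r '
  .entry[].resource
  | select(.resourceType == "Observation")
  | [.code.text, .valueQuantity.value, .valueQuantity.unit, .effectiveDateTime]
  | @csv')

printf '%s\n' "$csv"
```

Swapping `@csv` for `@tsv` gives tab-separated output instead. Missing fields come out as empty cells, which is where the null values discussed next would show up.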

The object-building query above works well until it reaches "Blood Pressure", where value and unit come back as null. Blood pressure observations have no top-level valueQuantity; their values are stored in a component array instead, which looks like this:
"component" : [
          {
            "code" : {
              "coding" : [
                {
                  "system" : "http://loinc.org",
                  "code" : "8462-4",
                  "display" : "Diastolic Blood Pressure"
                }
              ],
              "text" : "Diastolic Blood Pressure"
            },
            "valueQuantity" : {
              "value" : 74.05962699830609,
              "unit" : "mmHg",
              "system" : "http://unitsofmeasure.org",
              "code" : "mmHg"
            }
          },
          {
            "code" : {
              "coding" : [
                {
                  "system" : "http://loinc.org",
                  "code" : "8480-6",
                  "display" : "Systolic Blood Pressure"
                }
              ],
              "text" : "Systolic Blood Pressure"
            },
            "valueQuantity" : {
              "value" : 108.27160154939239,
              "unit" : "mmHg",
              "system" : "http://unitsofmeasure.org",
              "code" : "mmHg"
            }
          }
        ]

I would love a better way to deal with these null values. Ideally, we'd handle blood pressure conditionally so that, instead of null, the value is filled in from the Systolic Blood Pressure and Diastolic Blood Pressure components. There might be a way, but it could get pretty complex. Maybe I'll return to this, but for now I just want to be able to extract all the blood pressures:
cat 1396-Ledner.json | jq '.entry[] | select(.resource.resourceType == "Observation" and .resource.code.text == "Blood Pressure") | {observation: .resource.code.text, value: ((.resource.component[1].valueQuantity.value|round|tostring) + "/" + (.resource.component[0].valueQuantity.value|round|tostring)), unit: .resource.component[0].valueQuantity.unit, date: .resource.effectiveDateTime}'
Already decently complex. Note that I used the functions round and tostring, which first round off the unnecessary decimals in the blood pressure and then convert the number to a string so that the pieces can be concatenated. Also note the extra parentheses: the pipe has lower precedence than +, so without them round and tostring are not applied to each value separately, and I got errors. Here is my result:
{
  "observation": "Blood Pressure",
  "value": "108/74",
  "unit": "mmHg",
  "date": "2009-04-07T21:38:49-07:00"
}
{
  "observation": "Blood Pressure",
  "value": "104/88",
  "unit": "mmHg",
  "date": "2010-04-13T21:38:49-07:00"
}
{
  "observation": "Blood Pressure",
  "value": "132/79",
  "unit": "mmHg",
  "date": "2011-04-19T21:38:49-07:00"
}

 ... [list continues]
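Returning to the idea of handling component-based observations conditionally, one possible approach is an `if .component then ... else ... end` branch that prints one line per component. This is a sketch, not a complete solution: the inline Bundle is made up, and it assumes every Observation without components has a valueQuantity, so observations that carry other value types would need another branch.

```shell
# A made-up Bundle: one simple Observation and one with components.
bundle='{"entry":[
  {"resource":{"resourceType":"Observation","code":{"text":"Body Weight"},
    "valueQuantity":{"value":51.36,"unit":"kg"}}},
  {"resource":{"resourceType":"Observation","code":{"text":"Blood Pressure"},
    "component":[
      {"code":{"text":"Diastolic Blood Pressure"},"valueQuantity":{"value":74.06,"unit":"mmHg"}},
      {"code":{"text":"Systolic Blood Pressure"},"valueQuantity":{"value":108.27,"unit":"mmHg"}}]}}
]}'

summary=$(printf '%s\n' "$bundle" | jq -r '
  .entry[].resource
  | select(.resourceType == "Observation")
  | if .component then
      # One line per component, so nothing comes back as null.
      .component[] | "\(.code.text): \(.valueQuantity.value | round) \(.valueQuantity.unit)"
    else
      "\(.code.text): \(.valueQuantity.value | round) \(.valueQuantity.unit)"
    end')

printf '%s\n' "$summary"
```

Because each component is labeled by its own code text, this does not depend on the order of the components.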

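One caveat: the blood pressure query assumes the diastolic value is component[0] and the systolic value is component[1]. I'm not sure that order is guaranteed, so a more robust approach would be to extract each component individually, for example by matching on its code. This should be easy enough for the reader based on what we have seen so far.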
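For readers who want a starting point, here is one way to pick components by LOINC code instead of by position. It is a sketch only: `component_value` is a hypothetical helper name of my own, the Observation is a trimmed copy of the structure shown earlier, and the codes 8480-6 (systolic) and 8462-4 (diastolic) are the ones that appear in that structure.

```shell
# A single trimmed Blood Pressure Observation, diastolic listed first.
obs='{"resourceType":"Observation","code":{"text":"Blood Pressure"},
  "effectiveDateTime":"2009-04-07T21:38:49-07:00",
  "component":[
    {"code":{"coding":[{"system":"http://loinc.org","code":"8462-4"}]},
     "valueQuantity":{"value":74.05962699830609,"unit":"mmHg"}},
    {"code":{"coding":[{"system":"http://loinc.org","code":"8480-6"}]},
     "valueQuantity":{"value":108.27160154939239,"unit":"mmHg"}}]}'

bp=$(printf '%s\n' "$obs" | jq -r '
  # Hypothetical helper: value of the first component carrying a given LOINC code.
  def component_value($loinc):
    first(.component[] | select(any(.code.coding[]; .code == $loinc)) | .valueQuantity.value);
  "\(component_value("8480-6") | round)/\(component_value("8462-4") | round)"')

printf '%s\n' "$bp"
```

Swapping the two components in the input leaves the result unchanged, which is the point of matching on the code.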
That's it for now. Please let me know how we should expand this list. As I discover additional needs, I'll report them here.