@mattwiller
Created May 16, 2023 18:58
Profile Validation Design Doc

Overview

Profiles allow extensive customization of the FHIR resource validation process to ensure that different resources contain the data necessary to satisfy various use cases. They're defined using the same StructureDefinition resource as the base resource types, but profiles often use features/fields not found in the base types. Adding robust profile validation functionality to the FHIR server will enable customers to use existing standardized profiles to conform to industry best practices, as well as to create their own profiles to ensure that data written meets their application's required schema.

Key customer use cases driving this project include:

  • Adding US Core profiles to check a compliance box
  • Using custom profiles to enforce their own application schema
  • Building a library of integration-focused profiles for common vendors

Goals

  • Support all profile functionality used in well-known profiles, including US Core and Lab Services IG
  • Allow customers to create and maintain their own profiles and use them for resource validation in their Project
  • Provide a painless upgrade path for customers in both strict and non-strict validation modes
  • Keep resource validation performant and efficient even when 1-3 profiles are in use

Background: FHIR profiles

The use of profiles to constrain resources is a cornerstone of the FHIR ecosystem, and has several key features that work together to give profile authors powerful capabilities for defining the required "shape" of a resource for specific use cases. Profiles affect the validation rules for a resource in a variety of ways, with the critical caveat that "the constraining profile can only allow what the base profile allows". A profile can only make validation rules more restrictive. The most common ways that profiles specialize resources are:

  1. Restricting the cardinality of fields, e.g. to make a field mandatory or forbidden
  2. Requiring the use of a specific terminology in a field, e.g. to mandate the use of LOINC codes in a CodeableConcept
  3. Fixing the expected value of a field, e.g. to require that a quantity uses certain units
  4. Slicing a field that contains multiple values to assert that some of those values must satisfy certain requirements, e.g. to require that a measurement contain both systolic and diastolic blood pressure

Within the StructureDefinition, the profile is defined by one or both of two top-level fields: differential and snapshot. Relative to the base profile being specialized (from the baseDefinition field), the differential field contains only the changes in the current profile from its base profile. The snapshot field contains the full flattened resource type definition, with all base profiles included. In general, it is most useful to work with profiles that include a correct snapshot, since the resource can be validated in one pass without needing to resolve the base profile(s).

Both of these fields contain a list of ElementDefinition items corresponding to the fields of the described resource type. The fields are listed in depth-first traversal order, so a parent field is always listed before its children, and all descendants of a field are listed before its next sibling field. As an example, these are the path values from the first 25 items in the Patient definition snapshot in order:

Patient
Patient.id
Patient.meta
Patient.implicitRules
Patient.language
Patient.text
Patient.contained
Patient.extension
Patient.modifierExtension
Patient.identifier
Patient.active
Patient.name
Patient.telecom
Patient.gender
Patient.birthDate
Patient.deceased[x]
Patient.address
Patient.maritalStatus
Patient.multipleBirth[x]
Patient.photo
Patient.contact
Patient.contact.id
Patient.contact.extension
Patient.contact.modifierExtension
Patient.contact.relationship

A resource can claim that it conforms to a profile by adding the canonical URL of the profile to its meta.profile field, e.g.

{
	"resourceType": "Patient",
	"meta": {
		"profile": ["http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient"]
	}
}

Note that this is only a claim: the resource must still be validated to ensure it actually satisfies the profile. A resource may also satisfy a profile without claiming so in meta.profile.

Current state

Validation logic currently resides in the @medplum/core package, and can already use a StructureDefinition loaded from the base specification to validate the properties of a JSON resource. We use this functionality in the Repository from @medplum/server to validate resources before they are written. Today, this code performs the following validations:

  • Check that the resource contains a resourceType property (not actually a resource field, but required in FHIR JSON)
  • Ensure that null does not appear anywhere in the JSON
    • NOTE: This behavior is incorrect: valid primitive extensions may contain null values:

      [W]hen one of the repeating elements has no value, it is represented in the first array using a null. When an element has a value but no extension/id, the second array will have a null at the position of that element.

  • Check each field's cardinality, including required fields (min > 0) and array fields (max > 1)
    • NOTE: This currently only checks for the above conditions, and does not validate the actual cardinality requirement
  • Check that the data types of fields are correct
  • Ensure that no extraneous fields are present on the resource

Some of these validation checks currently make improper assumptions based on what appears in the base specification. Profiles use a wider set of features from the StructureDefinition, and do not in general follow the same patterns as the base specification. Based on the survey of current validation functionality above, the following gaps exist in validation as implemented at present:

  • Field cardinality should be strictly checked; it currently only checks if a field is required or an array
  • Fields with fixed multiple cardinality are not well-supported, e.g. a field with cardinality 0..2 will not be handled correctly
  • If a profile restricts the cardinality of a field from e.g. 0..* to 1..1, the field must still be formatted in JSON as an array. The code should look at not just the min and max values in the ElementDefinition, but also take into account base.min and base.max
  • The regex hints for FHIR data types represented as JSON strings are used directly, but these are explicitly only informational, and are insufficient to fully validate more complex types like dateTime or uuid
    • Any regex used must also be fenced with ^ and $ to explicitly ensure the whole input string matches the expected regex. Otherwise, as it functions today, a "date" like "2023-05-15 lol jk this isn't a date" will be accepted and written to the database
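To illustrate the last point, here is a minimal sketch of the fencing fix, using an abbreviated version of the informational date regex from the FHIR spec (the variable names are illustrative, not the actual implementation):

```typescript
// Abbreviated form of the informational `date` regex from the FHIR spec.
const dateRegexBody =
  '([0-9]([0-9]([0-9][1-9]|[1-9]0)|[1-9]00)|[1-9]000)(-(0[1-9]|1[0-2])(-(0[1-9]|[1-2][0-9]|3[0-1]))?)?';

// Unanchored: matches anywhere in the string, so trailing garbage slips through.
const unanchored = new RegExp(dateRegexBody);

// Fenced with ^ and $: the entire input string must be a date.
const anchored = new RegExp(`^(?:${dateRegexBody})$`);

console.log(unanchored.test("2023-05-15 lol jk this isn't a date")); // true — accepted today
console.log(anchored.test("2023-05-15 lol jk this isn't a date"));   // false
console.log(anchored.test('2023-05-15'));                            // true
```

Note that even the anchored regex is only a first line of defense; fully validating types like dateTime (e.g. rejecting February 30th) still requires additional logic beyond the regex.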

Proposed design

When a resource is written to the FHIR server, the server should inspect its meta.profile field to identify any profiles the resource claims to satisfy. For each URL in the field, the server should:

  1. Retrieve the referenced profile, matching against the StructureDefinition:url search parameter
  2. Validate the resource against each retrieved profile
  3. If successful, write the resource; otherwise, return a 400 error to the client

The steps to realize this concept are detailed below.

Close validation gaps and refactor

First, we should take a pass over the existing validation code and close the gaps listed above, since many of these gaps will be directly exposed by common profiles. We should also perform any necessary targeted refactoring up front to ensure the code is structured correctly for profile validation: for example, the validation logic should accept a passed-in StructureDefinition instead of requiring it to be loaded into the global schema cache.

We will need to implement two key features of profile validation that are currently missing from our validation logic: fixed values and slicing. These are used pervasively in common profiles to specialize resource types, and require reading new fields from the StructureDefinition resource. Fixed values are straightforward, and specify that a field must contain exactly some value. Slicing, on the other hand, is more flexible and complex: it enables specifying patterns that some but not necessarily all elements in an array field must satisfy. This feature is discussed in more detail in Appendix A. Once these features are implemented, we will have covered the most common ~80% of profile validation functionality. At this point, we would be able to validate specific profiles the server had loaded into memory — this would include the base specification but could also include support for common profiles like US Core.

Enable custom profiles

Customers should also be able to use their own profiles, so the system will need to be able to load arbitrary StructureDefinition resources when performing validation. As mentioned above, these are referenced by their canonical URL: the server should search for StructureDefinition resources where the url field matches the URLs present in the resource's meta.profile field. The pseudocode for the validation routine of a given resource could be something like this:

profileURLs = resource.meta.profile
profiles = await repo.search({
	resourceType: 'StructureDefinition',
	filters: [{ code: 'url', operator: Operator.EQUALS, value: profileURLs.join(',') }]
})
for profile in profiles:
	validateResource(resource, profile) // throws if invalid
writeResource(resource)

NOTE: Validating multiple profiles via their snapshot field requires duplicative work, since the rules from the base definition are applied again for each profile. In theory one could validate the base resource profile and then only check the differential fields of profiles that extend it, but this would also require storing and retrieving the full chain of linked StructureDefinitions from the target profile back to the base.

Using the raw StructureDefinition resources for validation may be prohibitively expensive due to complexity in reading from the linearized field list in snapshot. Instead, we can compile and possibly store an indexed form of the profile definition that makes it easier to retrieve the values we need for validation. This indexing appears to be reasonably efficient already: indexing the entire profiles-resources.json Bundle from the FHIR spec takes ~3 ms, yielding an average of under 30 µs per StructureDefinition in the collection. This should be sufficient for us to start by indexing the StructureDefinition on the fly before validation, with the option to explore caching later as necessary.
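As a rough sketch of what this indexing could look like, the flat snapshot.element list can be compiled into a map keyed by element path, so validation can look up each field's rules directly instead of scanning the array. The interfaces below are simplified stand-ins for the real FHIR types, and `indexStructureDefinition` is a hypothetical helper, not the actual implementation:

```typescript
// Simplified stand-ins for the FHIR StructureDefinition/ElementDefinition types.
interface ElementDefinition {
  id?: string;
  path: string;
  min?: number;
  max?: string;
  sliceName?: string;
}

interface StructureDefinition {
  url: string;
  snapshot?: { element: ElementDefinition[] };
}

// Build a path-keyed index over the flat snapshot list for O(1) lookups.
function indexStructureDefinition(sd: StructureDefinition): Map<string, ElementDefinition> {
  const index = new Map<string, ElementDefinition>();
  for (const element of sd.snapshot?.element ?? []) {
    // Slice definitions share a `path` with the base field; key them by `id`
    // (e.g. "Observation.component:systolic") so they don't overwrite it.
    index.set(element.sliceName ? element.id ?? element.path : element.path, element);
  }
  return index;
}

// Hypothetical minimal profile for illustration:
const patientProfile: StructureDefinition = {
  url: 'http://example.org/StructureDefinition/example-patient',
  snapshot: {
    element: [
      { path: 'Patient', min: 0, max: '*' },
      { path: 'Patient.name', min: 1, max: '*' },
    ],
  },
};

const index = indexStructureDefinition(patientProfile);
console.log(index.get('Patient.name')?.min); // 1
```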

Evaluate constraint expressions

Additionally, we are currently not validating any FHIRPath constraints for resources; these are an important part of resource validation and should be implemented as soon as it's convenient. Constraints can be applied at the resource level, for specific resource fields, and also within some base data types like Quantity. In the ElementDefinition, constraints have a key, severity, human text description, and (generally) an expression containing FHIRPath. Only constraints with an expression value can be automatically validated, and all constraints with severity set to "error" must be enforced for the resource to be considered valid FHIR.
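For reference, this is roughly the shape of a constraint as it appears in an ElementDefinition — the ele-1 constraint from the base specification, which applies to every element (xpath and other properties omitted):

```json
{
  "key": "ele-1",
  "severity": "error",
  "human": "All FHIR elements must have a @value or children",
  "expression": "hasValue() or (children().count() > id.count())"
}
```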

These constraint expressions often make full use of the available FHIRPath features to express relatively complex validation rules. We should ensure that our FHIRPath evaluation engine handles any edge cases, and also that we populate the expected environment variables from the FHIRPath and FHIR-specific FHIRPath specifications:

%ucum       // (string) url for UCUM (http://unitsofmeasure.org, per http://hl7.org/fhir/ucum.html)
%context    // The original node that was passed to the evaluation engine before starting evaluation

%resource       // the resource that contains the original node that is in %context
%rootResource   // the container resource for the resource identified by %resource

Project plan

A rough ordered breakdown of the project into PR-sized units is given below:

  1. Refactor the validation code to accept a profile StructureDefinition as input, so it can work with arbitrary profiles outside of the base specification
  2. Fix cardinality checks to support fields with cardinality like 0..2
  3. Improve validation for string-backed values by fencing regexes and adding more specific validation for datetime types
  4. Implement fixed value validation rules
  5. Implement slicing validation rules
  6. Enable specifying profiles to validate against in meta.profile
  7. Benchmark writes of resources with 1-3 profiles attached with ~50 ms target
  8. Allow customers to upload profiles into their project that can be used for resource validation
  9. Evaluate constraint definitions on resource types and their fields

Open questions

  • What should the server do if a resource specifies a profile the server doesn't understand?
    • Since the user controls the profiles available in their Project, maybe this should be an error
  • Customers may also want to apply profiles globally to all resources of a given type in their project; how should we accommodate that?
  • Where/how should we store common profiles like US Core so they can be re-used easily by all customers?
  • How can we ensure that customers' apps aren't broken by new validation rules or control the rollout in their Project?

Appendix A: Fixed values and slicing

Fixed values are specified in the ElementDefinition for a field via the fixed[x] and pattern[x] fields. These specify either an exact value that must be present with nothing extra allowed, or a minimum set of fields which must be present, respectively. For example, the Observation.code field in the US Core Blood Pressure profile is defined like this (with some properties omitted):

{
  "id" : "Observation.code",
  "path" : "Observation.code",
  "min" : 1,
  "max" : "1",
  "base" : {
    "path" : "Observation.code",
    "min" : 1,
    "max" : "1"
  },
  "type" : [{
    "code" : "CodeableConcept"
  }],
  "patternCodeableConcept" : {
    "coding" : [{
      "system" : "http://loinc.org",
      "code" : "85354-9"
    }]
  }
}
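The distinction between the two can be sketched as two comparison functions: fixed[x] demands exact equality, while pattern[x] demands only that the pattern's properties be present (recursively), allowing extras. These helpers are an illustrative simplification, not the actual implementation — in particular, the JSON.stringify equality check ignores key-ordering concerns a real implementation would need to handle:

```typescript
// fixed[x]: the value must equal the fixed value exactly, nothing extra allowed.
// (Simplification: JSON.stringify equality is sensitive to key order.)
function matchesFixed(value: unknown, fixed: unknown): boolean {
  return JSON.stringify(value) === JSON.stringify(fixed);
}

// pattern[x]: every property in the pattern must be present in the value,
// recursively; extra properties in the value are allowed.
function matchesPattern(value: unknown, pattern: unknown): boolean {
  if (Array.isArray(pattern)) {
    // Each pattern element must be matched by some element of the value array.
    return Array.isArray(value) && pattern.every((p) => value.some((v) => matchesPattern(v, p)));
  }
  if (typeof pattern === 'object' && pattern !== null) {
    if (typeof value !== 'object' || value === null) {
      return false;
    }
    return Object.entries(pattern as Record<string, unknown>).every(([key, p]) =>
      matchesPattern((value as Record<string, unknown>)[key], p)
    );
  }
  return value === pattern;
}

// The patternCodeableConcept above matches a code with extra properties:
const code = {
  coding: [{ system: 'http://loinc.org', code: '85354-9' }],
  text: 'Blood pressure systolic and diastolic',
};
const pattern = { coding: [{ system: 'http://loinc.org', code: '85354-9' }] };

console.log(matchesPattern(code, pattern)); // true: extra properties are fine
console.log(matchesFixed(code, pattern));   // false: fixed requires an exact match
```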

Slicing is a relatively complicated feature that bears detailed explanation. A field that can contain multiple values is called "sliced" if it has any slice definitions associated with it in the StructureDefinition. Each slice describes a distinct pattern that values of the field match against, where a value can only match against at most one slice. Each slice is paired with a required cardinality, and after all values from that field of the resource are matched (or not) into slices, the validation of these cardinalities determines whether the resource satisfies the slicing validation. For example, consider the Observation.component field, which has base cardinality 0..* and thus can contain multiple values. The US Core Blood Pressure profile makes two changes to this field: it alters the required cardinality for the field as a whole to 2..* and adds two slice definitions to the field. These slice definitions are called systolic and diastolic, and each defines the required data for a value representing these respective blood pressure measurements.

To validate slices, the general procedure is to:

  1. Read the field and slice definitions from the StructureDefinition to determine what the matching rules and cardinality for each slice are
  2. For each element in the sliced field on the resource, match it against each slice until it matches one: if it matches, increment a count for that slice
  3. After all values in the field have been matched (or did not match any slice), validate the recorded counts for each slice against the required cardinality for that slice
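The procedure above can be sketched as follows. The SliceDefinition shape and the predicate-based discriminator matching are simplifications for illustration; a real implementation would derive the matching logic from the discriminator and slice ElementDefinitions:

```typescript
interface SliceDefinition {
  name: string;
  min: number;
  max: number; // Number.POSITIVE_INFINITY for '*'
  // Simplified discriminator: a predicate deciding whether a value matches the slice.
  matches: (value: unknown) => boolean;
}

// Returns the names of slices whose cardinality requirements are violated.
function validateSlicing(values: unknown[], slices: SliceDefinition[]): string[] {
  const counts = new Map<string, number>(slices.map((s) => [s.name, 0]));
  for (const value of values) {
    // A value can match at most one slice; take the first one that matches.
    const slice = slices.find((s) => s.matches(value));
    if (slice) {
      counts.set(slice.name, (counts.get(slice.name) ?? 0) + 1);
    }
    // With `rules: "open"`, unmatched values are allowed and simply not counted.
  }
  return slices
    .filter((s) => (counts.get(s.name) ?? 0) < s.min || (counts.get(s.name) ?? 0) > s.max)
    .map((s) => s.name);
}

// Example: systolic/diastolic slices discriminated by the pattern in `code`:
const byCode = (code: string) => (v: any) =>
  v?.code?.coding?.some((c: any) => c.code === code) ?? false;

const slices: SliceDefinition[] = [
  { name: 'systolic', min: 1, max: 1, matches: byCode('8480-6') },
  { name: 'diastolic', min: 1, max: 1, matches: byCode('8462-4') },
];

const components = [
  { code: { coding: [{ system: 'http://loinc.org', code: '8462-4' }] }, valueQuantity: { value: 44 } },
  { code: { coding: [{ system: 'http://loinc.org', code: '8480-6' }] }, valueQuantity: { value: 109 } },
];

console.log(validateSlicing(components, slices)); // [] — both slices satisfied
```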

The information related to slicing in the StructureDefinition is spread over multiple ElementDefinition items in the JSON. For example, this is how the Observation.component field is defined in the US Core Blood Pressure profile (some properties omitted):

// ... previous field ElementDefinitions
{
  "id": "Observation.component",
  "path": "Observation.component",
  "slicing": {
	// `discriminator` defines which field(s) should be used for slice matching
    "discriminator": [
      {
        "type": "pattern",
        "path": "code"
      }
    ],
    "ordered": false, // Do slices need to be present in order?
    "rules": "open"
  },
  "min": 2,
  "max": "*",
  "base": {
    "path": "Observation.component",
    "min": 0,
    "max": "*"
  },
  "type": [
    {
      "code": "BackboneElement"
    }
  ]
},
// ... ElementDefinitions for all nested sub-fields of Observation.component, e.g. Observation.component.id
// ===== BEGIN SLICE DEFINITION =====
{
  "id": "Observation.component:systolic",
  "path": "Observation.component",
  "sliceName": "systolic",
  // Required cardinality for the slice: `1..1`
  "min": 1,
  "max": "1",
},
// ... ElementDefinitions for all nested sub-fields of Observation.component, including rules for matching `systolic`
// (most of these are just copies of the base ones above, except for fields that define matching rules, shown below)
{
  "id": "Observation.component:systolic.code",
  "path": "Observation.component.code",
  "min": 1,
  "max": "1",
  "type": [
    {
      "code": "CodeableConcept"
    }
  ],
  // This pattern in the `code` field is the discriminator value; anything that matches it
  // will be in the `systolic` slice
  "patternCodeableConcept": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "8480-6"
      }
    ]
  }
},
{
  "id": "Observation.component:systolic.value[x]",
  "path": "Observation.component.value[x]",
  "min": 0,
  "max": "1",
  "type": [
    {
      "code": "Quantity" // Restricted from the base resource, which allows many different types
    }
  ]
},
{
  "id": "Observation.component:systolic.value[x].value",
  "path": "Observation.component.value[x].value",
  "min": 1, // This field is now required, though it wasn't in the base resource (see below)
  "max": "1",
  "base": {
    "path": "Quantity.value",
    "min": 0,
    "max": "1"
  },
  "type": [
    {
      "code": "decimal"
    }
  ]
},
{
  "id": "Observation.component:systolic.value[x].system",
  "path": "Observation.component.value[x].system",
  "min": 1,
  "max": "1",
  "base": {
    "path": "Quantity.system",
    "min": 0,
    "max": "1"
  },
  "type": [
    {
      "code": "uri"
    }
  ],
  // Fixed value: anything else in this field of a `component` value that matched into this slice is invalid
  "fixedUri": "http://unitsofmeasure.org"
},
{
  "id": "Observation.component:systolic.value[x].code",
  "path": "Observation.component.value[x].code",
  "short": "(USCDI) Coded form of the unit",
  "min": 1,
  "max": "1",
  "base": {
    "path": "Quantity.code",
    "min": 0,
    "max": "1"
  },
  "type": [
    {
      "code": "code"
    }
  ],
  // Another fixed value, this time a `code`
  "fixedCode": "mm[Hg]",
},
// ===== END SLICE DEFINITION =====
// ... similar ElementDefinitions for `diastolic` slice
// ... remainder of ElementDefinitions for all nested fields in the resource type

Because the snapshot interleaves slicing ElementDefinition items in a flat list with normal ones, the structure can be a little tricky to parse. The FHIR specification guarantees that the slice definitions within the snapshot.element array of a constraining profile have an id of the form ResourceType.field.slicedField:sliceName.nestedSliceField; this can be used to easily index the ElementDefinition items correctly.
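A parser for ids of that form could look something like the following sketch (`parseElementId` is a hypothetical helper, not the actual implementation):

```typescript
// Parse an ElementDefinition id like "Observation.component:systolic.code"
// into the plain field path plus the name of the slice it belongs to, if any.
function parseElementId(id: string): { path: string; sliceName?: string } {
  let sliceName: string | undefined;
  const fields = id.split('.').map((segment) => {
    const [field, slice] = segment.split(':');
    if (slice) {
      sliceName = slice; // for nested slicing, the innermost slice name wins
    }
    return field;
  });
  return { path: fields.join('.'), sliceName };
}

console.log(parseElementId('Observation.component:systolic.code'));
// { path: 'Observation.component.code', sliceName: 'systolic' }
```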

Consider this example (valid) US Core Blood Pressure Observation:

{
  "resourceType" : "Observation",
  "meta" : {
    "profile" : ["http://hl7.org/fhir/us/core/StructureDefinition/us-core-blood-pressure"]
  },
  "status" : "final",
  "category" : [{
    "coding" : [{
      "system" : "http://terminology.hl7.org/CodeSystem/observation-category",
      "code" : "vital-signs",
    }],
  }],
  "code" : {
    "coding" : [{
      "system" : "http://loinc.org",
      "code" : "85354-9",
    }],
    "text" : "Blood pressure systolic and diastolic"
  },
  "subject" : {
    "reference" : "Patient/1",
  },
  "effectiveDateTime" : "2023-05-15",
  "component" : [{
    "code" : {
      "coding" : [{
        "system" : "http://loinc.org",
        "code" : "8462-4",
        "display" : "Diastolic blood pressure"
      }]
    },
    "valueQuantity" : {
      "value" : 44,
      "unit" : "mmHg",
      "system" : "http://unitsofmeasure.org",
      "code" : "mm[Hg]"
    }
  },
  {
    "code" : {
      "coding" : [{
        "system" : "http://loinc.org",
        "code" : "8480-6",
        "display" : "Systolic blood pressure"
      }]
    },
    "valueQuantity" : {
      "value" : 109,
      "unit" : "mmHg",
      "system" : "http://unitsofmeasure.org",
      "code" : "mm[Hg]"
    }
  }]
}

First, note the use of fixed values in the code.coding field: these are set specifically by the profile and are the only valid values in that field. Slices are used in the category and component fields, with the validation for the component field proceeding as follows:

  1. The first value in the field is considered; it does not match the pattern for the systolic slice (due to the mismatch in code.coding.code), but does match diastolic
  2. The second value in the field is considered, and it matches into the systolic slice
  3. Both slices have one element in them, and each has a required cardinality of 1..1, so the field slicing is valid

NOTE: If the field had any extra values in it (with different code values), those would not match into any slice. However, since the slice cardinalities and the overall field cardinality of 2..* would still be satisfied, these extra values are considered valid, in contrast to fixed values.