- Overview
- Goals
- Background: FHIR profiles
- Current state
- Proposed design
- Project plan
- Open questions
- Appendix A: Fixed values and slicing
## Overview

Profiles allow extensive customization of the FHIR resource validation process to ensure that different resources contain the data necessary to satisfy various use cases. They are defined using the same `StructureDefinition` resource as the base resource types, but profiles often use features and fields not found in the base types. Adding robust profile validation functionality to the FHIR server will enable customers to use existing standardized profiles to conform to industry best practices, as well as to create their own profiles to ensure that written data meets their application's required schema.
Key customer use cases driving this project include:
- Adding US Core profiles to check a compliance box
- Using custom profiles to enforce their own application schema
- Building a library of integration-focused profiles for common vendors
## Goals

- Support all profile functionality used in well-known profiles, including US Core and the Lab Services IG
- Allow customers to create and maintain their own profiles and use them for resource validation in their Project
- Provide a painless upgrade path for customers in both strict and non-strict validation modes
- Keep resource validation performant and efficient even when 1-3 profiles are in use
## Background: FHIR profiles

The use of profiles to constrain resources is a cornerstone of the FHIR ecosystem, and has several key features that work together to give profile authors powerful capabilities for defining the required "shape" of a resource for specific use cases. Profiles affect the validation rules for a resource in a variety of ways, with the critical caveat that "the constraining profile can only allow what the base profile allows". A profile can only make validation rules more restrictive. The most common ways that profiles specialize resources are:

- Restricting the cardinality of fields, e.g. to make a field mandatory or forbidden
- Requiring the use of a specific terminology in a field, e.g. to mandate the use of LOINC codes in a `CodeableConcept`
- Fixing the expected value of a field, e.g. to require that a quantity uses certain units
- Using slices to assert that a field containing multiple values must have some values satisfying certain requirements, e.g. to require that a measurement contain both systolic and diastolic blood pressure
Within the `StructureDefinition`, the profile is defined by one or both of two top-level fields: `differential` and `snapshot`. Relative to the base profile being specialized (referenced by the `baseDefinition` field), the `differential` field contains only the changes the current profile makes to its base profile. The `snapshot` field contains the full flattened resource type definition, with all base profiles included. In general, it is most useful to work with profiles that include a correct `snapshot`, since the resource can then be validated in one pass without needing to resolve the base profile(s).
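As a sketch of the relationship between the two fields, consider a profile whose differential makes `Patient.birthDate` mandatory. The `ElementDefinition` shape and the merge function below are hypothetical simplifications for illustration, not the actual `@medplum/core` types; real snapshot generation also handles types, slicing, and nested paths:

```typescript
// Heavily simplified ElementDefinition: just a path and cardinality
interface ElementDefinition {
  path: string;
  min?: number;
  max?: string;
}

// A differential lists only the elements that changed relative to the base
const differentialElements: ElementDefinition[] = [
  { path: 'Patient.birthDate', min: 1, max: '1' }, // tightened from 0..1
];

// Naive snapshot generation: overlay each differential element onto the
// matching element of the base snapshot by path
function applyDifferential(
  snapshot: ElementDefinition[],
  differential: ElementDefinition[]
): ElementDefinition[] {
  return snapshot.map((el) => {
    const change = differential.find((d) => d.path === el.path);
    return change ? { ...el, ...change } : el;
  });
}
```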
Both of these fields contain a list of `ElementDefinition` items corresponding to the fields of the described resource type. The fields are listed in depth-first traversal order, so a parent field is always listed before its children, and all descendants of a field are listed before its next sibling field. As an example, these are the `path` values from the first 25 items in the `Patient` definition `snapshot`, in order:
Patient
Patient.id
Patient.meta
Patient.implicitRules
Patient.language
Patient.text
Patient.contained
Patient.extension
Patient.modifierExtension
Patient.identifier
Patient.active
Patient.name
Patient.telecom
Patient.gender
Patient.birthDate
Patient.deceased[x]
Patient.address
Patient.maritalStatus
Patient.multipleBirth[x]
Patient.photo
Patient.contact
Patient.contact.id
Patient.contact.extension
Patient.contact.modifierExtension
Patient.contact.relationship
A resource can claim that it conforms to a profile by adding the canonical URL of the profile to its `meta.profile` field, e.g.
{
"resourceType": "Patient",
"meta": {
"profile": ["http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient"]
}
}
Note that this is only a claim: the resource must still be validated to ensure it actually satisfies the profile. A resource may also satisfy a profile without claiming so in `meta.profile`.
## Current state

Validation logic currently resides in the `@medplum/core` package, and can already use a `StructureDefinition` loaded from the base specification to validate the properties of a JSON resource. We use this functionality in the `Repository` from `@medplum/server` to validate resources before they are written. Today, this code performs the following validations:
- Check that the resource contains a `resourceType` property (not actually a resource field, but required in FHIR JSON)
- Ensure that `null` does not appear anywhere in the JSON
  - NOTE: This behavior is incorrect: valid primitive extensions may contain `null` values: "[W]hen one of the repeating elements has no value, it is represented in the first array using a null. When an element has a value but no extension/id, the second array will have a null at the position of that element."
- Check each field's cardinality, including required fields (min > 0) and array fields (max > 1)
  - NOTE: This currently only checks for the above conditions, and does not validate the actual cardinality requirement
- Check that the data types of fields are correct
- Ensure that no extraneous fields are present on the resource
Some of these validation checks currently make improper assumptions based on what appears in the base specification. Profiles use a wider set of features from the `StructureDefinition`, and do not in general follow the same patterns as the base specification. Based on the survey of current validation functionality above, the following gaps exist in validation as implemented at present:
- Field cardinality should be strictly checked; it currently only checks whether a field is required or an array
  - Fields with fixed multiple cardinality are not well supported, e.g. a field with cardinality `0..2` will not be handled correctly
  - If a profile restricts the cardinality of a field from e.g. `0..*` to `1..1`, the field must still be formatted in JSON as an array. The code should look at not just the `min` and `max` values in the `ElementDefinition`, but also take into account `base.min` and `base.max`
- The regex hints for FHIR data types represented as JSON strings are used directly, but these are explicitly only informational, and are insufficient to fully validate more complex types like `dateTime` or `uuid`
  - Any regex used must also be anchored with `^` and `$` to explicitly ensure the whole input string matches the expected regex. Otherwise, as it functions today, a "date" like `"2023-05-15 lol jk this isn't a date"` will be accepted and written to the database
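The regex fencing described above can be sketched as follows, using the informational `date` regex from the FHIR specification wrapped in `^...$` so that only a full-string match passes:

```typescript
// The FHIR spec's informational regex for the `date` type, anchored with
// ^ and $ so partial matches are rejected. Without the anchors, any string
// merely containing a date would pass.
const FHIR_DATE_REGEX = new RegExp(
  '^([0-9]([0-9]([0-9][1-9]|[1-9]0)|[1-9]00)|[1-9]000)' + // year
    '(-(0[1-9]|1[0-2])' + // optional month
    '(-(0[1-9]|[1-2][0-9]|3[0-1]))?)?$' // optional day
);

function isValidFhirDate(value: string): boolean {
  return FHIR_DATE_REGEX.test(value);
}
```

Note that even the anchored regex is only a first line of defense; it does not catch semantically invalid dates like `2023-02-31`, so datetime types still need more specific validation on top.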
## Proposed design

When a resource is written to the FHIR server, the server should inspect its `meta.profile` field to identify any profiles the resource claims to satisfy. For each URL in the field, the server should:

- Retrieve the referenced profile, matching against the `StructureDefinition:url` search parameter
- Validate the resource against each retrieved profile
- If successful, write the resource; otherwise, return a 400 error to the client

The steps to realize this concept are detailed below.
First, we should take a pass over the existing validation code and close the gaps listed above, since many of these gaps will be directly exposed by common profiles. We should also perform any necessary targeted refactoring up front to ensure the code is structured correctly for profile validation: for example, the validation logic should accept a passed-in `StructureDefinition` instead of requiring it to be loaded into the global schema cache.
We will need to implement two key features of profile validation that are currently missing from our validation logic: fixed values and slicing. These are used pervasively in common profiles to specialize resource types, and require reading new fields from the `StructureDefinition` resource. Fixed values are straightforward: they specify that a field must contain exactly some value. Slicing, on the other hand, is more flexible and complex: it enables specifying patterns that some, but not necessarily all, elements in an array field must satisfy. This feature is discussed in more detail in Appendix A. Once these features are implemented, we will have covered approximately the most common 80% of profile validation functionality. At this point, we would be able to validate specific profiles the server had loaded into memory; this would include the base specification, but could also include support for common profiles like US Core.
Customers should also be able to use their own profiles, so the system will need to be able to load arbitrary `StructureDefinition` resources when performing validation. As mentioned above, these are referenced by their canonical URL: the server should search for `StructureDefinition` resources where the `url` field matches the URLs present in the resource's `meta.profile` field. The pseudocode for the validation routine of a given resource could look something like this:
profileURLs = resource.meta.profile
profiles = await repo.search({
resourceType: 'StructureDefinition',
filters: [{ code: 'url', operator: Operator.EQUALS, value: profileURLs.join(',') }]
})
for profile in profiles:
validateResource(resource, profile) // throws if invalid
writeResource(resource)
NOTE: Validating multiple profiles via their `snapshot` field requires duplicative work, since the rules from the base definition are applied again for each profile. In theory, one could validate the base resource profile and then check only the `differential` fields of profiles that extend it, but this would also require storing and retrieving the full chain of linked `StructureDefinition` resources from the target profile back to the base.
Using the raw `StructureDefinition` resources for validation may be prohibitively expensive due to the complexity of reading from the linearized field list in `snapshot`. Instead, we can compile, and possibly store, an indexed form of the profile definition that makes it easier to retrieve the values we need for validation. This indexing appears to be reasonably efficient already: indexing the entire `profiles-resources.json` `Bundle` from the FHIR spec takes ~3 ms, yielding an average of under 30 µs per `StructureDefinition` in the collection. This should be sufficient for us to start by indexing the `StructureDefinition` on the fly before validation, with the option to explore caching later as necessary.
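A minimal sketch of such an indexed form, using a hypothetical simplified `ElementDefinition` shape rather than our real types: indexing by `path` turns the linear scan of the flat `snapshot.element` list into a constant-time lookup during validation.

```typescript
// Simplified ElementDefinition for illustration
interface ElementDefinition {
  path: string;
  min?: number;
  max?: string;
}

interface IndexedProfile {
  // Maps 'Patient.name' -> its ElementDefinition for O(1) lookup
  byPath: Map<string, ElementDefinition>;
}

function indexStructureDefinition(elements: ElementDefinition[]): IndexedProfile {
  const byPath = new Map<string, ElementDefinition>();
  for (const element of elements) {
    byPath.set(element.path, element);
  }
  return { byPath };
}
```

A real index would also need to account for slices (which share a `path` but have distinct `id` values), so a single map keyed by `path` is only the starting point.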
Additionally, we are currently not validating any FHIRPath constraints for resources; these are an important part of resource validation and should be implemented as soon as it's convenient. Constraints can be applied at the resource level, on specific resource fields, and also within some base data types like `Quantity`. In the `ElementDefinition`, constraints have a `key`, a `severity`, a `human` text description, and (generally) an `expression` containing FHIRPath. Only constraints with an `expression` value can be automatically validated, and all constraints with `severity` set to "error" must be enforced for the resource to be considered valid FHIR.
These constraint expressions often make full use of the available FHIRPath features to express relatively complex validation rules. We should ensure that our FHIRPath evaluation engine handles any edge cases, and also that we populate the expected environment variables from the FHIRPath and FHIR-specific FHIRPath specifications:
%ucum // (string) url for UCUM (http://unitsofmeasure.org, per http://hl7.org/fhir/ucum.html)
%context // The original node that was passed to the evaluation engine before starting evaluation
%resource // the resource that contains the original node that is in %context
%rootResource // the container resource for the resource identified by %resource
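The constraint-selection rule described above can be sketched as follows, using hypothetical minimal types rather than our actual schema types: only constraints that carry an `expression` can be evaluated automatically, and only those with `severity` "error" block the write.

```typescript
// Simplified constraint shape from ElementDefinition.constraint
interface ElementConstraint {
  key: string;
  severity: 'error' | 'warning';
  human: string;
  expression?: string; // FHIRPath; absent constraints can't be auto-validated
}

// Select the constraints that both can be evaluated (have an expression)
// and must hold for the resource to be valid (severity "error")
function enforceableConstraints(constraints: ElementConstraint[]): ElementConstraint[] {
  return constraints.filter((c) => c.severity === 'error' && c.expression !== undefined);
}
```

Warning-level constraints could still be evaluated and surfaced as `OperationOutcome` issues without failing the write.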
## Project plan

A rough ordered breakdown of the project into PR-sized units is given below:

- Refactor the validation code to accept a profile `StructureDefinition` as input, so it can work with arbitrary profiles outside of the base specification
- Fix cardinality checks to support fields with cardinality like `0..2`
- Improve validation for string-backed values by anchoring regexes and adding more specific validation for datetime types
- Implement fixed value validation rules
- Implement slicing validation rules
- Enable specifying profiles to validate against in `meta.profile`
- Benchmark writes of resources with 1-3 profiles attached, with a ~50 ms target
- Allow customers to upload profiles into their Project that can be used for resource validation
- Evaluate constraint definitions on resource types and their fields
## Open questions

- What should the server do if a resource specifies a profile the server doesn't understand?
  - Since the user controls the profiles available in their Project, maybe this should be an error
- Customers may also want to apply profiles globally to all resources of a given type in their Project; how should we accommodate that?
- Where and how should we store common profiles like US Core so they can be re-used easily by all customers?
- How can we ensure that customers' apps aren't broken by new validation rules, or control the rollout in their Project?
## Appendix A: Fixed values and slicing

Fixed values are specified in the `ElementDefinition` for a field via the `fixed[x]` and `pattern[x]` fields. These specify either an exact value that must be present with nothing extra allowed, or a minimum set of fields which must be present, respectively. For example, the `Observation.code` field in the US Core Blood Pressure profile is defined like this (with some properties omitted):
{
"id" : "Observation.code",
"path" : "Observation.code",
"min" : 1,
"max" : "1",
"base" : {
"path" : "Observation.code",
"min" : 1,
"max" : "1"
},
"type" : [{
"code" : "CodeableConcept"
}],
"patternCodeableConcept" : {
"coding" : [{
"system" : "http://loinc.org",
"code" : "85354-9"
}]
}
}
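The difference between the two matching modes can be sketched with a pair of hypothetical helpers (not the planned implementation): `fixed[x]` demands deep equality, while `pattern[x]` only requires that the specified fields be present with the specified values, allowing extras alongside them.

```typescript
// fixed[x]: the value must equal the fixed value exactly, nothing extra.
// NOTE: stringify comparison is naive (sensitive to key order); a real
// implementation would use a structural deep-equality check.
function matchesFixed(value: unknown, fixed: unknown): boolean {
  return JSON.stringify(value) === JSON.stringify(fixed);
}

// pattern[x]: the value must contain at least the fields in the pattern;
// additional fields and array elements are allowed.
function matchesPattern(value: unknown, pattern: unknown): boolean {
  if (pattern === null || typeof pattern !== 'object') {
    return value === pattern; // primitives must match exactly
  }
  if (Array.isArray(pattern)) {
    // Each pattern element must be matched by some element of the value
    return (
      Array.isArray(value) &&
      pattern.every((p) => value.some((v) => matchesPattern(v, p)))
    );
  }
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return false;
  }
  return Object.entries(pattern).every(([k, p]) =>
    matchesPattern((value as Record<string, unknown>)[k], p)
  );
}
```

Under this sketch, an `Observation.code` with the required LOINC coding plus a `text` field satisfies the `patternCodeableConcept` above, but would fail an equivalent `fixedCodeableConcept`.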
Slicing is a relatively complicated feature that bears detailed explanation. A field that can contain multiple values is called "sliced" if it has any slice definitions associated with it in the `StructureDefinition`. Each slice describes a distinct pattern that values of the field are matched against, where a value can match at most one slice. Each slice is paired with a required cardinality, and after all values from that field of the resource are matched (or not) into slices, the validation of these cardinalities determines whether the resource satisfies the slicing validation. For example, consider the `Observation.component` field, which has base cardinality `0..*` and thus can contain multiple values. The US Core Blood Pressure profile makes two changes to this field: it alters the required cardinality for the field as a whole to `2..*` and adds two slice definitions to the field. These slice definitions are called `systolic` and `diastolic`, and each defines the required data for a value representing the respective blood pressure measurement.
To validate slices, the general procedure is to:

- Read the field and slice definitions from the `StructureDefinition` to determine the matching rules and cardinality for each slice
- For each element in the sliced field on the resource, match it against each slice until it matches one; if it matches, increment a count for that slice
- After all values in the field have been matched (or did not match any slice), validate the recorded counts for each slice against the required cardinality for that slice
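The procedure above can be sketched as follows. The types and the caller-supplied matching predicate are hypothetical stand-ins; real discriminator evaluation would walk the discriminator path and apply pattern/fixed-value matching as described elsewhere in this document.

```typescript
// Hypothetical, simplified slice shape for illustration
interface SliceDefinition {
  name: string;
  min: number;
  max: number; // use Infinity for '*'
  pattern: Record<string, unknown>;
}

function validateSlicing(
  values: Record<string, unknown>[],
  slices: SliceDefinition[],
  matches: (value: Record<string, unknown>, slice: SliceDefinition) => boolean
): boolean {
  // Count how many values fall into each slice
  const counts = new Map<string, number>(slices.map((s) => [s.name, 0]));
  for (const value of values) {
    // A value can match at most one slice; take the first that matches
    const slice = slices.find((s) => matches(value, s));
    if (slice) {
      counts.set(slice.name, (counts.get(slice.name) ?? 0) + 1);
    }
    // Unmatched values are permitted when the slicing rules are "open"
  }
  // Every slice's count must satisfy that slice's required cardinality
  return slices.every((s) => {
    const n = counts.get(s.name) ?? 0;
    return n >= s.min && n <= s.max;
  });
}
```

A complete implementation would also enforce the `ordered` and `rules` ("open"/"closed") settings from the `slicing` definition, which this sketch ignores.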
The information related to slicing in the `StructureDefinition` is spread over multiple `ElementDefinition` items in the JSON. For example, this is how the `Observation.component` field is defined in the US Core Blood Pressure profile (some properties omitted):
// ... previous field ElementDefinitions
{
"id": "Observation.component",
"path": "Observation.component",
"slicing": {
// `discriminator` defines which field(s) should be used for slice matching
"discriminator": [
{
"type": "pattern",
"path": "code"
}
],
"ordered": false, // Do slices need to be present in order?
"rules": "open"
},
"min": 2,
"max": "*",
"base": {
"path": "Observation.component",
"min": 0,
"max": "*"
},
"type": [
{
"code": "BackboneElement"
}
]
},
// ... ElementDefinitions for all nested sub-fields of Observation.component, e.g. Observation.component.id
// ===== BEGIN SLICE DEFINITION =====
{
"id": "Observation.component:systolic",
"path": "Observation.component",
"sliceName": "systolic",
// Required cardinality for the slice: `1..1`
"min": 1,
"max": "1"
},
// ... ElementDefinitions for all nested sub-fields of Observation.component, including rules for matching `systolic`
// (most of these are just copies of the base ones above, except for fields that define matching rules, shown below)
{
"id": "Observation.component:systolic.code",
"path": "Observation.component.code",
"min": 1,
"max": "1",
"type": [
{
"code": "CodeableConcept"
}
],
// This pattern in the `code` field is the discriminator value; anything that matches it
// will be in the `systolic` slice
"patternCodeableConcept": {
"coding": [
{
"system": "http://loinc.org",
"code": "8480-6"
}
]
}
},
{
"id": "Observation.component:systolic.value[x]",
"path": "Observation.component.value[x]",
"min": 0,
"max": "1",
"type": [
{
"code": "Quantity" // Restricted from the base resource, which allows many different types
}
]
},
{
"id": "Observation.component:systolic.value[x].value",
"path": "Observation.component.value[x].value",
"min": 1, // This field is now required, though it wasn't in the base resource (see below)
"max": "1",
"base": {
"path": "Quantity.value",
"min": 0,
"max": "1"
},
"type": [
{
"code": "decimal"
}
]
},
{
"id": "Observation.component:systolic.value[x].system",
"path": "Observation.component.value[x].system",
"min": 1,
"max": "1",
"base": {
"path": "Quantity.system",
"min": 0,
"max": "1"
},
"type": [
{
"code": "uri"
}
],
// Fixed value, anything else in this field of a `component` value that matched into this slice is invalid
"fixedUri": "http://unitsofmeasure.org"
},
{
"id": "Observation.component:systolic.value[x].code",
"path": "Observation.component.value[x].code",
"short": "(USCDI) Coded form of the unit",
"min": 1,
"max": "1",
"base": {
"path": "Quantity.code",
"min": 0,
"max": "1"
},
"type": [
{
"code": "code"
}
],
// Another fixed value, this time a `code`
"fixedCode": "mm[Hg]"
},
// ===== END SLICE DEFINITION =====
// ... similar ElementDefinitions for `diastolic` slice
// ... remainder of ElementDefinitions for all nested fields in the resource type
Because the `snapshot` interleaves slicing `ElementDefinition` items in a flat list with normal ones, the structure can be a little tricky to parse. The FHIR specification guarantees that the slice definitions within the `snapshot.element` array of a constraining profile have an `id` of the form `ResourceType.field.slicedField:sliceName.nestedSliceField`; this can be used to easily index the `ElementDefinition` items correctly.
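For instance, extracting the slice name from such an `id` could look like the hypothetical helper below, which covers the single-slice case (re-slicing, where ids nest further slice names, would need a fuller parser):

```typescript
// Given an ElementDefinition id like 'Observation.component:systolic.value[x]',
// return the slice it belongs to ('systolic'), or undefined for ordinary elements.
function sliceNameFromId(id: string): string | undefined {
  const colon = id.lastIndexOf(':');
  if (colon < 0) {
    return undefined; // not part of any slice
  }
  // Everything after ':' up to the next '.' is the slice name, e.g.
  // 'systolic.value[x].system' -> 'systolic'
  const rest = id.slice(colon + 1);
  const dot = rest.indexOf('.');
  return dot < 0 ? rest : rest.slice(0, dot);
}
```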
Consider this example (valid) US Core Blood Pressure `Observation`:
{
"resourceType" : "Observation",
"meta" : {
"profile" : ["http://hl7.org/fhir/us/core/StructureDefinition/us-core-blood-pressure"]
},
"status" : "final",
"category" : [{
"coding" : [{
"system" : "http://terminology.hl7.org/CodeSystem/observation-category",
"code" : "vital-signs"
}]
}],
"code" : {
"coding" : [{
"system" : "http://loinc.org",
"code" : "85354-9"
}],
"text" : "Blood pressure systolic and diastolic"
},
"subject" : {
"reference" : "Patient/1"
},
"effectiveDateTime" : "2023-05-15",
"component" : [{
"code" : {
"coding" : [{
"system" : "http://loinc.org",
"code" : "8462-4",
"display" : "Diastolic blood pressure"
}]
},
"valueQuantity" : {
"value" : 44,
"unit" : "mmHg",
"system" : "http://unitsofmeasure.org",
"code" : "mm[Hg]"
}
},
{
"code" : {
"coding" : [{
"system" : "http://loinc.org",
"code" : "8480-6",
"display" : "Systolic blood pressure"
}]
},
"valueQuantity" : {
"value" : 109,
"unit" : "mmHg",
"system" : "http://unitsofmeasure.org",
"code" : "mm[Hg]"
}
}]
}
First, note the use of fixed values in the `code.coding` field: these are set specifically by the profile and are the only valid values in that field. Slices are used in the `category` and `component` fields, with the validation for the `component` field proceeding as follows:
- The first value in the field is considered; it does not match the pattern for the `systolic` slice (due to a mismatch in `code.coding.code`), but does match `diastolic`
- The second value in the field is considered, and it matches into the `systolic` slice
- Both slices have one element in them, and each has a required cardinality of `1..1`, so the field slicing is valid

NOTE: If the field had any extra values in it (with different `code` values), those would not match into any slice. However, since the slice cardinalities and the overall field cardinality of `2..*` would still be satisfied, these extra values are considered valid, in contrast to fixed values.