ewoutkramer/A not so short overview of StructureDefinition.md

## A not so short overview of StructureDefinition.md

      

While I was working on the new code generator for the .NET API 2.0, I had to review the class hierarchy of FHIR and how we represent this in the StructureDefinitions that are part of the specification. Certainly, if you are working with multiple versions of FHIR and do any kind of metadata work, you will have found yourself trying to remember the answer to questions like: "Is DataRequirement.codeFilter based on Element or BackboneElement?", "Was SimpleQuantity a datatype or a profile on Quantity in R3?", "How did we specify the datatype of Narrative.div in R4? Did that change across FHIR versions?"
Yet again, I found myself digging through tons of StructureDefinitions to find out the details I needed to get the code generation done. I told myself that this time around, I would actually document it, so you (and a future me) would have just a single page to go to.
### Resources

Let's first take a look at the Resources. The Resource inheritance structure is pretty simple:

Even so, there are a few interesting things to notice:

- Resource and DomainResource are abstract classes; you will never find an instance of these in your data.
- DomainResource (from which most other resources derive) introduces the text, contained, extension and modifierExtension properties - this means that the three(!) remaining resources (Binary, Bundle and Parameters) cannot be extended, nor can they contain a human-readable summary. They also cannot contain contained resources!

If you have been looking at the R5-preview, you might have encountered two abstract DomainResource subclasses: CanonicalResource and its subclass MetadataResource. Together, they specify a set of elements shared between the conformance resources (like StructureDefinition, ValueSet) and other "metadata" resources that have authoring information associated with them (e.g. PlanDefinition, TestScript). These resources are really more like interfaces, and are not really part of the inheritance hierarchy of the resources. Currently, if you look in the StructureDefinition for, say, PlanDefinition, you'll find:
"type" : "PlanDefinition",
"baseDefinition" : 
     "http://hl7.org/fhir/StructureDefinition/MetadataResource",
"derivation" : "specialization",
This would tell you that MetadataResource is the superclass of PlanDefinition, which it really is not. By the time we publish R5, we will have to find a way to express that this resource is really a specialization of DomainResource, but implements MetadataResource.
This will remain an R5 feature, so in R3 and R4, all the resources that moved under this new CanonicalResource are, in fact, subclassed directly from DomainResource.
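For code generation, one way to deal with this is to treat the two R5 types as interfaces and fall back to DomainResource as the real superclass. The sketch below assumes the StructureDefinition has been parsed from JSON into a plain dict; the `INTERFACES` table and the `effective_base` helper are hypothetical names, not part of any FHIR library.

```python
from typing import List, Optional, Tuple

# Assumption: in the R5 preview these two abstract types behave like interfaces.
# MetadataResource also implies CanonicalResource, because it is its subclass.
INTERFACES = {
    "CanonicalResource": ["CanonicalResource"],
    "MetadataResource": ["CanonicalResource", "MetadataResource"],
}


def effective_base(sd: dict) -> Tuple[Optional[str], List[str]]:
    """Return (superclass name, implemented interfaces) for a StructureDefinition."""
    base_url = sd.get("baseDefinition")
    if base_url is None:
        return None, []  # a root type without a base
    base_name = base_url.rsplit("/", 1)[-1]
    if base_name in INTERFACES:
        # Interface-like base: report it as "implements" and use DomainResource instead.
        return "DomainResource", list(INTERFACES[base_name])
    return base_name, []


plan_definition = {
    "type": "PlanDefinition",
    "baseDefinition": "http://hl7.org/fhir/StructureDefinition/MetadataResource",
    "derivation": "specialization",
}
print(effective_base(plan_definition))
# ('DomainResource', ['CanonicalResource', 'MetadataResource'])
```

For R3 and R4 definitions the interface list simply stays empty, since those resources already point at DomainResource directly.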
#### Resource inheritance and substitutability

So, if there is an inheritance hierarchy, does that mean you can actually use a concrete type where there is an element that allows one of the supertypes? The answer is yes, in at least three places (there might be more):


In StructureDefinition, to define the type for the contained element in DomainResource:
"path" : "DomainResource.contained",
"short" : "Contained, inline Resources",
"min" : 0,
"max" : "*",
"type" : 
   [{
     "code" : "Resource"
   }],
This principle is also used in the Parameters.parameter.resource and the Bundle.entry.resource element. I am not aware of any others.


To specify that a resource reference can point to "Any" other resource. This happens often, and appears in the targetProfile element of a reference element in StructureDefinition, e.g.:
"path" : "MessageHeader.focus",
"short" : "The actual content of the message",
"type" : 
   [{
     "code" : "Reference",
     "targetProfile" : ["http://hl7.org/fhir/StructureDefinition/Resource"]
   }],


To define the SearchParameter common to all resources, as specified in the base element:
"name" : "_id",
"description" : "Logical id of this artifact",
"code" : "_id",
"base" : ["Resource"],
"expression" : "Resource.id"
In case you were wondering, there is a SearchParameter on DomainResource (and thus, any subclass) as well:
"name" : "_text",
"description" : "Search on the narrative of the resource",
"code" : "_text",
"base" : ["DomainResource"]

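The first two cases above can be detected mechanically while walking the elements of a StructureDefinition. This is a minimal sketch, assuming each ElementDefinition is a dict parsed from JSON; `any_resource_slot` is a hypothetical helper name, and the tolerance for a single-string `targetProfile` is an assumption to cope with older definitions.

```python
from typing import Optional

RESOURCE_URL = "http://hl7.org/fhir/StructureDefinition/Resource"


def any_resource_slot(element: dict) -> Optional[str]:
    """Classify an element that accepts any resource.

    Returns "inline" when the element is typed as the abstract Resource,
    "reference" when it is a Reference targeting Resource, else None.
    """
    for type_entry in element.get("type", []):
        code = type_entry.get("code")
        if code == "Resource":
            return "inline"
        if code == "Reference":
            targets = type_entry.get("targetProfile", [])
            if isinstance(targets, str):  # assumption: tolerate a single string
                targets = [targets]
            if RESOURCE_URL in targets:
                return "reference"
    return None


contained = {"path": "DomainResource.contained", "type": [{"code": "Resource"}]}
focus = {
    "path": "MessageHeader.focus",
    "type": [{"code": "Reference", "targetProfile": [RESOURCE_URL]}],
}
print(any_resource_slot(contained))  # inline
print(any_resource_slot(focus))      # reference
```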

#### Attention: Resource.id

The id element has always been a source of confusion, maybe because we have two similar, but completely different, id elements: one on Resource and one on Element (and thus any datatype). We'll talk about that second one later, but let me first stress that Resource.id is a normal element, it behaves like Patient.active or any other innocent property. Its datatype is id, a normal, complex FHIR datatype, which means you may even extend it.
We have managed to make this worse by getting the datatype wrong in the StructureDefinition for Resource in R4:
<path value="Resource.id"/>
<type>
  <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-fhir-type">
     <valueUrl value="string"/>
  </extension>
  <code value="http://hl7.org/fhirpath/System.String"/>
</type>
The extension is saying that this property is actually not a complex FHIR type, but a primitive string like we find in the value attribute in the XML representation for a code or string. That is wrong. It is currently not much better in R5:
<path value="Resource.id"/>
<type>
  <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-fhir-type">
     <valueUrl value="id"/>
  </extension>
  <code value="http://hl7.org/fhirpath/System.String"/>
</type>
STU3 is (for now) the last version to get this right:
<path value="Resource.id"/>
<type>
   <code value="id"/>
</type>
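If your tooling needs one consistent answer across versions, you could normalize Resource.id to the STU3 form before doing anything else. The sketch below assumes the JSON rendering of the snippets above, loaded as dicts; `claimed_type` and `normalize_resource_id` are hypothetical helper names.

```python
from typing import Optional

FHIR_TYPE_EXT = "http://hl7.org/fhir/StructureDefinition/structuredefinition-fhir-type"


def claimed_type(type_entry: dict) -> Optional[str]:
    """What the definition itself claims: the fhir-type extension if present, else the code."""
    for ext in type_entry.get("extension", []):
        if ext.get("url") == FHIR_TYPE_EXT:
            return ext.get("valueUrl")
    return type_entry.get("code")


def normalize_resource_id(element: dict) -> dict:
    """Return a copy of the Resource.id element typed as plain FHIR `id` (the STU3 form).

    Any other element is returned unchanged.
    """
    if element.get("path") != "Resource.id":
        return element
    fixed = dict(element)
    fixed["type"] = [{"code": "id"}]
    return fixed


r4_id = {
    "path": "Resource.id",
    "type": [{
        "extension": [{"url": FHIR_TYPE_EXT, "valueUrl": "string"}],
        "code": "http://hl7.org/fhirpath/System.String",
    }],
}
print(claimed_type(r4_id["type"][0]))          # string
print(normalize_resource_id(r4_id)["type"])    # [{'code': 'id'}]
```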
### Datatypes

The situation for the datatypes is more complex, and has seen more changes from FHIR version to version. In my mental picture of the datatypes (and elements using the datatypes) there are three broad categories, each with their own set of quirks:

- The complex datatypes (Identifier, HumanName, etc.)
- The primitive datatypes (string, code, etc.)
- Backbones

All of these derive (indirectly or directly) from Element. The first two can be distinguished (in R3 and R4) only by looking at StructureDefinition.kind, which is complex-type or primitive-type respectively. In R5, complex datatypes and primitives no longer derive directly from Element; instead, R5 introduces two new abstract classes, DataType and PrimitiveType. As you can guess, all primitives now derive from the latter, whereas complex types are children of DataType.
The Backbones are the most idiosyncratic of the types, so let's start with those.
#### Backbones

Backbones are the "anonymous" types defined in-place in the resource or datatype, and as such are not defined as independent, identifiable datatypes. Examples are Patient.contact (where contact is a set of elements that repeat, defined in place in Patient) and DataRequirement.codeFilter (idem, but then inside a datatype). Let us take a look:
 {
      "extension" : [{
        "url" : "http://hl7.org/fhir/StructureDefinition/structuredefinition-explicit-type-name",
        "valueString" : "Contact"
      }],
      "path" : "Patient.contact",      
      "type" : [{ "code" : "BackboneElement" }],
 }
 {
      "path" : "Patient.contact.relationship",
 }
As you can see, this backbone (here Patient.contact) is declared to be of type BackboneElement. Subsequent children elements (I've just shown Patient.contact.relationship here) define the child members of the element (and thus the type). Note, again, that this is done inside the Patient resource, has no canonical url of its own and thus cannot be re-used by other resources. Also, since R3, there is an extension to specify a name for the backbone type. This name is not unique, and is mostly used for rendering purposes (e.g. in UML diagrams, this backbone would still be represented as a class with a name) and code generation (in most programming languages this nested class would need to be represented as a first-class named class type). Unfortunately, you cannot assume all backbones have this extension specified, and you need to have a fallback scenario to derive your own (e.g. using the last part of the path) if necessary.
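The fallback described above might look like the following. It is only a sketch: it assumes the ElementDefinition is a dict parsed from JSON, `backbone_type_name` is a hypothetical helper, and capitalizing the last path segment is just one possible fallback convention.

```python
EXPLICIT_TYPE_NAME = (
    "http://hl7.org/fhir/StructureDefinition/structuredefinition-explicit-type-name"
)


def backbone_type_name(element: dict) -> str:
    """Pick a class name for a backbone element.

    Prefers the explicit-type-name extension; otherwise derives a name
    from the last part of the element path.
    """
    for ext in element.get("extension", []):
        if ext.get("url") == EXPLICIT_TYPE_NAME and ext.get("valueString"):
            return ext["valueString"]
    last = element["path"].rsplit(".", 1)[-1]
    return last[:1].upper() + last[1:]


contact = {
    "extension": [{"url": EXPLICIT_TYPE_NAME, "valueString": "Contact"}],
    "path": "Patient.contact",
    "type": [{"code": "BackboneElement"}],
}
print(backbone_type_name(contact))                                 # Contact
print(backbone_type_name({"path": "DataRequirement.codeFilter"}))  # CodeFilter
```

Because these names are not unique, a generator would probably still have to nest or prefix them with the name of the containing resource or datatype.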
The type BackboneElement itself is a direct subclass of Element, and only adds the modifierExtension element. It is important to remember that all backbones in resources are using BackboneElement. Backbones also exist in datatypes, but they are relatively rare (examples are Dosage.doseAndRate, Timing.repeat and ElementDefinition.slicing). Backbones in datatypes are of type Element, rather than BackboneElement, to enforce the general FHIR rule that datatypes (including their elements) cannot have a modifier extension.
By the way, there is subtle room for error in your programming logic here. We noted before that the use of the abstract type Resource in DomainResource.contained meant you could substitute any resource at that point. The indistinguishable use of the abstract BackboneElement as a type of a backbone element does evidently not imply the same kind of polymorphism.
Other elements (and currently, only elements appearing later in the StructureDefinition) may re-use this backbone element to specify their type, by using a contentReference element in the ElementDefinition (this is from Questionnaire.item.item):
  <contentReference value="#Questionnaire.item"/> 

Note the '#' here. It means "inside the base definition for this StructureDefinition". This means that a contentReference always refers to the element in the base definition, not to the element in the StructureDefinition you are currently processing!
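Resolving such a reference then boils down to a lookup in the element list of the base definition. A minimal sketch, assuming the elements are dicts parsed from JSON and that you have already loaded the right element list; both helper names are hypothetical.

```python
from typing import List, Optional


def resolve_content_reference(ref: str, base_elements: List[dict]) -> Optional[dict]:
    """Find the element a contentReference such as '#Questionnaire.item' points to."""
    if not ref.startswith("#"):
        raise ValueError("expected a local reference starting with '#': " + ref)
    target = ref[1:]
    for element in base_elements:
        if element.get("id") == target or element.get("path") == target:
            return element
    return None


def direct_children(path: str, elements: List[dict]) -> List[dict]:
    """Elements exactly one level below `path`; these form the re-used backbone type."""
    prefix = path + "."
    return [
        e for e in elements
        if e["path"].startswith(prefix) and "." not in e["path"][len(prefix):]
    ]


elements = [
    {"path": "Questionnaire.item", "type": [{"code": "BackboneElement"}]},
    {"path": "Questionnaire.item.linkId", "type": [{"code": "string"}]},
    {"path": "Questionnaire.item.item", "contentReference": "#Questionnaire.item"},
]
target = resolve_content_reference(elements[2]["contentReference"], elements)
print(target["path"])                                            # Questionnaire.item
print([e["path"] for e in direct_children(target["path"], elements)])
# ['Questionnaire.item.linkId', 'Questionnaire.item.item']
```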
#### Complex and primitive datatypes

The difference between complex and primitive datatypes (and even resources) is pretty slim in FHIR, mostly because FHIR's primitives are not really primitive: they still can be extended and can be identified for reference. What does set them apart is the presence of a value element (represented as an attribute in XML) that is typed as a primitive value (e.g. a simple string or integer). This is confusing, and amongst my colleagues we have introduced terminology like "real primitive" and "FHIR primitive" to distinguish the two.
That being said, it also means that you can treat the StructureDefinitions for both almost identically, and you just have to be aware of the value attribute (more about that later).
There are a few things worth pointing out. One is the use of extensions: as discussed before, FHIR allows extensions anywhere, but modifier extensions only on Resources, not on datatypes. By adding the modifierExtension to DomainResource and BackboneElement, but not to Element, this was structurally enforced in R3. However, in the R3 timeframe (and before), we had promoted some commonly used backbones to datatypes, among them ElementDefinition and Dosage. As part of resources, these structures had allowed modifier extensions; however, once they derived from Element (because of their "promotion" to a datatype), this was no longer allowed. In R4, we corrected this mistake by formally identifying a subset of datatypes that could have modifier extensions on them. The way we did that was by making these types derive from BackboneElement. This seemed to make sense: as part of resources they used to be backbone elements, now they were simply stand-alone datatypes, still deriving from BackboneElement. It did confound the notion of a "true" backbone element and a datatype, however, and in R5, the datatype hierarchy got enriched by a true abstract BackboneType class, which is the parent class of all datatypes that allow modifier extensions. The full hierarchy (with its history) is summarized below:

The set of datatypes allowing modifiers (under BackboneElement in R4, and BackboneType in R5) is limited. Since R4 we have Dosage, ElementDefinition and Timing. R5 will (probably) add MarketingStatus, OrderedDistribution, Population, ProdCharacteristic, ProductShelfLife and Statistic.
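Rather than hard-coding this list per version, you could walk the baseDefinition chain and check whether one of the types that introduce modifierExtension is an ancestor. The sketch assumes you have already built a `base_of` map (type name to base type name) from the StructureDefinitions of one FHIR version; that map and the helper name are hypothetical.

```python
from typing import Dict, Optional

# Types that introduce modifierExtension, per the hierarchy described above.
MODIFIER_CARRIERS = {"DomainResource", "BackboneElement", "BackboneType"}


def allows_modifier_extensions(type_name: str, base_of: Dict[str, Optional[str]]) -> bool:
    """True if the type or one of its ancestors introduces modifierExtension."""
    seen = set()
    current: Optional[str] = type_name
    while current is not None and current not in seen:
        if current in MODIFIER_CARRIERS:
            return True
        seen.add(current)  # guards against accidental cycles in the map
        current = base_of.get(current)
    return False


# A small, hand-written excerpt of the R4 hierarchy for illustration.
r4_bases = {
    "Dosage": "BackboneElement",
    "BackboneElement": "Element",
    "HumanName": "Element",
    "Element": None,
}
print(allows_modifier_extensions("Dosage", r4_bases))     # True
print(allows_modifier_extensions("HumanName", r4_bases))  # False
```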
#### Primitive's value (and friends)

Previously, I told you that the only difference in treating StructureDefinition for complex types and primitive types was the value element. Well, that was mostly true. There are three other elements, part of complex datatypes, that behave just like the value attribute. These are:

- Extension.url - a simple string that contains a uri
- Narrative.div - a simple string that contains xhtml
- Element.id - a simple string containing an id of the element

Now, I purposely use the words "simple string" here, since these three elements are not of a FHIR type: they cannot be extended, they are simple values. Unfortunately, we made quite a mess of how we specified this in the FHIR core StructureDefinitions. Before STU3, we actually typed these elements as FHIR types (there wasn't anything else we could do). ElementDefinition.type.code, where the type for an element lives, had a required binding, so we just pretended our world was turtles all the way down. Primitive's value attribute thus was declared to be of another FHIR type. The serialization for XML and Json would not allow you to do non-primitive things with them anyway. This heritage is still visible: if you look at the definition of an Extension on the HL7 website, you will see this:

Note how the type for url is still indicated as uri, it will even forward you to the FHIR "primitives" page. But in fact, it is a simple string.
By the time we were working on R3, we had been convinced this was the wrong approach. Unfortunately, we chose the wrong solution for R3: we chose to not supply a type, and used extensions to tell you the kind of "real primitive" value you were dealing with:
<!-- R3 situation -->
<element id="boolean.value">
 <path value="boolean.value"/>
 <representation value="xmlAttr"/>
 <!-- Note: primitive values do not have an assigned type. e.g. this is compiler magic. XML, JSON and RDF types provided by extension -->
 <type>
   <code>
     <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-json-type">
         <valueString value="boolean"/>
     </extension>
     <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-xml-type">
         <valueString value="xsd:boolean"/>
     </extension>
     <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-rdf-type">
         <valueString value="xsd:boolean"/>
     </extension>
   </code>
 </type>
</element>
We even generated a comment into the StructureDefinition: compiler magic here! As you can see, the value of a FHIR primitive (boolean in this case) does not have a type anymore, the code element is empty, except for a few extensions. These give the on-the-wire format of the primitive value for each of the known serializations.
We then failed to apply this solution consistently for value's friends Extension.url, Element.id and Narrative.div however:
 <!-- R3 situation -->
 <element id="Extension.url">
   <path value="Extension.url"/>
   <representation value="xmlAttr"/>
   <type>
      <code value="uri"/>
   </type>
</element>
and
<!-- R3 situation -->
<element id="Element.id">
  <path value="Element.id"/>
  <representation value="xmlAttr"/>
  <type>
    <code value="string"/>
  </type>
</element>
and
<!-- R3 situation -->
<element id="Narrative.div">
  <path value="Narrative.div"/>
  <type>
     <code value="xhtml"/>
  </type>
</element>
remained as they were in DSTU2. Note how representation is set to xmlAttr for all of these, except for Narrative.div. This makes sense, since we are dealing with XHtml for the latter, for which we have the xhtml representation. This is however, not applied here at the element level, but instead at the value attribute for the xhtml type.
Then, for R4, the shared maintenance of the FhirPath standard with the CQL people (like Bryn and Chris) forced us to unify our type system with that of CQL. It did have the benefit of providing us, finally, with the correct solution. We loosened the binding strength for ElementDefinition.type.code to extensible so we could use non-FHIR codes for the types, and hence introduce a (url based) scheme to name the "real primitives", which we by then called "system primitives":
<!-- R4/R5 situation -->
<element id="boolean.value">
  <path value="boolean.value"/>
  <representation value="xmlAttr"/>
  <type>
    <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-fhir-type">
      <valueUrl value="boolean"/>
    </extension>
    <code value="http://hl7.org/fhirpath/System.Boolean"/>
  </type>
</element>
As you can see, the code element now has a value again, and actually specifies the "external" type for this primitive value element. We refer here to the set of types that had been defined and used already by CQL. You can take a look at Appendix B of the CQL specification to find their exact definition. These types are now used as the basis for FHIR, CQL, FhirPath and the FHIR Mapping Language.
When using R3, you might need to map the structuredefinition-xml-type extension to these system types as follows:


| R3 xml type | R4+ system type |
| --- | --- |
| xsd:boolean | System.Boolean |
| xsd:int | System.Integer |
| xsd:string | System.String |
| xsd:decimal | System.Decimal |
| xsd:anyURI | System.String |
| xsd:base64Binary | System.String |
| xsd:dateTime | System.DateTime |
| xsd:gYear OR xsd:gYearMonth OR xsd:date | System.DateTime |
| xsd:gYear OR xsd:gYearMonth OR xsd:date OR xsd:dateTime | System.DateTime |
| xsd:time | System.Time |
| xsd:token | System.String |
| xsd:nonNegativeInteger | System.Integer |
| xsd:positiveInteger | System.Integer |
| xhtml:div | System.String |
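In code, this mapping is little more than a lookup table. The sketch below takes the value of the R3 xml-type extension and returns the R4-style system type URL. Splitting the "OR" combinations into individual xsd types is my own simplification of the table, and `system_type_for` is a hypothetical helper name.

```python
import re

SYSTEM_PREFIX = "http://hl7.org/fhirpath/System."

# Taken from the table above; the gYear/gYearMonth/date entries are the
# "OR" rows split into their parts (an assumption made for this sketch).
XML_TO_SYSTEM = {
    "xsd:boolean": "Boolean",
    "xsd:int": "Integer",
    "xsd:string": "String",
    "xsd:decimal": "Decimal",
    "xsd:anyURI": "String",
    "xsd:base64Binary": "String",
    "xsd:dateTime": "DateTime",
    "xsd:gYear": "DateTime",
    "xsd:gYearMonth": "DateTime",
    "xsd:date": "DateTime",
    "xsd:time": "Time",
    "xsd:token": "String",
    "xsd:nonNegativeInteger": "Integer",
    "xsd:positiveInteger": "Integer",
    "xhtml:div": "String",
}


def system_type_for(xml_type: str) -> str:
    """Map an R3 xml-type extension value to a system type URL."""
    # Values may list alternatives ("a OR b") and contain stray spaces ("xsd: date").
    parts = [p.replace(" ", "") for p in re.split(r"\s+OR\s+", xml_type.strip())]
    results = {XML_TO_SYSTEM[p] for p in parts}  # raises KeyError for unknown types
    if len(results) != 1:
        raise ValueError("alternatives map to different system types: " + xml_type)
    return SYSTEM_PREFIX + results.pop()


print(system_type_for("xsd:boolean"))
# http://hl7.org/fhirpath/System.Boolean
print(system_type_for("xsd:gYear OR xsd: gYearMonth OR xsd: date OR xsd: dateTime"))
# http://hl7.org/fhirpath/System.DateTime
```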


The same approach is used for Element.id and Extension.url. Note though that Narrative.div remains defined in terms of primitive FHIR type xhtml, which in its turn does use this approach to define xhtml.value:
<element id="xhtml.value">
  <path value="xhtml.value"/>
  <representation value="xhtml"/>
  <type>
     <extension url="http://hl7.org/fhir/StructureDefinition/structuredefinition-fhir-type">
       <valueUrl value="string"/>
     </extension>
     <code value="http://hl7.org/fhirpath/System.String"/>
  </type>
</element>
This still feels inconsistent to me (is Narrative.div really a complex FHIR value of type xhtml or a true primitive, just like Extension.url?), and I personally think we should "pull up" the specification of xhtml.value into Narrative.div and get rid of xhtml. A consequence of treating xhtml as a datatype is that it inherits from Element, and thus has an extension element. In xhtml the max occurrence for this field has been set to 0 in both the snapshot and differential. For those generating classes in OO languages from this spec this obviously presents problems, since there's no way to "remove" this extension element from the class hierarchy.
#### Datatype inheritance and substitutability

As stated before, primitives (in R5) are now all children of PrimitiveType. The hierarchy is a bit deeper though, as shown below (taken from the current spec):

This hierarchy has not changed much (yet) between R4 and R5, the major addition being the integer64 type (which is in the process of being renamed to long).
This picture clearly shows that not all primitives are direct children of PrimitiveType (in R5) or Element (in R3 and R4): there is a set of "stringy" types derived from string (e.g. markdown), a few constrained uri's deriving from uri (like uuid), and additional kinds of integer called positiveInt and unsignedInt. Contrary to class inheritance in common programming languages, these subclasses are actually more specialized versions of their superclasses and introduce no additional members or functionality. Instead they constrain the set of values that the type can represent.
Which leads me to a very common error in dealing with the datamodel: these specialized subclasses cannot be substituted for their (abstract) supertypes! For example, if an element is of type string, you cannot use values of type markdown for that element. Not that this is easy to do: in most places there is no type information in the XML/Json serialization for FHIR. This may, however, happen with choice properties. For example, Observation.value[x] specifies that it may contain a string - but that does not mean you can supply a code or markdown here: valueMarkdown would be incorrect.
This is in contrast to what we saw in the last section, where the inheritance hierarchy for Resources does permit substitutability. Even so, derived resources and specialized datatypes both have a StructureDefinition.derivation of kind specialization - you will just have to hard-code this knowledge into your software.
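Hard-coding that knowledge could be as simple as the rule below: only a declared type of kind resource accepts its descendants, everything else requires an exact match. The sketch assumes two lookup maps, `kind_of` (type name to StructureDefinition.kind) and `base_of` (type name to base type name), that you would build yourself; they and the function name are hypothetical.

```python
from typing import Dict, Optional


def can_substitute(
    declared: str,
    actual: str,
    kind_of: Dict[str, str],
    base_of: Dict[str, Optional[str]],
) -> bool:
    """May a value of type `actual` appear where the element declares `declared`?"""
    if actual == declared:
        return True
    if kind_of.get(declared) != "resource":
        # Datatypes (and backbones) are never polymorphic, even though
        # their derivation is also "specialization".
        return False
    seen = set()
    current = base_of.get(actual)
    while current is not None and current not in seen:
        if current == declared:
            return True
        seen.add(current)
        current = base_of.get(current)
    return False


kind_of = {"Resource": "resource", "string": "primitive-type",
           "BackboneElement": "complex-type"}
base_of = {"Patient": "DomainResource", "DomainResource": "Resource",
           "markdown": "string", "string": "Element", "Dosage": "BackboneElement"}

print(can_substitute("Resource", "Patient", kind_of, base_of))        # True
print(can_substitute("string", "markdown", kind_of, base_of))         # False
print(can_substitute("BackboneElement", "Dosage", kind_of, base_of))  # False
```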
Note that the current R5 publication shows xhtml as a direct child of DataType. This is a mistake: xhtml is a primitive (its StructureDefinition.kind is primitive-type) and its base is in fact PrimitiveType.
#### Constraints on Quantity

That is not all there is to say about the hierarchy. The "subclasses" under Quantity merit some attention too. Some of these are subclasses (like Money and Distance), others are actually constraints (e.g. MoneyQuantity). The difference between the two is quite substantial: the "constraint" types under Quantity are not types at all. They cannot appear as names in e.g. Observation.valueMoneyQuantity, and when they are referenced, they are referenced like a profile (which they really are). Take a look at how SimpleQuantity (a constraint in R4) is referenced by CarePlan in its StructureDefinition:
 {
      "path" : "CarePlan.activity.detail.dailyAmount",
      "type" : [{
        "code" : "Quantity",
        "profile" : ["http://hl7.org/fhir/StructureDefinition/SimpleQuantity"]
      }],
 }
This clearly shows that the type of this element is Quantity, but it is further constrained to the core profile SimpleQuantity.
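When generating code or element names, this means only `code` contributes to the type name, while `profile` is carried along as a constraint. A minimal sketch, assuming the type entry is a dict parsed from JSON; both helper names are hypothetical, and accepting a single-string `profile` is an assumption to tolerate older definitions.

```python
from typing import List, Tuple


def type_and_profiles(type_entry: dict) -> Tuple[str, List[str]]:
    """Split an ElementDefinition.type entry into its real type and its profiles."""
    profiles = type_entry.get("profile", [])
    if isinstance(profiles, str):  # assumption: tolerate a single string
        profiles = [profiles]
    return type_entry["code"], list(profiles)


def choice_element_name(choice_name: str, type_entry: dict) -> str:
    """Serialized name for one type of a choice element, e.g. value[x] -> valueQuantity."""
    code, _ = type_and_profiles(type_entry)
    return choice_name.replace("[x]", "") + code[:1].upper() + code[1:]


daily_amount_type = {
    "code": "Quantity",
    "profile": ["http://hl7.org/fhir/StructureDefinition/SimpleQuantity"],
}
print(type_and_profiles(daily_amount_type))
# ('Quantity', ['http://hl7.org/fhir/StructureDefinition/SimpleQuantity'])
print(choice_element_name("value[x]", daily_amount_type))  # valueQuantity
```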
Since R3, SimpleQuantity has been a constraint; in R4 (and R5), MoneyQuantity was added as a constraint. To be more precise, the existing Money type (a subclass of Quantity) was turned into a constraint (MoneyQuantity) and a new Money type was introduced. The latter represents an amount of money and has a currency element, whereas the MoneyQuantity profile is based on Quantity and just restricts Quantity.code to be a currency code.
### A family tree with Resource and Element

For years, the resources and datatypes lived in separate inheritance hierarchies. Some of the reference implementations already fixed that gap (.NET has a class called Base and Java IBase), and R5 will formally introduce the abstract resource/datatype Base. It has no elements, and two subclasses, Resource and Element. This means Resource points to it as its baseDefinition:
"kind" : "resource",
"abstract" : true,
"type" : "Resource",
"baseDefinition" : 
   "http://hl7.org/fhir/StructureDefinition/Base",
"derivation" : "specialization",
and so does Element:
"kind" : "complex-type",
"abstract" : true,
"type" : "Element",
"baseDefinition" : 
   "http://hl7.org/fhir/StructureDefinition/Base",
"derivation" : "specialization",
Before, these two types had no base nor a derivation.
Which makes you wonder, what kind is Base? In its StructureDefinition, we find:
"kind" : "complex-type",
"abstract" : true,
"type" : "Base",
So, formally, Base is a complex type, and Resources are special kinds of complex types. Notice that there is no derivation, and no base.
### Finally

Phew. As you have seen, we have tried our best to keep you busy when you are working with StructureDefinition in all its glory. And we have introduced enough changes over the past years to make it even more enjoyable when working with multiple versions. I've tried to be complete, but it's unlikely I succeeded. So, if you find any omissions, let me know!