Skip to content

Instantly share code, notes, and snippets.

@mattwiller
Last active May 10, 2023 21:41
Show Gist options
  • Save mattwiller/6167035b0e5e3e53fad1eb2141b876df to your computer and use it in GitHub Desktop.
Save mattwiller/6167035b0e5e3e53fad1eb2141b876df to your computer and use it in GitHub Desktop.
FHIR Date Search + Postgres

FHIR date Search with Postgres

Goals

  • Support filtering and sorting functionality on all FHIR datetime data types
  • Perform datetime search efficiently, ideally in a single query without requiring significant server code

Current state

  • Search parameters with date type stored in Postgres as either date or timestamp with time zone, depending on underlying field type
    • Not ideal: some search parameters could operate other either type
  • No support for some datetime types, e.g. Period
  • Search filters are constructed naively, e.g.
    -- GET /Observation?date=gt2023-03
    SELECT * FROM "Observation"
    WHERE "date" > '2023-03'
    • Currently causes a syntax error when searching for imprecise datetimes as shown above, since '2023-03' isn't a valid Postgres datetime

Proposed solution

To account for all datetime types in search, we could store the search parameter value explicitly in Postgres as a range. This aligns with how the specification describes date search parameters:

Data type Description
date The range of the value is the day, month, or year as specified
dateTime Similar to date
instant An instant is considered a fixed point in time with an interval smaller than the precision of the system, i.e. an interval with an effective width of 0
Period Explicit, though the upper or lower bound might not actually be specified in resources
Timing specified scheduling details are ignored and only the outer limits matter

Note that all five data types are ultimately represented as a range, even instant. Aligning the values stored in Postgres with this representation will enable us to perform search queries directly against the database.

Postgres offers built-in range types over timestamp values that should allow us to easily store the search value in the correct format for querying. Specifically, the tstzrange type should give us the flexibility we need to specify and query all kinds of date ranges, plus many helpful functions for working with ranges. In order to make the most out of the column and achieve best performance when searching, we should consider adding a GIST index on important columns.

To illustrate how this column type would work, consider the following Observation resource:

{
  "resourceType": "Observation",
  "id": "b322be3a-2b05-4f81-8eed-75f121c9ae52",
  "meta": {
	  "lastUpdated": "2023-05-08T12:34:56Z",
  },
  "status": "final",
  "code": {
	  "coding": [{
		  "system": "http://loinc.org",
		  "code": "49222-3",
		  "display": "Peanut allergy test"
	  }]
  },
  "effectiveDateTime": "2023-05-08",
}

Note that the Observation:date search parameter is expected to search against the partially-specified effectiveDateTime field.

Indexing

Creating the resource shown above should result in a Postgres write like:

-- PUT /Observation/b322be3a-2b05-4f81-8eed-75f121c9ae52
INSERT INTO "Observation" (id, content, "lastUpdated", compartments, "date") VALUES
(
	'b322be3a-2b05-4f81-8eed-75f121c9ae52',
	'{"resourceType":"Observation","id":"b322[...]}',
	'2023-05-08T12:34:56Z',
	'{}',
	'[2023-05-08, 2023-05-09)'::tstzrange
);

Postgres is able to automatically convert some partially specified datetimes into the full form: yyyy-mm-dd and yyyy-mm-ddThh:mmZ are accepted in addition to the full yyyy-mm-ddThh:mm:ssZ (note that for date-only endpoints, UTC time zone is assumed). Omitted parts are assumed to be zero.

Examples of the translation of FHIR datetime data types into equivalent Postgres tstzrange values are given below:

FHIR Postgres
'2023' (date) '[2023-01-01, 2024-01-01)'::tstzrange
'2023-02' (date) '[2023-02-01, 2023-03-01)'::tstzrange
'2023-03-31' (dateTime) '[2023-03-31, 2023-04-01)'::tstzrange
'2023-04-01T12:34Z' (dateTime) '[2023-04-01T12:34:00Z, 2023-04-01T12:34:35)'::tstzrange
'2023-04-01T12:34:56Z' (dateTime) '[2023-04-01T12:34:56Z, 2023-04-01T12:34:57Z)'::tstzrange
'2023-04-01T12:34:56Z' (instant) '[2023-04-01T12:34:56Z, 2023-04-01T12:34:56Z]'::tstzrange
{start: '2023-01', end: '2023-05-01'} (Period) '[2023-01-01, 2023-05-02)'::tstzrange
{start: '2023-05-01T12:34:56Z', end: '2023-05-01T12:35:00Z'} (Period) '[2023-05-01T12:34:56Z, 2023-05-01T12:34:35:01Z)'::tstzrange

Querying

Then, to search against a datetime field with correct precision handling, the query could be constructed like this:

-- GET /Observation?date=2023-05-08T01:23Z
SELECT * FROM "Observation"
WHERE "date" && tstzrange('2023-05-08T01:23Z', '2023-05-08T01:24Z');

The && ("overlaps") operator checks if the implicit time interval specified by the Observation falls into the search range. Note that the tstzrange(start, end) function is equivalent to '[start, end)'::tstzrange, with an exclusive end point.

FHIR range search prefixes can be easily supported using unbounded ranges and different Postgres range query operators:

-- GET /Observation?date=ge2023
SELECT * FROM "Observation"
-- null end point means open range, goes up to +Infinity
WHERE "date" && tstzrange('2023-01-01', null);

-- GET /Observation?date=sa2023-05
SELECT * FROM "Observation"
-- >> (strictly to right of) range operator maps to "starts after" (`sa`) search
WHERE "date" >> tstzrange('2023-05-01', '2023-06-01');

The full mapping of FHIR search operators to Postgres query formats is given below:

FHIR search operator Postgres WHERE clause
date=2006-01-02T15:45Z (equals) "date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange
date=ne2006-01-02T15:45Z (not equal) NOT ("date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange)
date=lt2006-01-02T15:45Z (less than) "date" && '(, 2006-01-02T13:45:00Z)'::tstzrange
date=gt2006-01-02T15:45Z (greater than) "date" && '[2006-01-02T13:46:00Z, )'::tstzrange
date=le2006-01-02T15:45Z (less or equal) "date" && '(, 2006-01-02T13:46:00Z)'::tstzrange
date=ge2006-01-02T15:45Z (greater or equal) "date" && '[2006-01-02T13:45:00Z, )'::tstzrange
date=sa2006-01-02T15:45Z (starts after) "date" >> '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange
date=eb2006-01-02T15:45Z (end before) "date" << '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange

Many query patterns require computing the correct endpoint from the given search parameter. Note that in e.g. date=gt2006-01-02T15:45Z the correct range to search over is [2006-01-02T13:46:00Z, ) and NOT (2006-01-02T13:45Z, ). In this case, we want to include the first point in time after the entire minute of 1:45 PM has elapsed, not the first moment after exactly 1:45:00 PM. This requires knowing the precision/step for the search parameter value (e.g. 'minute' for the example above) and adding one step to the value. This could be handled in server code, or we could potentially have Postgres do the date math for us using the following syntax (which may be preferable since it is already timezone-aware):

"date" + interval '1 minute'

Sort order

It's possible to use the tstzrange column to sort the results using ORDER BY: range values are compared for sorting first using the start point of the range, and then the end point if the start points are equal. For example, the following values would be returned in order by ORDER BY "date" desc:

id content date
3a3a4177-9931-4489-a389-e09f00e372b9 {"resourceType":"Observation","id":"3a3a4177-9931-4489-a389-e09f00e372b9","status":"final","effectiveDateTime":"2023-06-12T18:24:30Z",[...]"} ["2023-06-12 18:24:30+00","2023-06-12 18:24:31+00")
a54d428e-a6b4-423a-bfbc-d9fa5d1d475a {"resourceType":"Observation","id":"a54d428e-a6b4-423a-bfbc-d9fa5d1d475a","status":"final","effectiveDateTime":"2023-06-12T18:24Z",[...]"} ["2023-06-12 18:24:00+00","2023-06-12 18:25:00+00")
8cb35f31-819c-4292-bd43-b0e5189bfc5e {"resourceType":"Observation","id":"8cb35f31-819c-4292-bd43-b0e5189bfc5e","status":"final","effectiveDateTime":"2023-05-10T15:20:25Z",[...]"} ["2023-05-10 15:20:25+00","2023-05-10 15:20:26+00")
e554746b-9b4c-45b1-8f10-4a326a4911cc {"resourceType":"Observation","id":"e554746b-9b4c-45b1-8f10-4a326a4911cc","status":"final","effectiveDateTime":"2023-04-08",[...]"} ["2023-04-08 00:00:00+00","2023-04-09 00:00:00+00")

Working example

An example of the proposed schema is available as a DB Fiddle, and allows testing against a small set of sample data

Migration

In order to use the new column type, we will need to migrate all existing data into the new format. There does not appear to be an obvious way to handle the complex transformation from a single date or timestamptz value to a tstzrange in a database migration script, so the transition to the new format will likely need to be performed as a series of steps:

  1. Author and execute a database migration to add new {parameterName}_range columns for each search parameter of type date, e.g.:
    ALTER TABLE "Observation" ADD COLUMN "date_range" tstzrange;
  2. Add logic to compute the correct range and populate the new column when resources are written to the server
  3. Use a script to populate the new tstzrange columns from the source FHIR resource data, which contains the datetime field with its original precision
  4. Start using the new column for date search parameters in the server
  5. Run a clean-up migration to remove the old column
    1. NOTE: If we want to maintain the old names (e.g. "date") for the new columns, we'll need another set of migrations to safely rename the column

Considered alternatives

Composite type

To represent a datetime range in Postgres, we could create a simple composite type to store the endpoints together:

CREATE TYPE timestampRange AS (
	low     timestamp with time zone
	high    timestamp with time zone
);

This allows writing relatively intuitive queries like the following:

-- GET /Observation?date=2023-03
WHERE "Observation".date.low >= '2023-03-01T00:00:00Z' AND "Observation".date.high < '2023-04-01T00:00:00Z'

However, we should prefer the built-in Postgres range type, since that will be better-optimized for use with common index types.

Future extensions

  • Similar search logic using ranges can be used to handle number search precision
  • Postgres 14 supports multiranges, which would allow us to easily represent and query against collections of disjoint ranges
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment