mattwiller/date-search.md Secret

## date-search.md

      
    Raw
  

              date-search.md
            
          
    FHIR date Search with Postgres


Goals
Current state
Proposed solution

Indexing
Querying

Sort order


Working example
Migration


Considered alternatives

Composite type


Future extensions

Goals


Support filtering and sorting functionality on all FHIR datetime data types
Perform datetime search efficiently, ideally in a single query without requiring significant server code

Current state


Search parameters with date type stored in Postgres as either date or timestamp with time zone, depending on underlying field type

Not ideal: some search parameters could operate other either type


No support for some datetime types, e.g. Period
Search filters are constructed naively, e.g.
-- GET /Observation?date=gt2023-03
SELECT * FROM "Observation"
WHERE "date" > '2023-03'

Currently causes a syntax error when searching for imprecise datetimes as shown above, since '2023-03' isn't a valid Postgres datetime


Proposed solution

To account for all datetime types in search, we could store the search parameter value explicitly in Postgres as a range.  This aligns with how the specification describes date search parameters:


Data type
Description


date
The range of the value is the day, month, or year as specified


dateTime
Similar to date


instant
An instant is considered a fixed point in time with an interval smaller than the precision of the system, i.e. an interval with an effective width of 0


Period
Explicit, though the upper or lower bound might not actually be specified in resources


Timing
specified scheduling details are ignored and only the outer limits matter


Note that all five data types are ultimately represented as a range, even instant.  Aligning the values stored in Postgres with this representation will enable us to perform search queries directly against the database.
Postgres offers built-in range types over timestamp values that should allow us to easily store the search value in the correct format for querying.  Specifically, the tstzrange type should give us the flexibility we need to specify and query all kinds of date ranges, plus many helpful functions for working with ranges.  In order to make the most out of the column and achieve best performance when searching, we should consider adding a GIST index on important columns.
To illustrate how this column type would work, consider the following Observation resource:
{
  "resourceType": "Observation",
  "id": "b322be3a-2b05-4f81-8eed-75f121c9ae52",
  "meta": {
	  "lastUpdated": "2023-05-08T12:34:56Z",
  },
  "status": "final",
  "code": {
	  "coding": [{
		  "system": "http://loinc.org",
		  "code": "49222-3",
		  "display": "Peanut allergy test"
	  }]
  },
  "effectiveDateTime": "2023-05-08",
}
Note that the Observation:date search parameter is expected to search against the partially-specified effectiveDateTime field.
Indexing

Creating the resource shown above should result in a Postgres write like:
-- PUT /Observation/b322be3a-2b05-4f81-8eed-75f121c9ae52
INSERT INTO "Observation" (id, content, "lastUpdated", compartments, "date") VALUES
(
	'b322be3a-2b05-4f81-8eed-75f121c9ae52',
	'{"resourceType":"Observation","id":"b322[...]}',
	'2023-05-08T12:34:56Z',
	'{}',
	'[2023-05-08, 2023-05-09)'::tstzrange
);
Postgres is able to automatically convert some partially specified datetimes into the full form: yyyy-mm-dd and yyyy-mm-ddThh:mmZ are accepted in addition to the full yyyy-mm-ddThh:mm:ssZ (note that for date-only endpoints, UTC time zone is assumed).  Omitted parts are assumed to be zero.
Examples of the translation of FHIR datetime data types into equivalent Postgres tstzrange values are given below:


FHIR
Postgres


'2023' (date)
'[2023-01-01, 2024-01-01)'::tstzrange


'2023-02' (date)
'[2023-02-01, 2023-03-01)'::tstzrange


'2023-03-31' (dateTime)
'[2023-03-31, 2023-04-01)'::tstzrange


'2023-04-01T12:34Z' (dateTime)
'[2023-04-01T12:34:00Z, 2023-04-01T12:34:35)'::tstzrange


'2023-04-01T12:34:56Z' (dateTime)
'[2023-04-01T12:34:56Z, 2023-04-01T12:34:57Z)'::tstzrange


'2023-04-01T12:34:56Z' (instant)
'[2023-04-01T12:34:56Z, 2023-04-01T12:34:56Z]'::tstzrange


{start: '2023-01', end: '2023-05-01'} (Period)
'[2023-01-01, 2023-05-02)'::tstzrange


{start: '2023-05-01T12:34:56Z', end: '2023-05-01T12:35:00Z'} (Period)
'[2023-05-01T12:34:56Z, 2023-05-01T12:34:35:01Z)'::tstzrange


Querying

Then, to search against a datetime field with correct precision handling, the query could be constructed like this:
-- GET /Observation?date=2023-05-08T01:23Z
SELECT * FROM "Observation"
WHERE "date" && tstzrange('2023-05-08T01:23Z', '2023-05-08T01:24Z');
The && ("overlaps") operator checks if the implicit time interval specified by the Observation falls into the search range.  Note that the tstzrange(start, end) function is equivalent to '[start, end)'::tstzrange, with an exclusive end point.
FHIR range search prefixes can be easily supported using unbounded ranges and different Postgres range query operators:
-- GET /Observation?date=ge2023
SELECT * FROM "Observation"
-- null end point means open range, goes up to +Infinity
WHERE "date" && tstzrange('2023-01-01', null);

-- GET /Observation?date=sa2023-05
SELECT * FROM "Observation"
-- >> (strictly to right of) range operator maps to "starts after" (`sa`) search
WHERE "date" >> tstzrange('2023-05-01', '2023-06-01');
The full mapping of FHIR search operators to Postgres query formats is given below:


FHIR search operator
Postgres WHERE clause


date=2006-01-02T15:45Z (equals)
"date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange


date=ne2006-01-02T15:45Z (not equal)
NOT ("date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange)


date=lt2006-01-02T15:45Z (less than)
"date" && '(, 2006-01-02T13:45:00Z)'::tstzrange


date=gt2006-01-02T15:45Z (greater than)
"date" && '[2006-01-02T13:46:00Z, )'::tstzrange


date=le2006-01-02T15:45Z (less or equal)
"date" && '(, 2006-01-02T13:46:00Z)'::tstzrange


date=ge2006-01-02T15:45Z (greater or equal)
"date" && '[2006-01-02T13:45:00Z, )'::tstzrange


date=sa2006-01-02T15:45Z (starts after)
"date" >> '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange


date=eb2006-01-02T15:45Z (end before)
"date" << '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange


Many query patterns require computing the correct endpoint from the given search parameter.  Note that in e.g. date=gt2006-01-02T15:45Z the correct range to search over is [2006-01-02T13:46:00Z, ) and NOT (2006-01-02T13:45Z, ).  In this case, we want to include the first point in time after the entire minute of 1:45 PM has elapsed, not the first moment after exactly 1:45:00 PM.  This requires knowing the precision/step for the search parameter value (e.g. 'minute' for the example above) and adding one step to the value.  This could be handled in server code, or we could potentially have Postgres do the date math for us using the following syntax (which may be preferable since it is already timezone-aware):
"date" + interval '1 minute'
Sort order

It's possible to use the tstzrange column to sort the results using ORDER BY: range values are compared for sorting first using the start point of the range, and then the end point if the start points are equal.  For example, the following values would be returned in order by ORDER BY "date" desc:


id
content
date


3a3a4177-9931-4489-a389-e09f00e372b9
{"resourceType":"Observation","id":"3a3a4177-9931-4489-a389-e09f00e372b9","status":"final","effectiveDateTime":"2023-06-12T18:24:30Z",[...]"}
["2023-06-12 18:24:30+00","2023-06-12 18:24:31+00")


a54d428e-a6b4-423a-bfbc-d9fa5d1d475a
{"resourceType":"Observation","id":"a54d428e-a6b4-423a-bfbc-d9fa5d1d475a","status":"final","effectiveDateTime":"2023-06-12T18:24Z",[...]"}
["2023-06-12 18:24:00+00","2023-06-12 18:25:00+00")


8cb35f31-819c-4292-bd43-b0e5189bfc5e
{"resourceType":"Observation","id":"8cb35f31-819c-4292-bd43-b0e5189bfc5e","status":"final","effectiveDateTime":"2023-05-10T15:20:25Z",[...]"}
["2023-05-10 15:20:25+00","2023-05-10 15:20:26+00")


e554746b-9b4c-45b1-8f10-4a326a4911cc
{"resourceType":"Observation","id":"e554746b-9b4c-45b1-8f10-4a326a4911cc","status":"final","effectiveDateTime":"2023-04-08",[...]"}
["2023-04-08 00:00:00+00","2023-04-09 00:00:00+00")


Working example

An example of the proposed schema is available as a DB Fiddle, and allows testing against a small set of sample data
Migration

In order to use the new column type, we will need to migrate all existing data into the new format.  There does not appear to be an obvious way to handle the complex transformation from a single date or timestamptz value to a tstzrange in a database migration script, so the transition to the new format will likely need to be performed as a series of steps:

Author and execute a database migration to add new {parameterName}_range columns for each search parameter of type date, e.g.:
ALTER TABLE "Observation" ADD COLUMN "date_range" tstzrange;

Add logic to compute the correct range and populate the new column when resources are written to the server
Use a script to populate the new tstzrange columns from the source FHIR resource data, which contains the datetime field with its original precision
Start using the new column for date search parameters in the server
Run a clean-up migration to remove the old column

NOTE: If we want to maintain the old names (e.g. "date") for the new columns, we'll need another set of migrations to safely rename the column


Considered alternatives

Composite type

To represent a datetime range in Postgres, we could create a simple composite type to store the endpoints together:
CREATE TYPE timestampRange AS (
	low     timestamp with time zone
	high    timestamp with time zone
);
This allows writing relatively intuitive queries like the following:
-- GET /Observation?date=2023-03
WHERE "Observation".date.low >= '2023-03-01T00:00:00Z' AND "Observation".date.high < '2023-04-01T00:00:00Z'
However, we should prefer the built-in Postgres range type, since that will be better-optimized for use with common index types.
Future extensions


Similar search logic using ranges can be used to handle number search precision
Postgres 14 supports multiranges, which would allow us to easily represent and query against collections of disjoint ranges
Data type	Description
date	The range of the value is the day, month, or year as specified
dateTime	Similar to date
instant	An instant is considered a fixed point in time with an interval smaller than the precision of the system, i.e. an interval with an effective width of 0
Period	Explicit, though the upper or lower bound might not actually be specified in resources
Timing	specified scheduling details are ignored and only the outer limits matter
FHIR	Postgres
`'2023'` (date)	`'[2023-01-01, 2024-01-01)'::tstzrange`
`'2023-02'` (date)	`'[2023-02-01, 2023-03-01)'::tstzrange`
`'2023-03-31'` (dateTime)	`'[2023-03-31, 2023-04-01)'::tstzrange`
`'2023-04-01T12:34Z'` (dateTime)	`'[2023-04-01T12:34:00Z, 2023-04-01T12:34:35)'::tstzrange`
`'2023-04-01T12:34:56Z'` (dateTime)	`'[2023-04-01T12:34:56Z, 2023-04-01T12:34:57Z)'::tstzrange`
`'2023-04-01T12:34:56Z'` (instant)	`'[2023-04-01T12:34:56Z, 2023-04-01T12:34:56Z]'::tstzrange`
`{start: '2023-01', end: '2023-05-01'}` (Period)	`'[2023-01-01, 2023-05-02)'::tstzrange`
`{start: '2023-05-01T12:34:56Z', end: '2023-05-01T12:35:00Z'}` (Period)	`'[2023-05-01T12:34:56Z, 2023-05-01T12:34:35:01Z)'::tstzrange`
FHIR search operator	Postgres `WHERE` clause
`date=2006-01-02T15:45Z` (equals)	`"date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange`
`date=ne2006-01-02T15:45Z` (not equal)	`NOT ("date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange)`
`date=lt2006-01-02T15:45Z` (less than)	`"date" && '(, 2006-01-02T13:45:00Z)'::tstzrange`
`date=gt2006-01-02T15:45Z` (greater than)	`"date" && '[2006-01-02T13:46:00Z, )'::tstzrange`
`date=le2006-01-02T15:45Z` (less or equal)	`"date" && '(, 2006-01-02T13:46:00Z)'::tstzrange`
`date=ge2006-01-02T15:45Z` (greater or equal)	`"date" && '[2006-01-02T13:45:00Z, )'::tstzrange`
`date=sa2006-01-02T15:45Z` (starts after)	`"date" >> '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange`
`date=eb2006-01-02T15:45Z` (end before)	`"date" << '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange`
id	content	date
`3a3a4177-9931-4489-a389-e09f00e372b9`	`{"resourceType":"Observation","id":"3a3a4177-9931-4489-a389-e09f00e372b9","status":"final","effectiveDateTime":"2023-06-12T18:24:30Z",[...]"}`	`["2023-06-12 18:24:30+00","2023-06-12 18:24:31+00")`
`a54d428e-a6b4-423a-bfbc-d9fa5d1d475a`	`{"resourceType":"Observation","id":"a54d428e-a6b4-423a-bfbc-d9fa5d1d475a","status":"final","effectiveDateTime":"2023-06-12T18:24Z",[...]"}`	`["2023-06-12 18:24:00+00","2023-06-12 18:25:00+00")`
`8cb35f31-819c-4292-bd43-b0e5189bfc5e`	`{"resourceType":"Observation","id":"8cb35f31-819c-4292-bd43-b0e5189bfc5e","status":"final","effectiveDateTime":"2023-05-10T15:20:25Z",[...]"}`	`["2023-05-10 15:20:25+00","2023-05-10 15:20:26+00")`
`e554746b-9b4c-45b1-8f10-4a326a4911cc`	`{"resourceType":"Observation","id":"e554746b-9b4c-45b1-8f10-4a326a4911cc","status":"final","effectiveDateTime":"2023-04-08",[...]"}`	`["2023-04-08 00:00:00+00","2023-04-09 00:00:00+00")`