- Support filtering and sorting functionality on all FHIR datetime data types
- Perform datetime search efficiently, ideally in a single query without requiring significant server code
- Search parameters with
date
type stored in Postgres as eitherdate
ortimestamp with time zone
, depending on underlying field type- Not ideal: some search parameters could operate other either type
- No support for some datetime types, e.g.
Period
- Search filters are constructed naively, e.g.
-- GET /Observation?date=gt2023-03 SELECT * FROM "Observation" WHERE "date" > '2023-03'
- Currently causes a syntax error when searching for imprecise datetimes as shown above, since
'2023-03'
isn't a valid Postgres datetime
- Currently causes a syntax error when searching for imprecise datetimes as shown above, since
To account for all datetime types in search, we could store the search parameter value explicitly in Postgres as a range. This aligns with how the specification describes date
search parameters:
Data type | Description |
---|---|
date | The range of the value is the day, month, or year as specified |
dateTime | Similar to date |
instant | An instant is considered a fixed point in time with an interval smaller than the precision of the system, i.e. an interval with an effective width of 0 |
Period | Explicit, though the upper or lower bound might not actually be specified in resources |
Timing | specified scheduling details are ignored and only the outer limits matter |
Note that all five data types are ultimately represented as a range, even instant
. Aligning the values stored in Postgres with this representation will enable us to perform search queries directly against the database.
Postgres offers built-in range types over timestamp values that should allow us to easily store the search value in the correct format for querying. Specifically, the tstzrange
type should give us the flexibility we need to specify and query all kinds of date ranges, plus many helpful functions for working with ranges. In order to make the most out of the column and achieve best performance when searching, we should consider adding a GIST
index on important columns.
To illustrate how this column type would work, consider the following Observation
resource:
{
"resourceType": "Observation",
"id": "b322be3a-2b05-4f81-8eed-75f121c9ae52",
"meta": {
"lastUpdated": "2023-05-08T12:34:56Z",
},
"status": "final",
"code": {
"coding": [{
"system": "http://loinc.org",
"code": "49222-3",
"display": "Peanut allergy test"
}]
},
"effectiveDateTime": "2023-05-08",
}
Note that the Observation:date
search parameter is expected to search against the partially-specified effectiveDateTime
field.
Creating the resource shown above should result in a Postgres write like:
-- PUT /Observation/b322be3a-2b05-4f81-8eed-75f121c9ae52
INSERT INTO "Observation" (id, content, "lastUpdated", compartments, "date") VALUES
(
'b322be3a-2b05-4f81-8eed-75f121c9ae52',
'{"resourceType":"Observation","id":"b322[...]}',
'2023-05-08T12:34:56Z',
'{}',
'[2023-05-08, 2023-05-09)'::tstzrange
);
Postgres is able to automatically convert some partially specified datetimes into the full form: yyyy-mm-dd
and yyyy-mm-ddThh:mmZ
are accepted in addition to the full yyyy-mm-ddThh:mm:ssZ
(note that for date-only endpoints, UTC time zone is assumed). Omitted parts are assumed to be zero.
Examples of the translation of FHIR datetime data types into equivalent Postgres tstzrange
values are given below:
FHIR | Postgres |
---|---|
'2023' (date) |
'[2023-01-01, 2024-01-01)'::tstzrange |
'2023-02' (date) |
'[2023-02-01, 2023-03-01)'::tstzrange |
'2023-03-31' (dateTime) |
'[2023-03-31, 2023-04-01)'::tstzrange |
'2023-04-01T12:34Z' (dateTime) |
'[2023-04-01T12:34:00Z, 2023-04-01T12:34:35)'::tstzrange |
'2023-04-01T12:34:56Z' (dateTime) |
'[2023-04-01T12:34:56Z, 2023-04-01T12:34:57Z)'::tstzrange |
'2023-04-01T12:34:56Z' (instant) |
'[2023-04-01T12:34:56Z, 2023-04-01T12:34:56Z]'::tstzrange |
{start: '2023-01', end: '2023-05-01'} (Period) |
'[2023-01-01, 2023-05-02)'::tstzrange |
{start: '2023-05-01T12:34:56Z', end: '2023-05-01T12:35:00Z'} (Period) |
'[2023-05-01T12:34:56Z, 2023-05-01T12:34:35:01Z)'::tstzrange |
Then, to search against a datetime field with correct precision handling, the query could be constructed like this:
-- GET /Observation?date=2023-05-08T01:23Z
SELECT * FROM "Observation"
WHERE "date" && tstzrange('2023-05-08T01:23Z', '2023-05-08T01:24Z');
The &&
("overlaps") operator checks if the implicit time interval specified by the Observation falls into the search range. Note that the tstzrange(start, end)
function is equivalent to '[start, end)'::tstzrange
, with an exclusive end point.
FHIR range search prefixes can be easily supported using unbounded ranges and different Postgres range query operators:
-- GET /Observation?date=ge2023
SELECT * FROM "Observation"
-- null end point means open range, goes up to +Infinity
WHERE "date" && tstzrange('2023-01-01', null);
-- GET /Observation?date=sa2023-05
SELECT * FROM "Observation"
-- >> (strictly to right of) range operator maps to "starts after" (`sa`) search
WHERE "date" >> tstzrange('2023-05-01', '2023-06-01');
The full mapping of FHIR search operators to Postgres query formats is given below:
FHIR search operator | Postgres WHERE clause |
---|---|
date=2006-01-02T15:45Z (equals) |
"date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange |
date=ne2006-01-02T15:45Z (not equal) |
NOT ("date" && '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)'::tstzrange) |
date=lt2006-01-02T15:45Z (less than) |
"date" && '(, 2006-01-02T13:45:00Z)'::tstzrange |
date=gt2006-01-02T15:45Z (greater than) |
"date" && '[2006-01-02T13:46:00Z, )'::tstzrange |
date=le2006-01-02T15:45Z (less or equal) |
"date" && '(, 2006-01-02T13:46:00Z)'::tstzrange |
date=ge2006-01-02T15:45Z (greater or equal) |
"date" && '[2006-01-02T13:45:00Z, )'::tstzrange |
date=sa2006-01-02T15:45Z (starts after) |
"date" >> '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange |
date=eb2006-01-02T15:45Z (end before) |
"date" << '[2006-01-02T13:45:00Z, 2006-01-02T13:46:00Z)':tstzrange |
Many query patterns require computing the correct endpoint from the given search parameter. Note that in e.g. date=gt2006-01-02T15:45Z
the correct range to search over is [2006-01-02T13:46:00Z, )
and NOT (2006-01-02T13:45Z, )
. In this case, we want to include the first point in time after the entire minute of 1:45 PM has elapsed, not the first moment after exactly 1:45:00 PM. This requires knowing the precision/step for the search parameter value (e.g. 'minute' for the example above) and adding one step to the value. This could be handled in server code, or we could potentially have Postgres do the date math for us using the following syntax (which may be preferable since it is already timezone-aware):
"date" + interval '1 minute'
It's possible to use the tstzrange
column to sort the results using ORDER BY
: range values are compared for sorting first using the start point of the range, and then the end point if the start points are equal. For example, the following values would be returned in order by ORDER BY "date" desc
:
id | content | date |
---|---|---|
3a3a4177-9931-4489-a389-e09f00e372b9 |
{"resourceType":"Observation","id":"3a3a4177-9931-4489-a389-e09f00e372b9","status":"final","effectiveDateTime":"2023-06-12T18:24:30Z",[...]"} |
["2023-06-12 18:24:30+00","2023-06-12 18:24:31+00") |
a54d428e-a6b4-423a-bfbc-d9fa5d1d475a |
{"resourceType":"Observation","id":"a54d428e-a6b4-423a-bfbc-d9fa5d1d475a","status":"final","effectiveDateTime":"2023-06-12T18:24Z",[...]"} |
["2023-06-12 18:24:00+00","2023-06-12 18:25:00+00") |
8cb35f31-819c-4292-bd43-b0e5189bfc5e |
{"resourceType":"Observation","id":"8cb35f31-819c-4292-bd43-b0e5189bfc5e","status":"final","effectiveDateTime":"2023-05-10T15:20:25Z",[...]"} |
["2023-05-10 15:20:25+00","2023-05-10 15:20:26+00") |
e554746b-9b4c-45b1-8f10-4a326a4911cc |
{"resourceType":"Observation","id":"e554746b-9b4c-45b1-8f10-4a326a4911cc","status":"final","effectiveDateTime":"2023-04-08",[...]"} |
["2023-04-08 00:00:00+00","2023-04-09 00:00:00+00") |
An example of the proposed schema is available as a DB Fiddle, and allows testing against a small set of sample data
In order to use the new column type, we will need to migrate all existing data into the new format. There does not appear to be an obvious way to handle the complex transformation from a single date
or timestamptz
value to a tstzrange
in a database migration script, so the transition to the new format will likely need to be performed as a series of steps:
- Author and execute a database migration to add new
{parameterName}_range
columns for each search parameter of typedate
, e.g.:ALTER TABLE "Observation" ADD COLUMN "date_range" tstzrange;
- Add logic to compute the correct range and populate the new column when resources are written to the server
- Use a script to populate the new
tstzrange
columns from the source FHIR resource data, which contains the datetime field with its original precision - Start using the new column for
date
search parameters in the server - Run a clean-up migration to remove the old column
- NOTE: If we want to maintain the old names (e.g.
"date"
) for the new columns, we'll need another set of migrations to safely rename the column
- NOTE: If we want to maintain the old names (e.g.
To represent a datetime range in Postgres, we could create a simple composite type to store the endpoints together:
CREATE TYPE timestampRange AS (
low timestamp with time zone
high timestamp with time zone
);
This allows writing relatively intuitive queries like the following:
-- GET /Observation?date=2023-03
WHERE "Observation".date.low >= '2023-03-01T00:00:00Z' AND "Observation".date.high < '2023-04-01T00:00:00Z'
However, we should prefer the built-in Postgres range type, since that will be better-optimized for use with common index types.
- Similar search logic using ranges can be used to handle number search precision
- Postgres 14 supports multiranges, which would allow us to easily represent and query against collections of disjoint ranges