Skip to content

Instantly share code, notes, and snippets.

@vjt
Last active November 25, 2020 18:45
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save vjt/2563820 to your computer and use it in GitHub Desktop.
Save vjt/2563820 to your computer and use it in GitHub Desktop.
[POC] Temporal database system on PostgreSQL with Active Record in mind

This has become a Ruby gem: https://github.com/ifad/chronomodel

A temporal database system on PostgreSQL using table inheritance and the rule system.

This is a data structure for a Slowly-Changing Dimension Type 2 temporal database, implemented using only PostgreSQL >= 9.0 features.

Any application code is completely unaware of the temporal features: queries are done against a view that behaves exactly like a plain table (it can be SELECTed, UPDATEd, INSERTed INTO and DELETEd FROM), but behind the scenes the database redirects the queries to backend tables holding actual data, using the PostgreSQL rule system.

All data is stored both in a current table and in an history one, inheriting from the current. To query historical data at a given date, a where date between valid_from and valid_to clause is enough.

The current and history tables are created in different schemas, while the view is in the default public schema: the application will see only the view by default. By changing the schema search path it is possible to redirect queries to the history tables without changing the application code, only by adding the aforementioned WHERE clause. Moreover, this allows to do JOINs between history tables and non-temporal ones.

Caveat: tre rules must be kept in sync with the schema, and updated if it changes.

If your ORM / Framework supports database migrations, and your language supports monkeypatching or inheritance, this means that you'll have to override its migration DSL to do tables and views creation and rules housekeeping.

E.g. for Ruby on Rails, add_column and friends could be hooked to look whether a table is temporal and update the rules as well, while create_table and friends can create the schemas and current and history tables e.g. by defining a new option e.g. :temporal => true.

----------------------------
-- Temporal countries schema
-- vjt@openssl.it
--
create schema temporal; -- schema containing all temporal tables
create schema history; -- schema containing all history tables
-- Current countries data - nothing special
--
create table temporal.countries (
id serial primary key,
name varchar
);
-- Countries historical data.
--
-- Inheritance is used to avoid duplicating the schema from the main table.
-- Please note that columns on the main table cannot be dropped, and other caveats
-- http://www.postgresql.org/docs/9.0/static/ddl-inherit.html#DDL-INHERIT-CAVEATS
--
create table history.countries (
hid serial primary key,
valid_from timestamp not null,
valid_to timestamp not null default '9999-12-31',
recorded_at timestamp not null default now(),
constraint from_before_to check (valid_from < valid_to),
constraint overlapping_times exclude using gist (
box(
point( extract( epoch from valid_from), id ),
point( extract( epoch from valid_to - interval '1 millisecond'), id )
) with &&
)
) inherits ( temporal.countries );
create index timestamps on history.countries using btree ( valid_from, valid_to ) with ( fillfactor = 100 );
create index country_id on history.countries using btree ( id ) with ( fillfactor = 90 );
-- The countries view, what the Rails' application ORM will actually CRUD on, and
-- the core of the temporal updates.
--
-- SELECT - return only current data
--
create view public.countries as select * from only temporal.countries;
-- INSERT - insert data both in the current data table and in the history table
--
create rule countries_ins as on insert to public.countries do instead (
insert into temporal.countries ( name )
values ( new.name )
returning temporal.countries.*;
insert into history.countries ( id, name, valid_from )
values ( currval('temporal.countries_id_seq'), new.name, now() )
);
-- UPDATE - set the last history entry validity to now, save the current data in
-- a new history entry and update the current table with the new data.
--
create rule countries_upd as on update to countries do instead (
update history.countries
set valid_to = now()
where id = old.id and valid_to = '9999-12-31';
insert into history.countries ( id, name, valid_from )
values ( old.id, new.name, now() );
update only temporal.countries
set name = new.name
where id = old.id
);
-- DELETE - save the current data in the history and eventually delete the data
-- from the current table.
--
create rule countries_del as on delete to countries do instead (
update history.countries
set valid_to = now()
where id = old.id and valid_to = '9999-12-31';
delete from only temporal.countries
where temporal.countries.id = old.id
);
-- EOF
@massimiliano-della-rovere

In the WHERE clause of the UPDATE statement in the countries_upd and countries_del RULEs you are using two fields:

  • the "id" column is indexed by the "country_id" INDEX, so the index is used for searching, but
  • the "valid_to" column can only be used as a filter predicate instead of an access predicate because the timestamps INDEX starts with the "valid_from" column that does not appear in the WHERE section.

In case you have many rows in the "history.countries" TABLE sharing the same "id" value:

  • having a ("id", "valid_to") index could speed up the searching;
  • also timestamp-based partitioning by the "valid_from" column may help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment