vinayvenu/multitenancy_and_reporting_evolution.sql

## multitenancy_and_reporting_evolution.sql
-- This script lets you follow the first session of the Avni Sessions series. It is accompanied by the presentation at https://docs.google.com/presentation/d/1j6CH9xsBJ2QKt4B3JhJeklBrL5Ju1dPNYjsMVufZ1yk/edit?usp=sharing

------------------------------------
-- MULTITENANCY IN AVNI - START
------------------------------------

-- Multitenancy. We use Postgres Row-level multitenancy (https://www.postgresql.org/docs/current/ddl-rowsecurity.html) to prevent organisations from using each other's data. In Avni, we have a multi-level inheritance style structure that is enabled through a hierarchy of organisations. This hierarchy has two mechanisms.

-- Multitenancy mechanism 1 - Extending an organisation. We have parent and child organisations that share metadata (not data). Lets see this with an example - Calcutta Kids
set role calcutta_kids;

-- See how organisations are set up in the organisation table
select id, name, db_user, parent_organisation_id
from organisation;
-- Typically when you run this query, you should see only one row. However, we see two rows here. Notice that the parent_organisation_id of Calcutta Kids is 1, which is the id of OpenCHS. OpenCHS is the base organisation that we initially started with. We also had forms and rules that we intended to share with organisations, with each organisation having the ability to extend them. These organisations share metadata. This means that concepts, forms etc from OpenCHS is available to children. For example, see the organisations present in the concept table below.

select distinct o.name
from concept c
         inner join organisation o on c.organisation_id = o.id;

--Data, however, is not shared between the two organisations. See what happens when we run a similar query for the individual table
select distinct o.name
from individual i
         inner join organisation o on i.organisation_id = o.id;

-- This mechanism can also be used to create training organisations where metadata is shared, but data is not. Look at how the calcutta_kids_training organisation. Calcutta Kids Training inherits metadata from OpenCHS and Calcutta Kids while keeping data private
set role calcutta_kids_training;

select distinct o.name
from concept c
         inner join organisation o on c.organisation_id = o.id;

select distinct o.name
from individual i
         inner join organisation o on i.organisation_id = o.id;

-- PS: This form of multitenancy was originally meant to support common modules that can be used by multiple organisations. However, we found that this is a bad way, and have not been using this mechanism for newer organisations (since about 2019). The reasons for this are
-- 1. We noticed that although organisations have similar programmes (MCH etc), the implementations vary quite a bit, reducing the purpose of keeping metadata in a parent organisation
-- 2. It was hard to maintain changes across organisations when someone requests for it
-- 3. We moved away from OpenCHS (Open Community Health System) to Avni, whose scope is much larger than its original purpose.

-- Multitenancy mechanism 2 - Organisation groups. Here, a group of organisations can share data. This is mostly a reporting thing where each organisation will be working independently, while there is a reporting user who can access all data. This is achieved through special database users in the organisation_group table who can access data and metadata across multiple organisations. This is used by Rejuvenating Water Bodies currently. This organisation has a database user called rwb who can query data across Rejuvenating Water Bodies MH, RJ and IIT-IIT projects.
set role rwb;
select o.id, o.name, o.db_user, og.name, og.db_user
from organisation o
         inner join organisation_group_organisation ogo on o.id = ogo.organisation_id
         inner join organisation_group og on ogo.organisation_group_id = og.id;

-- Notice that data and metadata from all organisations will be available
select distinct o.name
from individual i
         inner join organisation o on i.organisation_id = o.id;

-- Notice that there will be duplicates even in the metadata. In situations like this, we have to upload the same uuids across organisations in the organisation group if we need good reporting. This is an implicit assumption used in ETL.
select c.uuid, c.name, o.name
from concept c
         inner join organisation o on c.organisation_id = o.id
where c.name = 'Gram Panchayat';

-- PS: If you want to know exactly how multitenancy is implemented in the tables of Avni, see functions, see here - https://github.com/avniproject/avni-server/blob/master/avni-server-api/src/main/resources/db/migration/V1_142.2__CreateNewRLSForAccount.sql

------------------------------------
-- MULTITENANCY IN AVNI - END
------------------------------------


------------------------------------
-- REPORTING EVOLUTION OVER THE YEARS
------------------------------------
-- Reporting solutions over time have been focusing on multiple goals (in the order of priority below)
-- 1. Avoiding errors
-- 2. Making development easier
-- 3. Running queries faster

------------------------------------
-- Version 1 - Basic views
------------------------------------
-- The first set of views were developed with the first objective in mind. These views are available in github here - https://github.com/avniproject/avni-reporting/tree/master/migrations

-- What is the purpose of these views? Consider the following set of views
-- create view completed_program_encounter_view as
-- select *
-- from program_encounter_view
-- where encounter_date_time is not null;
-- create view program_encounter_view as
-- select pe.*,
--        oet.operational_encounter_type_uuid,
--        oet.operational_encounter_type_name,
--        oet.operational_encounter_type_is_voided,
--        oet.encounter_type_uuid,
--        oet.encounter_type_name,
--        oet.encounter_type_is_voided
-- from program_encounter pe
--          join operational_encounter_type_view oet on oet.encounter_type_id = pe.encounter_type_id
-- where pe.is_voided is not true;

-- It ensures that
-- 1. program_encounter.encounter_date_time is not null (The program encounter actually took place)
-- 2. program_encounter.is_voided is false
-- This promotes correctness in our queries


------------------------------------
-- Version 1 problems
------------------------------------
-- Problem 1 - Potential Data leakage
-- Views do not adhere to Row Level Security unless schanges are made to it. This has the potential to cause data leakage, and some performance concerns.
reset role;
create or replace view my_individual_view as
(
select *
from individual);

grant SELECT on my_individual_view to jscs;

set role jscs;
select count(*)
from my_individual_view; -- 815436
select count(*)
from individual; -- 82526
-- What is going on here? Multiple organisation data showing up when selecting the view
-- Views are not really supported by RLS, so this can cause trouble
-- Note that this can be solved by wrapping the view in a function (see https://www.benburwell.com/posts/row-level-security-postgresql-views/)


-- Problem 2 - Objective 2 (Making development easier) is not yet met
set role jscs;
select observations from completed_program_encounter_view; -- This is still a blob, and is required for most queries


------------------------------------
-- Version 2 - organisation specific views
------------------------------------
-- Here, we create views for each organisation that expands the observations into their own schemas. This goes a long way into writing queries better and safer.
-- These views are programmatically generated and stored in the schema of each organisation. These are generated using the ViewGenService (https://github.com/avniproject/avni-server/blob/master/avni-server-api/src/main/java/org/avni/server/reporting/ViewGenService.java)

-- How to recognise them? Look for org_* view within the org schema. eg: jscs.

set role jscs;
-- The invididual view
select * from jscs.jscs_individual;
-- Views specific for program encounters as well are present
select id, program_enrolment_id, encounter_type_name, "Result from old sickle cell test report" from jscs.jscs_individual_sickle_cell_base_screening;
-- Notice that the encounter type name is already in there. Voided checks are present by default. Obs are available by concept name. Some performance optimizations are also in place (See https://github.com/avniproject/avni-server/blob/master/avni-server-api/src/main/resources/pivot/conceptMap.sql)

------------------------------------
-- Version 2 issues - Join performance
------------------------------------
select ji.first_name, jiscbs."Result from old sickle cell test report" from jscs.jscs_individual_sickle_cell_base_screening jiscbs
inner join jscs.jscs_individual_sickle_cell jisc on jiscbs.program_enrolment_id = jisc.id
inner join jscs.jscs_individual ji on jisc.individual_id = ji.id
where ji.first_name = 'Priti bai';

-- Why these issues?
-- 1. The views are individually doing a lot of concept lookups etc that cannot be optimized (Gin indexes do not work when using multitenancy)
-- 2. Joins are not efficient (we are actually joining multiple times to the same tables, and this is hard to change)

------------------------------------
-- Version 3 - ETL tables
------------------------------------
-- Here we move data from the public schema into tables for each organisation into their own schemas
-- Can be identified by org.* tables in the schema
-- For more details, see https://github.com/avniproject/avni-etl/


------------------------------------
-- Version 3 Issues - Incomplete. This is still a work in progress, so you might not find all tables that you need there
------------------------------------


------------------------------------
-- Version 3 Issues - Job failures and rerunning of migration
------------------------------------
-- We have had issues with jobs stopping because of some change in the metadata. These issues need to be fixed
-- Also, failures are not reported anywhere, so there is a chance

------------------------------------
-- REPORTING EVOLUTION OVER THE YEARS - END
------------------------------------
	-- This script lets you follow the first session of the Avni Sessions series. It is accompanied by the presentation at https://docs.google.com/presentation/d/1j6CH9xsBJ2QKt4B3JhJeklBrL5Ju1dPNYjsMVufZ1yk/edit?usp=sharing

	------------------------------------
	-- MULTITENANCY IN AVNI - START
	------------------------------------

	-- Multitenancy. We use Postgres Row-level multitenancy (https://www.postgresql.org/docs/current/ddl-rowsecurity.html) to prevent organisations from using each other's data. In Avni, we have a multi-level inheritance style structure that is enabled through a hierarchy of organisations. This hierarchy has two mechanisms.

	-- Multitenancy mechanism 1 - Extending an organisation. We have parent and child organisations that share metadata (not data). Lets see this with an example - Calcutta Kids
	set role calcutta_kids;

	-- See how organisations are set up in the organisation table
	select id, name, db_user, parent_organisation_id
	from organisation;
	-- Typically when you run this query, you should see only one row. However, we see two rows here. Notice that the parent_organisation_id of Calcutta Kids is 1, which is the id of OpenCHS. OpenCHS is the base organisation that we initially started with. We also had forms and rules that we intended to share with organisations, with each organisation having the ability to extend them. These organisations share metadata. This means that concepts, forms etc from OpenCHS is available to children. For example, see the organisations present in the concept table below.

	select distinct o.name
	from concept c
	inner join organisation o on c.organisation_id = o.id;

	--Data, however, is not shared between the two organisations. See what happens when we run a similar query for the individual table
	select distinct o.name
	from individual i
	inner join organisation o on i.organisation_id = o.id;

	-- This mechanism can also be used to create training organisations where metadata is shared, but data is not. Look at how the calcutta_kids_training organisation. Calcutta Kids Training inherits metadata from OpenCHS and Calcutta Kids while keeping data private
	set role calcutta_kids_training;

	select distinct o.name
	from concept c
	inner join organisation o on c.organisation_id = o.id;

	select distinct o.name
	from individual i
	inner join organisation o on i.organisation_id = o.id;

	-- PS: This form of multitenancy was originally meant to support common modules that can be used by multiple organisations. However, we found that this is a bad way, and have not been using this mechanism for newer organisations (since about 2019). The reasons for this are
	-- 1. We noticed that although organisations have similar programmes (MCH etc), the implementations vary quite a bit, reducing the purpose of keeping metadata in a parent organisation
	-- 2. It was hard to maintain changes across organisations when someone requests for it
	-- 3. We moved away from OpenCHS (Open Community Health System) to Avni, whose scope is much larger than its original purpose.

	-- Multitenancy mechanism 2 - Organisation groups. Here, a group of organisations can share data. This is mostly a reporting thing where each organisation will be working independently, while there is a reporting user who can access all data. This is achieved through special database users in the organisation_group table who can access data and metadata across multiple organisations. This is used by Rejuvenating Water Bodies currently. This organisation has a database user called rwb who can query data across Rejuvenating Water Bodies MH, RJ and IIT-IIT projects.
	set role rwb;
	select o.id, o.name, o.db_user, og.name, og.db_user
	from organisation o
	inner join organisation_group_organisation ogo on o.id = ogo.organisation_id
	inner join organisation_group og on ogo.organisation_group_id = og.id;

	-- Notice that data and metadata from all organisations will be available
	select distinct o.name
	from individual i
	inner join organisation o on i.organisation_id = o.id;

	-- Notice that there will be duplicates even in the metadata. In situations like this, we have to upload the same uuids across organisations in the organisation group if we need good reporting. This is an implicit assumption used in ETL.
	select c.uuid, c.name, o.name
	from concept c
	inner join organisation o on c.organisation_id = o.id
	where c.name = 'Gram Panchayat';

	-- PS: If you want to know exactly how multitenancy is implemented in the tables of Avni, see functions, see here - https://github.com/avniproject/avni-server/blob/master/avni-server-api/src/main/resources/db/migration/V1_142.2__CreateNewRLSForAccount.sql

	------------------------------------
	-- MULTITENANCY IN AVNI - END
	------------------------------------



	------------------------------------
	-- REPORTING EVOLUTION OVER THE YEARS
	------------------------------------
	-- Reporting solutions over time have been focusing on multiple goals (in the order of priority below)
	-- 1. Avoiding errors
	-- 2. Making development easier
	-- 3. Running queries faster

	------------------------------------
	-- Version 1 - Basic views
	------------------------------------
	-- The first set of views were developed with the first objective in mind. These views are available in github here - https://github.com/avniproject/avni-reporting/tree/master/migrations

	-- What is the purpose of these views? Consider the following set of views
	-- create view completed_program_encounter_view as
	-- select *
	-- from program_encounter_view
	-- where encounter_date_time is not null;
	-- create view program_encounter_view as
	-- select pe.*,
	-- oet.operational_encounter_type_uuid,
	-- oet.operational_encounter_type_name,
	-- oet.operational_encounter_type_is_voided,
	-- oet.encounter_type_uuid,
	-- oet.encounter_type_name,
	-- oet.encounter_type_is_voided
	-- from program_encounter pe
	-- join operational_encounter_type_view oet on oet.encounter_type_id = pe.encounter_type_id
	-- where pe.is_voided is not true;

	-- It ensures that
	-- 1. program_encounter.encounter_date_time is not null (The program encounter actually took place)
	-- 2. program_encounter.is_voided is false
	-- This promotes correctness in our queries


	------------------------------------
	-- Version 1 problems
	------------------------------------
	-- Problem 1 - Potential Data leakage
	-- Views do not adhere to Row Level Security unless schanges are made to it. This has the potential to cause data leakage, and some performance concerns.
	reset role;
	create or replace view my_individual_view as
	(
	select *
	from individual);

	grant SELECT on my_individual_view to jscs;

	set role jscs;
	select count(*)
	from my_individual_view; -- 815436
	select count(*)
	from individual; -- 82526
	-- What is going on here? Multiple organisation data showing up when selecting the view
	-- Views are not really supported by RLS, so this can cause trouble
	-- Note that this can be solved by wrapping the view in a function (see https://www.benburwell.com/posts/row-level-security-postgresql-views/)


	-- Problem 2 - Objective 2 (Making development easier) is not yet met
	set role jscs;
	select observations from completed_program_encounter_view; -- This is still a blob, and is required for most queries


	------------------------------------
	-- Version 2 - organisation specific views
	------------------------------------
	-- Here, we create views for each organisation that expands the observations into their own schemas. This goes a long way into writing queries better and safer.
	-- These views are programmatically generated and stored in the schema of each organisation. These are generated using the ViewGenService (https://github.com/avniproject/avni-server/blob/master/avni-server-api/src/main/java/org/avni/server/reporting/ViewGenService.java)

	-- How to recognise them? Look for org_* view within the org schema. eg: jscs.

	set role jscs;
	-- The invididual view
	select * from jscs.jscs_individual;
	-- Views specific for program encounters as well are present
	select id, program_enrolment_id, encounter_type_name, "Result from old sickle cell test report" from jscs.jscs_individual_sickle_cell_base_screening;
	-- Notice that the encounter type name is already in there. Voided checks are present by default. Obs are available by concept name. Some performance optimizations are also in place (See https://github.com/avniproject/avni-server/blob/master/avni-server-api/src/main/resources/pivot/conceptMap.sql)

	------------------------------------
	-- Version 2 issues - Join performance
	------------------------------------
	select ji.first_name, jiscbs."Result from old sickle cell test report" from jscs.jscs_individual_sickle_cell_base_screening jiscbs
	inner join jscs.jscs_individual_sickle_cell jisc on jiscbs.program_enrolment_id = jisc.id
	inner join jscs.jscs_individual ji on jisc.individual_id = ji.id
	where ji.first_name = 'Priti bai';

	-- Why these issues?
	-- 1. The views are individually doing a lot of concept lookups etc that cannot be optimized (Gin indexes do not work when using multitenancy)
	-- 2. Joins are not efficient (we are actually joining multiple times to the same tables, and this is hard to change)

	------------------------------------
	-- Version 3 - ETL tables
	------------------------------------
	-- Here we move data from the public schema into tables for each organisation into their own schemas
	-- Can be identified by org.* tables in the schema
	-- For more details, see https://github.com/avniproject/avni-etl/


	------------------------------------
	-- Version 3 Issues - Incomplete. This is still a work in progress, so you might not find all tables that you need there
	------------------------------------


	------------------------------------
	-- Version 3 Issues - Job failures and rerunning of migration
	------------------------------------
	-- We have had issues with jobs stopping because of some change in the metadata. These issues need to be fixed
	-- Also, failures are not reported anywhere, so there is a chance

	------------------------------------
	-- REPORTING EVOLUTION OVER THE YEARS - END
	------------------------------------