Skip to content

Instantly share code, notes, and snippets.

@iperdomo
Last active May 7, 2020 03:08
Show Gist options
  • Save iperdomo/a037ebc343a5c53f6db3481ba41f1215 to your computer and use it in GitHub Desktop.
Save iperdomo/a037ebc343a5c53f6db3481ba41f1215 to your computer and use it in GitHub Desktop.

Using Flow database structure

Checking connection

\d+
List of relations
SchemaNameTypeOwnerSizeDescription
publiccollectiontablelumen8192 bytes
publiccollection_entitytablelumen8192 bytes
publicdashboardtablelumen8192 bytes
publicdashboard_visualisationtablelumen8192 bytes
publicdata_sourcetablelumen8192 bytes
publicdatasettablelumen8192 bytes
publicdataset_versiontablelumen8192 bytes
publicgeography_columnsviewpostgres0 bytes
publicgeometry_columnsviewpostgres0 bytes
publicinvitetablelumen8192 bytes
publicjob_executiontablelumen8192 bytes
publicragtime_migrationstablelumen8192 bytes
publicraster_columnsviewpostgres0 bytes
publicraster_datasettablelumen8192 bytes
publicraster_overviewsviewpostgres0 bytes
publicsharetablelumen8192 bytes
publicspatial_ref_systablepostgres4616 kB
publicvisualisationtablelumen8192 bytes

Replicating Flow DB structure

Sample JSON from Flow API

{
  "id": "562899168",
  "name": "My RQG Survey",
  "registrationFormId": "",
  "forms": [
    {
      "id": "562919156",
      "name": "Form 1",
      "questionGroups": [
        {
          "id": "572849152",
          "name": "General",
          "repeatable": true,
          "questions": [
            {
              "id": "562909151",
              "name": "Name",
              "type": "FREE_TEXT",
              "order": 1,
              "variableName": "name",
              "createdAt": "2020-05-04T05:24:00.933Z",
              "modifiedAt": "2020-05-04T05:24:17.613Z"
            },
            {
              "id": "562879159",
              "name": "Age",
              "type": "NUMBER",
              "order": 2,
              "variableName": "age",
              "createdAt": "2020-05-04T05:24:21.066Z",
              "modifiedAt": "2020-05-04T05:24:38.712Z"
            },
            {
              "id": "594369156",
              "name": "Literacy",
              "type": "OPTION",
              "order": 3,
              "variableName": "literacy",
              "createdAt": "2020-05-04T05:24:42.025Z",
              "modifiedAt": "2020-05-04T05:25:22.813Z"
            }
          ],
          "createdAt": "2020-05-04T05:23:37.232Z",
          "modifiedAt": "2020-05-04T05:23:59.219Z"
        }
      ],
      "createdAt": "2020-05-04T05:23:24.915Z",
      "modifiedAt": "2020-05-04T05:25:33.321Z",
      "formInstancesUrl": "https://api-auth0.akvotest.org/flow/orgs/uat1/form_instances?survey_id=562899168&form_id=562919156"
    }
  ],
  "createdAt": "2020-05-04T05:23:08.705Z",
  "modifiedAt": "2020-05-04T05:23:22.939Z",
  "dataPointsUrl": "https://api-auth0.akvotest.org/flow/orgs/uat1/data_points?survey_id=562899168"
}
jq -M '{id, name, registrationFormId, createdAt, modifiedAt}' survey.json

Survey

CREATE TABLE survey (
    id text PRIMARY KEY,
    name text NOT NULL,
    registration_form_id text,
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE
INSERT INTO survey(id, name, created_at, modified_at)
VALUES ('562899168', 'My RQG Survey', '2020-05-04T05:23:08.705Z', '2020-05-04T05:23:22.939Z')
INSERT 0 1
SELECT * FROM survey ORDER BY id
idnameregistration_form_idcreated_atmodified_at
562899168My RQG Survey2020-05-04 05:23:08.705+002020-05-04 05:23:22.939+00

Form

jq -M '.forms[] | keys' survey.json
jq -M '.forms[0] | {id, name, createdAt, modifiedAt}' survey.json
CREATE TABLE form (
    id text PRIMARY KEY,
    survey_id text NOT NULL REFERENCES survey(id),
    name text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE
\d form
Table “public.form”
ColumnTypeCollationNullableDefault
idtextnot null
survey_idtextnot null
nametextnot null
created_attimestamp with time zonenot nullnow()
modified_attimestamp with time zonenot nullnow()
Indexes:
“form_pkey” PRIMARY KEY, btree (id)
Foreign-key constraints:
“form_survey_id_fkey” FOREIGN KEY (survey_id) REFERENCES survey(id)
INSERT INTO form(id, survey_id, name, created_at, modified_at)
VALUES('562919156', '562899168', 'Form 1', '2020-05-04T05:23:24.915Z', '2020-05-04T05:25:33.321Z');
INSERT 0 1
SELECT * FROM form ORDER BY id
idsurvey_idnamecreated_atmodified_at
562919156562899168Form 12020-05-04 05:23:24.915+002020-05-04 05:25:33.321+00

Question group

jq -M '.forms[0].questionGroups[] | keys' survey.json
CREATE TABLE question_group (
    id text PRIMARY KEY,
    form_id text NOT NULL REFERENCES form(id),
    name text NOT NULL,
    repeatable boolean NOT NULL DEFAULT FALSE,
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE
\d question_group
Table “public.question_group”
ColumnTypeCollationNullableDefault
idtextnot null
form_idtextnot null
nametextnot null
repeatablebooleannot nullfalse
created_attimestamp with time zonenot nullnow()
modified_attimestamp with time zonenot nullnow()
Indexes:
“question_group_pkey” PRIMARY KEY, btree (id)
Foreign-key constraints:
“question_group_form_id_fkey” FOREIGN KEY (form_id) REFERENCES form(id)
jq -M '.forms[0].questionGroups[0] | {id, name, repeatable, createdAt, modifiedAt}' survey.json
INSERT INTO question_group(id, form_id, name, repeatable, created_at, modified_at)
VALUES('572849152', '562919156', 'General', true, '2020-05-04T05:23:37.232Z', '2020-05-04T05:23:59.219Z')
INSERT 0 1
SELECT * FROM question_group ORDER BY id
idform_idnamerepeatablecreated_atmodified_at
572849152562919156Generalt2020-05-04 05:23:37.232+002020-05-04 05:23:59.219+00

Questions

jq -M '.forms[0].questionGroups[0].questions[0] | keys' survey.json
jq -M '.forms[0].questionGroups[0].questions[] | .type' survey.json
-- There are many more
CREATE TYPE question_type AS ENUM('FREE_TEXT', 'NUMBER', 'OPTION');

CREATE TABLE question (
    id text PRIMARY KEY,
    question_group_id text NOT NULL REFERENCES question_group(id),
    name text NOT NULL,
    "order" smallint NOT NULL DEFAULT 0,
    type question_type NOT NULL,
    variable_name text,
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now()
);
CREATE TYPE
CREATE TABLE
\d question
Table “public.question”
ColumnTypeCollationNullableDefault
idtextnot null
question_group_idtextnot null
nametextnot null
ordersmallintnot null0
typequestion_typenot null
variable_nametext
created_attimestamp with time zonenot nullnow()
modified_attimestamp with time zonenot nullnow()
Indexes:
“question_pkey” PRIMARY KEY, btree (id)
Foreign-key constraints:
“question_question_group_id_fkey” FOREIGN KEY (question_group_id) REFERENCES question_group(id)
jq -M '.forms[0].questionGroups[0].questions[]' survey.json

INSERT INTO question (id, question_group_id, name, "order", type, variable_name, created_at, modified_at)
VALUES ('562909151', '572849152', 'Name', 1, 'FREE_TEXT', 'name', '2020-05-04T05:24:00.933Z', '2020-05-04T05:24:17.613Z'),
       ('562879159', '572849152', 'Age', 2, 'NUMBER', 'age', '2020-05-04T05:24:21.066Z', '2020-05-04T05:24:38.712Z'),
       ('594369156', '572849152', 'Literacy', 3, 'OPTION', 'literacy', '2020-05-04T05:24:42.025Z', '2020-05-04T05:25:22.813Z');
INSERT 0 3
SELECT * FROM question ORDER BY id
idquestion_group_idnameordertypevariable_namecreated_atmodified_at
562879159572849152Age2NUMBERage2020-05-04 05:24:21.066+002020-05-04 05:24:38.712+00
562909151572849152Name1FREE_TEXTname2020-05-04 05:24:00.933+002020-05-04 05:24:17.613+00
594369156572849152Literacy3OPTIONliteracy2020-05-04 05:24:42.025+002020-05-04 05:25:22.813+00

Collected data

Datapoints

{
  "dataPoints": [
    {
      "id": "572859149",
      "identifier": "k5x6-rus4-s1k3",
      "displayName": "Juan",
      "latitude": null,
      "longitude": null,
      "createdAt": "2020-05-04T05:31:23.668Z",
      "modifiedAt": "2020-05-04T05:31:28.470Z"
    }
  ],
  "nextPageUrl": "https://api-auth0.akvotest.org/flow/orgs/uat1/data_points?survey_id=562899168&cursor=CjESK2oPc35ha3ZvZmxvdy11YXQxchgLEg5TdXJ2ZXllZExvY2FsZRiNxpSRAgwYACAA"
}
jq -M '.dataPoints[] | keys' data-points.json
CREATE TABLE datapoint (
    id text PRIMARY KEY,
    display_name text NOT NULL, -- always true?
    identifier text NOT NULL UNIQUE,
    location geometry(Point,4326), -- Use a geometry column instead of separate lat/long values
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE
\d datapoint
Table “public.datapoint”
ColumnTypeCollationNullableDefault
idtextnot null
display_nametextnot null
identifiertextnot null
locationgeometry(Point,4326)
created_attimestamp with time zonenot nullnow()
modified_attimestamp with time zonenot nullnow()
Indexes:
“datapoint_pkey” PRIMARY KEY, btree (id)
“datapoint_identifier_key” UNIQUE CONSTRAINT, btree (identifier)
jq -M '.dataPoints[]' data-points.json
INSERT INTO datapoint(id, display_name, identifier, created_at, modified_at)
VALUES ('572859149', 'Juan', 'k5x6-rus4-s1k3', '2020-05-04T05:31:23.668Z', '2020-05-04T05:31:28.470Z')
INSERT 0 1
SELECT * FROM datapoint
iddisplay_nameidentifierlocationcreated_atmodified_at
572859149Juank5x6-rus4-s1k32020-05-04 05:31:23.668+002020-05-04 05:31:28.47+00

Form instances

{
  "formInstances": [
    {
      "deviceIdentifier": "s7",
      "dataPointId": "572859149",
      "submissionDate": "2020-05-04T05:31:19Z",
      "modifiedAt": "2020-05-04T05:31:28.582Z",
      "id": "599259147",
      "responses": {
        "572849152": [
          {
            "562879159": 10,
            "562909151": "Juan",
            "594369156": [
              {
                "text": "Yes",
                "code": "1"
              }
            ]
          },
          {
            "562879159": 50,
            "562909151": "Pedro",
            "594369156": [
              {
                "text": "No",
                "code": "0"
              }
            ]
          },
          {
            "562879159": 25,
            "562909151": "Pablo",
            "594369156": [
              {
                "text": "Yes",
                "code": "1"
              }
            ]
          },
          {
            "562879159": 33,
            "562909151": "Maria",
            "594369156": [
              {
                "text": "Yes",
                "code": "1"
              }
            ]
          }
        ]
      },
      "identifier": "k5x6-rus4-s1k3",
      "displayName": "Juan",
      "formId": "562919156",
      "surveyalTime": 50,
      "submitter": "iperdomo",
      "createdAt": "2020-05-04T05:31:23.996Z"
    }
  ],
  "nextPageUrl": "https://api-auth0.akvotest.org/flow/orgs/uat1/form_instances?survey_id=562899168&form_id=562919156&cursor=CjESK2oPc35ha3ZvZmxvdy11YXQxchgLEg5TdXJ2ZXlJbnN0YW5jZRiL8N-dAgwYACAA"
}
jq -M '.formInstances[] | keys' form-instances.json

CREATE TABLE form_instance (
    id text PRIMARY KEY,
    form_id text NOT NULL REFERENCES form(id),
    datapoint_id text NOT NULL REFERENCES datapoint(id),
    device_identifier text NOT NULL,
    submitter text NOT NULL,
    submission_date timestamptz NOT NULL,
    surveyal_time numeric NOT NULL DEFAULT 0,
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE
jq -M '.formInstances[0] | del(.responses)' form-instances.json

INSERT INTO form_instance(id, form_id, datapoint_id, device_identifier, submission_date, submitter,
                          surveyal_time, created_at, modified_at)
VALUES('599259147', '562919156', '572859149', 's7', '2020-05-04T05:31:19Z', 'iperdomo', 50, '2020-05-04T05:31:23.996Z', '2020-05-04T05:31:28.582Z')
INSERT 0 1
SELECT * FROM form_instance
idform_iddatapoint_iddevice_identifiersubmittersubmission_datesurveyal_timecreated_atmodified_at
599259147562919156572859149s7iperdomo2020-05-04 05:31:19+00502020-05-04 05:31:23.996+002020-05-04 05:31:28.582+00

Answers

jq -M '.formInstances[].responses' form-instances.json

CREATE TABLE answer (
    id text PRIMARY KEY DEFAULT gen_random_uuid(),
    form_instance_id text NOT NULL REFERENCES form_instance(id),
    question_id text NOT NULL REFERENCES question(id),
    iteration smallint NOT NULL DEFAULT 0,
    value_text text,
    value_number numeric,
    value_date timestamptz,
    value_object jsonb,
    created_at timestamptz NOT NULL DEFAULT now(),
    modified_at timestamptz NOT NULL DEFAULT now(),
    UNIQUE(form_instance_id, question_id, iteration)
);
CREATE TABLE
\d answer
Table “public.answer”
ColumnTypeCollationNullableDefault
idtextnot nullgen_random_uuid()
form_instance_idtextnot null
question_idtextnot null
iterationsmallintnot null0
value_texttext
value_numbernumeric
value_datetimestamp with time zone
value_objectjsonb
created_attimestamp with time zonenot nullnow()
modified_attimestamp with time zonenot nullnow()
Indexes:
“answer_pkey” PRIMARY KEY, btree (id)
“answer_form_instance_id_question_id_iteration_key” UNIQUE CONSTRAINT, btree (form_instance_id, question_id, iteration)
Foreign-key constraints:
“answer_form_instance_id_fkey” FOREIGN KEY (form_instance_id) REFERENCES form_instance(id)
“answer_question_id_fkey” FOREIGN KEY (question_id) REFERENCES question(id)
jq -M '.formInstances[].responses["572849152"][]["562879159"]' form-instances.json
INSERT INTO answer(form_instance_id, question_id, iteration, value_number)
VALUES ('599259147', '562879159', 1, 10),
       ('599259147', '562879159', 2, 50),
       ('599259147', '562879159', 3, 25),
       ('599259147', '562879159', 4, 33)
INSERT 0 4
SELECT * FROM answer
idform_instance_idquestion_iditerationvalue_textvalue_numbervalue_datevalue_objectcreated_atmodified_at
b6e365fb-67da-4d58-8780-0a966f8840eb5992591475628791591102020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
72d0d3fc-bd9e-4afd-b803-d4603ac084765992591475628791592502020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
b271b72c-95f5-4222-a572-844ec65e04f15992591475628791593252020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
424d4b3e-a93c-4391-8bde-c0bfaa8e00b05992591475628791594332020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
jq -M '.formInstances[].responses["572849152"][]["562909151"]' form-instances.json
INSERT INTO answer(form_instance_id, question_id, iteration, value_text)
VALUES ('599259147', '562909151', 1, 'Juan'),
       ('599259147', '562909151', 2, 'Pedro'),
       ('599259147', '562909151', 3, 'Pablo'),
       ('599259147', '562909151', 4, 'Maria')
INSERT 0 4
SELECT * FROM answer ORDER by form_instance_id, iteration, question_id;
idform_instance_idquestion_iditerationvalue_textvalue_numbervalue_datevalue_objectcreated_atmodified_at
b6e365fb-67da-4d58-8780-0a966f8840eb5992591475628791591102020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
a29511f4-5537-43a6-9257-dfe609d721875992591475629091511Juan2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
72d0d3fc-bd9e-4afd-b803-d4603ac084765992591475628791592502020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
0d623612-af71-4878-90c9-88dc51bf19905992591475629091512Pedro2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
b271b72c-95f5-4222-a572-844ec65e04f15992591475628791593252020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
a6e7e75b-239d-4e7e-bc38-9c491bd2f9095992591475629091513Pablo2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
424d4b3e-a93c-4391-8bde-c0bfaa8e00b05992591475628791594332020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
e90ad3c7-eb36-4dd8-bf14-73a63594b7b15992591475629091514Maria2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
jq -c -M '.formInstances[].responses["572849152"][]["594369156"]' form-instances.json
INSERT INTO answer(form_instance_id, question_id, iteration, value_object)
VALUES ('599259147', '594369156', 1, '[{"text":"Yes","code":"1"}]'::jsonb),
       ('599259147', '594369156', 2, '[{"text":"No","code":"0"}]'::jsonb),
       ('599259147', '594369156', 3, '[{"text":"Yes","code":"1"}]'::jsonb),
       ('599259147', '594369156', 4, '[{"text":"Yes","code":"1"}]'::jsonb)
INSERT 0 4
SELECT * FROM answer ORDER by form_instance_id, iteration, question_id;
idform_instance_idquestion_iditerationvalue_textvalue_numbervalue_datevalue_objectcreated_atmodified_at
b6e365fb-67da-4d58-8780-0a966f8840eb5992591475628791591102020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
a29511f4-5537-43a6-9257-dfe609d721875992591475629091511Juan2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
ec46a7c4-f358-401a-86bb-fd7d88df0cdd5992591475943691561[{“code”: “1”, “text”: “Yes”}]2020-05-06 12:29:00.705249+002020-05-06 12:29:00.705249+00
72d0d3fc-bd9e-4afd-b803-d4603ac084765992591475628791592502020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
0d623612-af71-4878-90c9-88dc51bf19905992591475629091512Pedro2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
6f234327-2df7-4e29-b049-7356b6ba1ed85992591475943691562[{“code”: “0”, “text”: “No”}]2020-05-06 12:29:00.705249+002020-05-06 12:29:00.705249+00
b271b72c-95f5-4222-a572-844ec65e04f15992591475628791593252020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
a6e7e75b-239d-4e7e-bc38-9c491bd2f9095992591475629091513Pablo2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
2fe0f858-5a3b-4149-aaae-79e76407b44c5992591475943691563[{“code”: “1”, “text”: “Yes”}]2020-05-06 12:29:00.705249+002020-05-06 12:29:00.705249+00
424d4b3e-a93c-4391-8bde-c0bfaa8e00b05992591475628791594332020-05-06 12:17:32.026693+002020-05-06 12:17:32.026693+00
e90ad3c7-eb36-4dd8-bf14-73a63594b7b15992591475629091514Maria2020-05-06 12:20:29.322498+002020-05-06 12:20:29.322498+00
55d750b9-b90e-4be2-9d8d-7ed14654def65992591475943691564[{“code”: “1”, “text”: “Yes”}]2020-05-06 12:29:00.705249+002020-05-06 12:29:00.705249+00

Presenting the data as a Dataset

Main dataset

CREATE OR REPLACE VIEW form_metadata_562919156 AS
SELECT dp.identifier, dp.display_name, fi.device_identifier, fi.id as instance, fi.submitter,
       fi.surveyal_time, 1 as form_version, fi.submission_date -- we don't have form version yet
 FROM datapoint dp,
      form_instance fi
WHERE dp.id = fi.datapoint_id;
CREATE VIEW
SELECT * FROM form_metadata_562919156
identifierdisplay_namedevice_identifierinstancesubmittersurveyal_timeform_versionsubmission_date
k5x6-rus4-s1k3Juans7599259147iperdomo5012020-05-04 05:31:19+00

Repeated group

CREATE OR REPLACE VIEW question_group_572849152 AS
SELECT q1.value_text as "562909151_name", q2.value_number as "562879159_age",
       concat(q3.value_object->0->>'code', ':', q3.value_object->0->>'text')  as "594369156_literacy",
       q1.form_instance_id
  FROM answer q1, answer q2, answer q3
 WHERE q1.question_id = '562909151'
   AND q2.question_id = '562879159'
   AND q3.question_id = '594369156'
   AND q1.form_instance_id = q2.form_instance_id
   AND q2.form_instance_id = q3.form_instance_id
   AND q1.iteration = q2.iteration
   AND q2.iteration = q3.iteration
ORDER BY q1.iteration
CREATE VIEW
SELECT * FROM question_group_572849152
562909151_name562879159_age594369156_literacyform_instance_id
Juan101:Yes599259147
Pedro500:No599259147
Pablo251:Yes599259147
Maria331:Yes599259147
SELECT *
  FROM form_metadata_562919156 f,
       question_group_572849152 qg
 WHERE f.instance = qg.form_instance_id
identifierdisplay_namedevice_identifierinstancesubmittersurveyal_timeform_versionsubmission_date562909151_name562879159_age594369156_literacyform_instance_id
k5x6-rus4-s1k3Juans7599259147iperdomo5012020-05-04 05:31:19+00Juan101:Yes599259147
k5x6-rus4-s1k3Juans7599259147iperdomo5012020-05-04 05:31:19+00Pedro500:No599259147
k5x6-rus4-s1k3Juans7599259147iperdomo5012020-05-04 05:31:19+00Pablo251:Yes599259147
k5x6-rus4-s1k3Juans7599259147iperdomo5012020-05-04 05:31:19+00Maria331:Yes599259147

Flow DB structure

We have entities that defines schema and tables that hold data. Similar to OO concepts, classes and instances.

Entities that define schema

  • Survey
  • Form
  • Question Group
  • Question

Entities that hold data

  • Datapoint
  • Form Instance
  • Answer

We can clearly see that we have less entities that hold data than those defining schema. The reason for this is that some entities that defined only presentation/UI concepts (Question Group) were reused to define schema.

StructureData
SurveyDatapoint
FormForm Instance
Question Group
QuestionAnswer

In the current DB structure, there is a missing Question Group Instance, so an Answer should point to that parent and not to Form Instance. More over, a Datapoint does not have a link (foreign key) to a Survey.

https://user-images.githubusercontent.com/178474/81249748-af709b80-901f-11ea-987c-0e61b4949850.png

Implementation notes/thoughts

  • There are several ways to optimize how we store answers. Having those all rows in a single table has some problems:
    • A single table means joining the same table several times (as many as questions). We can always optimize by using subqueries and limiting the results per question_id but is always expensive.
      • As example akvoflow-23 would have 31M rows in the answer table.
    • To avoid NULL values and solve the problem of millions of rows in a single table we can shard. Create a table per question id and type (e.g. answer_123_text, answer_123_date, etc)
  • Those views with the crosstab/pivot (e.g. form_123_metadata, question_group_456) are actual queries/joins to the database. We could pre-compute the work using Materialized Views. Using materialized views has the drawback of duplicating data (at least double of data storage requirement) and they need to be refreshed. https://www.postgresql.org/docs/11/rules-materializedviews.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment