Skip to content

Instantly share code, notes, and snippets.

@Aitem
Created February 4, 2021 08:10
Show Gist options
  • Save Aitem/44135fd19ae1c76fd5b260c265ee0196 to your computer and use it in GitHub Desktop.
Save Aitem/44135fd19ae1c76fd5b260c265ee0196 to your computer and use it in GitHub Desktop.
Data Denormalization meetup

Data Denormalization

Data DeNormalization

What is Data Normalization?

What about data normal forms?

  • Avoid data duplication to reduce data redundancy
    • Less data
  • Avoid multiple updates - one source of truth
    • Avoid data editing anomaly
  • A more representative data model

Sample

EmployeePhoneFloorRoom №
Alice651435120 b
Bob4324234330 a
John534315330 a

3NF

———-+———+———+ ——————-+——-
EmployeePhoneRoom IDRoom IDRoom №Floor
———-+———+———+ ——————-+——-
Alice6514351120 b1
Bob43242342230 a3
John5343152——————-+——-

Why denormalization?

Project on begin

EmployeeRoom
id->id
namefloor
phoneroom№
room_id

Project on prod

SalaryRoom
idid<—-+
employee—-+floor
dateroom№
amount
DeparmentEmployee
->id->id
namename
head-+phone
room_id-+

+———————| department |

Project after 1 year

ProjectCustomer
———–————
╔═>id╔═>id
customer═════╝name
namephone
logo
Member
╚══project
role
employee══════╗
SalaryRoom
idid<══╗
employee════╣floor
dateroom№
amount║ ║
║ ║
DeparmentEmployee
╔═>id╠═>id
namename
head═╝phone
room_id═╝
╚═════════════════════department

New feature

——————————————–

Search: __________________
John +1(991)33434 - Development Room 42
Year Salary: 100$ Lead: Max
Projects: QA in Cars, Dev in Toys
Ivan +1(991)4325355 - Management Room 36
Year Salary: 80$ Lead: Ken
Projects: QA in Cars, Manager in Food

——————————————–

SQL

-- pseudo sql =)
select
  employee
  , sum(salary)
  , array_agg(member)
  ...
from employee
join department on ..
join employee   on ..
join salary     on ..
join member     on ..
join project    on ..
group by employee
where
  employee      ilike $search
  or member     ilike $search
  or project    ilike $search
  or department ilike $search
  ...

Alter Employee

Employee
id
name
phone
room_id
department
=salary
=boss
=member[]
=fts

Add subs/hooks

Add subs/hooks for all related tables

  • Salary
  • Deparment
  • Employee
  • Project
  • Member

Goals of denormalization

  • Improve search
    • Avoid joins and aggregations
    • Avoid multiple or
  • Simplify complex data relations and structure

How to denormalize?

Edit existing entity

Employee
id
name
phone
room_id
department
=salary
=boss
=member[]
=fts

Real case

Normal form data:

————– —————— ———————— —————–

Practitioner<-+-PractitionerRole->Organization->Organization
Иванов И.И.Врач стоматологСтоматологическое отд.Гор. стом. пол.

————– | —————— ———————— —————–

—————— ———————— —————–
-PractitionerRole->Organization->Organization
ТерапевтТерапевт. отделениеГКБ 1

—————— ———————— —————–

After denormalization:

————————-

Practitioner
Иванов И.И.
=Врач стоматолог
=Стоматологическое отд.
=Гор. стом. пол.
=Терапевт
=Терапевт. отделение
=ГКБ 1

————————-

Profit

  • Super easy

Problems

  • Standarts conflict
  • Integration
  • Conflict existing ui forms/controllers
  • Business Logic
  • Heavy write load
MVCC

———-

8kb page
1 —–
2 —–
3 —–

———-

Update row 1 and 2

———-

8kb page
-1 ——
-2 ——
3 ——
1*——

———-

8kb page
2*——

———-

TOAST - The Oversized-Attribute Storage Technique

———-

8kb page
1 —–
2 —–
3 —–

———-

Update row 1 insert too large object

———-

8kb page
-1 —–
2 —–
1*——-—————-

———-

———- ————–

8kb pageToast
———-————–
-1 —–1.1* ——-
2 —–1.2* ——-
1* toast

———- ————–

Create new entity

EmployeeInfo
employee_id
=phone
=salary
=boss
=member[]
=fts

Real case

———– ————— ——— ——— ——– ————– —————–

Condition<-EpisodeOfCare->Patient<-Binding->Sector->Practitioner->PractitonerRole

———– ————— ——— ——— ——– ————– —————–

Separate administrative and clinical parts

Administrative part

——— ——— ——– ————– —————–

Patient<-Binding->Sector->Practitioner->PractitonerRole

——— ——— ——– ————– —————–

PatientBinding
patient_id
sector_id
practitionerrole_id
Clinical part

——— ————— ———–

Patient<-EpisodeOfCare->Condition

——— ————— ———–

PatientCondition
patient_id
condition_type
Profits
  • Less and easiera update logic
  • No update logic
  • Explicit denormalization
  • Less toast risk
Bloat protection =)
-- pseudo sql
drop table if exists denormalization.patientbinding;
CREATE TABLE if not exists denormalization.patientbinding (like public.patientbinding);

insert into denormalization.patientbinding (patient_id, sector_id, practitionerrole)
select patient.id ....
from patient
join binding on ..
join sector on ..
join practitioner on ..
join practitionerrole on ..;

create index patientbinding_patient_id on denormalization.patientbinding (patient_id);
create index patientbinding_sector_id on denormalization.patientbinding (sector_id);

BEGIN;
-- Drop target table
drop table public.patientbinding;

-- Replace target table on public schema wiht temp table
alter table denormalization.patientbinding set schema public;

COMMIT;

Conclusion

  • Use denormalization careful
  • Database maintenance
  • Try to decomposit your complex task

References

PG bloating - https://www.youtube.com/watch?v=-GNHIHEHDmQ&t=2361s&ab_channel=HighLoadChannel PG autovacuum -https://www.youtube.com/watch?v=TDWC66qzxCs&ab_channel=HighLoadChannel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment