Steps:
- Provision snapshot instance with same production spec
- Create archived service match table in another namespace
- Export service matches that are to be archived (via pg_dump or copy) into flat file (to be performed using psql in one of the prod application instance)
- Load archived service matches from flat file into archive table (via pg_restore or copy)
- Hard delete service matches to be archived in production DB by using batch strategy
Indices are not created on the backup table to prevent poor insertion performance.
create schema backup;
create table backup.<country>_deleted_service_matches
(
id integer not null primary key,
service_provider_id integer,
service_request_id integer,
state integer not null,
created_at timestamp not null,
updated_at timestamp not null,
read_by_user boolean not null,
booked_at timestamp,
quoted_at timestamp,
deleted_at timestamp,
read_by_user_at timestamp,
read_by_pro boolean,
read_by_pro_at timestamp,
reviewable_at timestamp,
credit_cost integer,
refund boolean,
refund_type integer not null,
refund_reason varchar,
current_session_count integer,
completed_session_count integer,
scheduled_at timestamp,
final_price double precision,
match_payment_state integer,
franchise boolean,
original_final_price double precision,
refunded_at timestamp,
refund_requested boolean,
archived boolean,
refund_declined_at timestamp,
abort_reason integer,
accepted_scheduled_at timestamp,
cancelled_at timestamp,
freeze_status integer
);
Benchmarking export, import and deletion on MY table.
Production DB Spec
db.m5.xlarge
vCPU 4
RAM 16 GB
Network Performance: Up to 10 Gbps
EBS Bandwidth: Up to 3,500 Mbps
Disk:
Type: GP2 (SSD)
Size: 500 GB
Maximum Throughput: 250 MiB/s
Baseline performance: 1,500 IOPS
Maximum burst performance of 3,000 IOPS for 60 minutes
- 2015: less than 5 min
- 2016-2019: less than 2 min for each year
Script
-- in psql
\copy (select * from country.service_matches sm where sm.deleted_at is not null and (sm.created_at) >= (<start_date>::timestamp - interval '8' hour) and (sm.created_at) < (<end_date>::timestamp - interval '8' hour)) to '/home/ubuntu/tmp/<country>_service_matches_<year>.csv' delimiter ',' csv header;
- 2015: less than 1 min
- 2016-2019: less than 2 min for each year
Script
-- in psql
\copy backup.my_deleted_service_matches from '/home/ubuntu/tmp/<country>_service_matches_<year>.csv' delimiter ',' csv header ;
Script
set search_path = <country>;
delete from service_matches sm
where sm.deleted_at is not null
and (sm.created_at) >= (<start_date>::timestamp - interval '8' hour)
and (sm.created_at) < (<end_date>::timestamp - interval '8' hour)
;
Logs
sql> delete from my.service_matches sm
where sm.deleted_at is not null
and (sm.created_at) >= ('2014-01-01'::timestamp - interval '8' hour)
and (sm.created_at) < ('2016-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:14:32] 39070 rows affected in 11 s 624 ms
sql> delete from my.service_matches sm
where sm.deleted_at is not null
and (sm.created_at) >= ('2016-01-01'::timestamp - interval '8' hour)
and (sm.created_at) < ('2017-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:15:02] 13547 rows affected in 8 s 774 ms
sql> delete from my.service_matches sm
where sm.deleted_at is not null
and (sm.created_at) >= ('2017-01-01'::timestamp - interval '8' hour)
and (sm.created_at) < ('2018-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:15:41] 9798154 rows affected in 19 s 971 ms
sql> delete from my.service_matches sm
where sm.deleted_at is not null
and (sm.created_at) >= ('2018-01-01'::timestamp - interval '8' hour)
and (sm.created_at) < ('2019-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:16:27] 16911451 rows affected in 33 s 279 ms
Vacuum and analyze service match table.
-- in psql
set search_path = <country>;
vacuum (analyse, verbose ) service_matches;
-- radical measure, this will reclaim more space with the cost of table lock
vacuum (full, analyse, verbose ) service_matches;
-- even more radical, try REINDEX
reindex index service_matches.index_smatches_deleted_at_sr_id_state_nodelete;
reindex index service_matches.index_smatches_deleted_at_sp_id_state_nodelete;
-- or DROP and INDEX again
drop index service_matches.index_smatches_deleted_at_sr_id_state_nodelete;
create index concurrently index_smatches_deleted_at_sr_id_state_nodelete
on service_matches (deleted_at, service_request_id, state)
where (deleted_at IS NULL);
drop index service_matches.index_smatches_deleted_at_sp_id_state_nodelete;
create index index_smatches_deleted_at_sp_id_state_nodelete
on service_matches (deleted_at, service_provider_id, state)
where (deleted_at IS NULL);