@khun84
Last active July 29, 2019 07:42
Archive Service Match

ServiceMatch table housekeeping

Steps:

  1. Provision a snapshot instance with the same spec as production
  2. Create the archive service match table in a separate schema
  3. Export the service matches to be archived into a flat file (via pg_dump or COPY), run with psql on one of the production application instances
  4. Load the archived service matches from the flat file into the archive table (via pg_restore or COPY)
  5. Hard delete the archived service matches from the production DB using a batch strategy

Create backup table

No indices are created on the backup table, to avoid slowing down the bulk insert.

create schema backup;

create table backup.<country>_deleted_service_matches
(
	id integer not null primary key,
	service_provider_id integer,
	service_request_id integer,
	state integer not null,
	created_at timestamp not null,
	updated_at timestamp not null,
	read_by_user boolean not null,
	booked_at timestamp,
	quoted_at timestamp,
	deleted_at timestamp,
	read_by_user_at timestamp,
	read_by_pro boolean,
	read_by_pro_at timestamp,
	reviewable_at timestamp,
	credit_cost integer,
	refund boolean,
	refund_type integer not null,
	refund_reason varchar,
	current_session_count integer,
	completed_session_count integer,
	scheduled_at timestamp,
	final_price double precision,
	match_payment_state integer,
	franchise boolean,
	original_final_price double precision,
	refunded_at timestamp,
	refund_requested boolean,
	archived boolean,
	refund_declined_at timestamp,
	abort_reason integer,
	accepted_scheduled_at timestamp,
	cancelled_at timestamp,
	freeze_status integer
);

Benchmark

Benchmarking export, import, and deletion on the MY (Malaysia) table.

Machine spec used for benchmarking:

Production DB Spec
db.m5.xlarge
vCPU 4
RAM 16 GB
Network Performance: Up to 10 Gbps
EBS Bandwidth: Up to 3,500 Mbps
Disk:
  Type: GP2 (SSD)
  Size: 500 GB
  Maximum Throughput: 250 MiB/s
  Baseline performance: 1,500 IOPS
  Maximum burst performance of 3,000 IOPS for 60 minutes

Export using COPY

  1. 2015: less than 5 min
  2. 2016-2019: less than 2 min for each year

Script

-- in psql
\copy (select * from country.service_matches sm where sm.deleted_at is not null and (sm.created_at) >= (<start_date>::timestamp - interval '8' hour) and (sm.created_at) < (<end_date>::timestamp - interval '8' hour)) to '/home/ubuntu/tmp/<country>_service_matches_<year>.csv' delimiter ',' csv header;

Import using COPY

  1. 2015: less than 1 min
  2. 2016-2019: less than 2 min for each year

Script

-- in psql
\copy backup.my_deleted_service_matches from '/home/ubuntu/tmp/<country>_service_matches_<year>.csv' delimiter ',' csv header;
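Before deleting anything from production, it is worth verifying that the archive received every exported row. A minimal sanity check (the dates below are illustrative; use the same window as the export), where both counts should match:

```sql
-- rows loaded into the archive table for the window
select count(*) from backup.my_deleted_service_matches
 where created_at >= ('2015-01-01'::timestamp - interval '8' hour)
   and created_at < ('2016-01-01'::timestamp - interval '8' hour);

-- soft-deleted rows in the source table for the same window
select count(*) from my.service_matches sm
 where sm.deleted_at is not null
   and sm.created_at >= ('2015-01-01'::timestamp - interval '8' hour)
   and sm.created_at < ('2016-01-01'::timestamp - interval '8' hour);
```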

Deletion benchmark

Script

set search_path = <country>;
delete from service_matches sm
 where sm.deleted_at is not null
   and sm.created_at >= (<start_date>::timestamp - interval '8' hour)
   and sm.created_at < (<end_date>::timestamp - interval '8' hour);
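The hard-delete step calls for a batch strategy, while the script above deletes a whole year in one statement. One way to chunk it (the 10,000 batch size is an arbitrary choice for illustration) is to repeat a LIMITed subquery delete until it affects 0 rows, which keeps each transaction's lock and WAL footprint small:

```sql
-- rerun until "0 rows affected"; batch size of 10,000 is illustrative
delete from service_matches
 where id in (
   select sm.id from service_matches sm
    where sm.deleted_at is not null
      and sm.created_at >= (<start_date>::timestamp - interval '8' hour)
      and sm.created_at < (<end_date>::timestamp - interval '8' hour)
    limit 10000
 );
```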

Logs

sql> delete from my.service_matches sm
     where sm.deleted_at is not null
       and (sm.created_at) >= ('2014-01-01'::timestamp - interval '8' hour)
       and (sm.created_at) < ('2016-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:14:32] 39070 rows affected in 11 s 624 ms
sql> delete from my.service_matches sm
     where sm.deleted_at is not null
       and (sm.created_at) >= ('2016-01-01'::timestamp - interval '8' hour)
       and (sm.created_at) < ('2017-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:15:02] 13547 rows affected in 8 s 774 ms
sql> delete from my.service_matches sm
     where sm.deleted_at is not null
       and (sm.created_at) >= ('2017-01-01'::timestamp - interval '8' hour)
       and (sm.created_at) < ('2018-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:15:41] 9798154 rows affected in 19 s 971 ms
sql> delete from my.service_matches sm
     where sm.deleted_at is not null
       and (sm.created_at) >= ('2018-01-01'::timestamp - interval '8' hour)
       and (sm.created_at) < ('2019-01-01'::timestamp - interval '8' hour)
[2019-07-25 15:16:27] 16911451 rows affected in 33 s 279 ms

Post backup

Vacuum and analyze service match table.

-- in psql
set search_path = <country>;
vacuum (analyze, verbose) service_matches;

-- radical measure: reclaims more space, but takes an exclusive table lock
vacuum (full, analyze, verbose) service_matches;

-- even more radical, try REINDEX
-- (REINDEX INDEX takes the index name, not table.index, and locks the table)
reindex index index_smatches_deleted_at_sr_id_state_nodelete;
reindex index index_smatches_deleted_at_sp_id_state_nodelete;

-- or DROP and CREATE the indexes again
-- (DROP INDEX takes the index name; CONCURRENTLY avoids blocking writes)
drop index index_smatches_deleted_at_sr_id_state_nodelete;
create index concurrently index_smatches_deleted_at_sr_id_state_nodelete
	on service_matches (deleted_at, service_request_id, state)
	where (deleted_at IS NULL);

drop index index_smatches_deleted_at_sp_id_state_nodelete;
create index concurrently index_smatches_deleted_at_sp_id_state_nodelete
	on service_matches (deleted_at, service_provider_id, state)
	where (deleted_at IS NULL);
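To confirm how much space was actually reclaimed, compare table and index sizes before and after the cleanup. A quick check using Postgres's built-in size functions:

```sql
-- heap, index, and total on-disk size of the service match table
select pg_size_pretty(pg_relation_size('<country>.service_matches'))       as table_size,
       pg_size_pretty(pg_indexes_size('<country>.service_matches'))        as index_size,
       pg_size_pretty(pg_total_relation_size('<country>.service_matches')) as total_size;
```

Note that a plain VACUUM only marks space as reusable within the table; the on-disk size typically shrinks only after VACUUM FULL or a reindex.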