Skip to content

Instantly share code, notes, and snippets.

@dmagda
Last active July 1, 2022 17:49
Show Gist options
  • Save dmagda/397a716bda199171d8157e601b9670d6 to your computer and use it in GitHub Desktop.
Save dmagda/397a716bda199171d8157e601b9670d6 to your computer and use it in GitHub Desktop.
Demo scripts for the Table Partitioning Tech Talk

Table Partitioning in PostgreSQL and YugabyteDB

Follow instructions below to experiment with partitioning in YugabyteDB.

Set up Local Environment

Start PostgreSQL

  1. Start a Postgres instance:
docker run --name postgresql -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -v ~/postgresql_data/:/var/lib/postgresql/data -d postgres
  1. Connect with psql:
psql -h 127.0.0.1 --username=postgres

Start YugabyteDB

  1. Start a single-node cluster:
mkdir ~/yb_docker_data

docker network create yugabytedb_network

docker run -d --name yugabytedb_node1 --net yugabytedb_network \
  -p 7001:7000 -p 9000:9000 -p 5433:5433 \
  -v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --listen=yugabytedb_node1 \
  --base_dir=/home/yugabyte/yb_data --daemon=false
  
docker run -d --name yugabytedb_node2 --net yugabytedb_network \
  -v ~/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --join=yugabytedb_node1 --listen=yugabytedb_node2 \
  --base_dir=/home/yugabyte/yb_data --daemon=false
      
docker run -d --name yugabytedb_node3 --net yugabytedb_network \
  -v ~/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --join=yugabytedb_node1 --listen=yugabytedb_node3 \
  --base_dir=/home/yugabyte/yb_data --daemon=false
  1. Confirm the container runs normally: http://127.0.0.1:7001

  2. Connect with psql:

psql -h 127.0.0.1 -p 5433 yugabyte -U yugabyte -w

Create Sample Schema

  1. Create the schema:
CREATE TYPE status_t AS ENUM('ordered', 'baking', 'delivering', 'yummy-in-my-tummy');

DROP TABLE IF EXISTS PizzaOrders; 
CREATE TABLE PizzaOrders
 (
   order_id   int PRIMARY KEY,
   order_status   status_t,
   order_time   timestamp
 );
  1. Create the pizza_orders_data.sql with the following content in your home directory:
INSERT INTO PizzaOrders VALUES 
(1, 'yummy-in-my-tummy', '2021-12-27 22:00:00'),
(2, 'yummy-in-my-tummy', '2022-05-15 13:00:00'),
(3, 'yummy-in-my-tummy', '2022-05-23 10:00:00'),
(4, 'yummy-in-my-tummy', '2022-06-23 19:00:00'),
(5, 'delivering', '2022-06-24 8:30:00'),
(6, 'baking', '2022-06-24 8:45:00'),
(7, 'baking', '2022-06-24 9:00:00'),
(8, 'ordered', '2022-06-24 10:00:00');  
  1. Load the data:
\i ~/pizza_orders_data.sql;
  1. Check the table structure:
\d+ PizzaOrders;

SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

Partition By Hash

A hash partition is created by using modulus and remainder for each partition:

DROP TABLE PizzaOrders; 
CREATE TABLE PizzaOrders
 (
   order_id   int,
   order_status   status_t,
   order_time   timestamp
 ) PARTITION BY HASH (order_id);

CREATE TABLE orders_part1 PARTITION OF PizzaOrders 
  FOR VALUES WITH (modulus 2, remainder 0);

CREATE TABLE orders_part2 PARTITION OF PizzaOrders 
  FOR VALUES WITH (modulus 2, remainder 1);

\i ~/pizza_orders_data.sql;

Check the table structure and records distribution across partitioned tables:

\d+ PizzaOrders;

SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

Partition By List

Partition data by the order status.

Create List Partitions

  1. Reload the schema and create partitions:
DROP TABLE PizzaOrders; 
CREATE TABLE PizzaOrders
 (
   order_id   int,
   order_status   status_t,
   order_time   timestamp,
   PRIMARY KEY (order_id, order_status)
 ) PARTITION BY LIST (order_status);
 
CREATE TABLE orders_in_progress PARTITION OF PizzaOrders 
  FOR VALUES IN('ordered','baking');
  
CREATE TABLE orders_in_delivery PARTITION OF PizzaOrders 
  FOR VALUES IN('delivering');
  
CREATE TABLE orders_completed PARTITION OF PizzaOrders 
  FOR VALUES IN('yummy-in-my-tummy');

\i ~/pizza_orders_data.sql;
  1. Check the table structure and records distribution across partitioned tables:
\d+ PizzaOrders;

SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

Test Partition Changes

Records are moved between partitions automatically if the value of a partitioning column is changed.

  1. Check the status and partition for the record with order_id=6:
SELECT tableoid::regclass,* from PizzaOrders 
  WHERE order_id=6 ORDER BY order_id;
  1. Change the status to delivering:
UPDATE PizzaOrders SET order_status='delivering' WHERE order_id=6;
  1. Confirm the record was moved to a different partition:
SELECT tableoid::regclass,* from PizzaOrders 
  WHERE order_id=6 ORDER BY order_id;

Test Partition Pruning

  1. Run this query and confirm that the scan happened only on orders_completed partition:
EXPLAIN ANALYZE SELECT * FROM PizzaOrders 
  WHERE order_status = 'yummy-in-my-tummy';
  1. Turn the partition pruning off and observe how the entire data set is scanned:
SET enable_partition_pruning = off;
 
EXPLAIN ANALYZE SELECT * FROM PizzaOrders 
  WHERE order_status = 'yummy-in-my-tummy';
  1. Enable the pruning back:
SET enable_partition_pruning = on;
  1. Apply functions and compare results:
EXPLAIN ANALYZE SELECT * FROM PizzaOrders 
  WHERE order_status::text = lower('yummy-in-my-tummy');

Partition By Range

Partition records by range using order_time as a partition column.

Create Range Partitions

  1. Reload the schema and create partitions:
DROP TABLE PizzaOrders; 
CREATE TABLE PizzaOrders
 (
   order_id   int,
   order_status   status_t,
   order_time   timestamp,
   PRIMARY KEY (order_id, order_time)
 ) PARTITION BY RANGE (order_time);
 
CREATE TABLE orders_2022_06 PARTITION OF PizzaOrders 
  FOR VALUES FROM('2022-06-01') TO ('2022-07-01');
  
CREATE TABLE orders_2022_05 PARTITION OF PizzaOrders 
  FOR VALUES FROM('2022-05-01') TO ('2022-06-01');
  
CREATE TABLE orders_others PARTITION OF PizzaOrders DEFAULT;

\i ~/pizza_orders_data.sql;
  1. Check the table structure and records distribution across partitioned tables:
\d+ PizzaOrders;

SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

Detach Partitioned Table

  1. Detach a partitioned table from the primary one:
ALTER TABLE PizzaOrders DETACH PARTITION orders_2022_05;
  1. Confrim you still can query the table is detached but you still can query it:
SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

SELECT * FROM orders_2022_05;
  1. Insert the data from the detached partition:
INSERT INTO PizzaOrders (order_id,order_status,order_time) 
  SELECT detached.order_id, detached.order_status, detached.order_time 
  FROM orders_2022_05 as detached;

DROP TABLE orders_2022_05;

SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

Attach New Partitioned Table

  1. Create a new partition:
CREATE TABLE orders_2022_07 PARTITION OF PizzaOrders 
  FOR VALUES FROM('2022-07-01') TO ('2022-08-01');
  1. Insert a few records:
INSERT INTO PizzaOrders VALUES 
(9, 'ordered', '2022-07-02 10:00:00'),
(10, 'baking', '2022-07-02 9:50:00'),
(11, 'yummy-in-my-tummy', '2022-07-01 18:10:00');
  1. Confirm that data gets added to the new partition:
SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY order_id;

Create Multilevel Partitions

  1. Create multilevel/composite partitions:
DROP TABLE PizzaOrders; 
CREATE TABLE PizzaOrders
 (
   order_id   int,
   order_status   status_t,
   order_time   timestamp
 ) PARTITION BY RANGE (order_time);
 
CREATE TABLE orders_2022_06 PARTITION OF PizzaOrders 
  FOR VALUES FROM('2022-06-01') TO ('2022-07-01')
  PARTITION BY LIST(order_status);

CREATE TABLE orders_in_progress PARTITION OF orders_2022_06 
  FOR VALUES IN('ordered','baking');

CREATE TABLE orders_in_delivery PARTITION OF orders_2022_06 
  FOR VALUES IN('delivering');
  
CREATE TABLE orders_completed PARTITION OF orders_2022_06 
  FOR VALUES IN('yummy-in-my-tummy');
  
CREATE TABLE orders_others PARTITION OF PizzaOrders DEFAULT;

\i ~/pizza_orders_data.sql;
  1. Check partitions and data distribution:
\d+ PizzaOrders;
\d+ orders_2022_06;

SELECT tableoid::regclass,* from PizzaOrders 
ORDER BY order_id;
@dmagda
Copy link
Author

dmagda commented Jun 27, 2022

The blog

Partition by LIST

CREATE TABLE PizzaOrders
 (
   id   int,
   status   status_t,
   ordertime   timestamp,
   PRIMARY KEY (id, status)
 ) PARTITION BY LIST (status);
 
CREATE TABLE OrdersInProgress PARTITION OF PizzaOrders 
  FOR VALUES IN('ordered','baking');
  
CREATE TABLE OrdersInDelivery PARTITION OF PizzaOrders 
  FOR VALUES IN('delivering');
  
CREATE TABLE OrdersCompleted PARTITION OF PizzaOrders 
  FOR VALUES IN('yummy-in-my-tummy');
\d+ PizzaOrders;
                                            Partitioned table "public.pizzaorders"
  Column   |            Type             | Collation | Nullable | Default | Storage | Compression | Stats target | Description 
-----------+-----------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 id        | integer                     |           | not null |         | plain   |             |              | 
 status    | status_t                    |           | not null |         | plain   |             |              | 
 ordertime | timestamp without time zone |           |          |         | plain   |             |              | 
Partition key: LIST (status)
Indexes:
    "pizzaorders_pkey" PRIMARY KEY, btree (id, status)
Partitions: orderscompleted FOR VALUES IN ('yummy-in-my-tummy'),
            ordersindelivery FOR VALUES IN ('delivering'),
            ordersinprogress FOR VALUES IN ('ordered', 'baking')

Insert data:

INSERT INTO PizzaOrders VALUES 
(1, 'yummy-in-my-tummy', '2021-12-27 22:00:00'),
(2, 'yummy-in-my-tummy', '2022-05-15 13:00:00'),
(3, 'yummy-in-my-tummy', '2022-05-23 10:00:00'),
(4, 'yummy-in-my-tummy', '2022-06-23 19:00:00'),
(5, 'delivering', '2022-06-24 8:30:00'),
(6, 'baking', '2022-06-24 8:45:00'),
(7, 'baking', '2022-06-24 9:00:00'),
(8, 'ordered', '2022-06-24 10:00:00'); 

See data distribution:

SELECT tableoid::regclass,* from PizzaOrders 
  ORDER BY id;

     tableoid     | id |      status       |      ordertime      
------------------+----+-------------------+---------------------
 orderscompleted  |  1 | yummy-in-my-tummy | 2021-12-27 22:00:00
 orderscompleted  |  2 | yummy-in-my-tummy | 2022-05-15 13:00:00
 orderscompleted  |  3 | yummy-in-my-tummy | 2022-05-23 10:00:00
 orderscompleted  |  4 | yummy-in-my-tummy | 2022-06-23 19:00:00
 ordersindelivery |  5 | delivering        | 2022-06-24 08:30:00
 ordersinprogress |  6 | baking            | 2022-06-24 08:45:00
 ordersinprogress |  7 | baking            | 2022-06-24 09:00:00
 ordersinprogress |  8 | ordered           | 2022-06-24 10:00:00
EXPLAIN ANALYZE SELECT * FROM PizzaOrders 
  WHERE status = 'yummy-in-my-tummy';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on orderscompleted pizzaorders  (cost=22.03..32.57 rows=9 width=16) (actual time=0.019..0.020 rows=4 loops=1)
   Recheck Cond: (status = 'yummy-in-my-tummy'::status_t)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on orderscompleted_pkey  (cost=0.00..22.03 rows=9 width=0) (actual time=0.012..0.013 rows=4 loops=1)
         Index Cond: (status = 'yummy-in-my-tummy'::status_t)
 Planning Time: 0.321 ms
 Execution Time: 0.052 ms
EXPLAIN ANALYZE SELECT * FROM PizzaOrders 
  WHERE status::text = lower('yummy-in-my-tummy');
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..127.26 rows=27 width=16) (actual time=0.074..0.076 rows=4 loops=1)
   ->  Seq Scan on ordersinprogress pizzaorders_1  (cost=0.00..42.38 rows=9 width=16) (actual time=0.040..0.040 rows=0 loops=1)
         Filter: ((status)::text = 'yummy-in-my-tummy'::text)
         Rows Removed by Filter: 3
   ->  Seq Scan on ordersindelivery pizzaorders_2  (cost=0.00..42.38 rows=9 width=16) (actual time=0.019..0.019 rows=0 loops=1)
         Filter: ((status)::text = 'yummy-in-my-tummy'::text)
         Rows Removed by Filter: 1
   ->  Seq Scan on orderscompleted pizzaorders_3  (cost=0.00..42.38 rows=9 width=16) (actual time=0.014..0.015 rows=4 loops=1)
         Filter: ((status)::text = 'yummy-in-my-tummy'::text)
 Planning Time: 11.435 ms
 Execution Time: 0.222 ms

@dmagda
Copy link
Author

dmagda commented Jun 29, 2022

Second blog. Partition Maintenance.

Partition by RANGE:

CREATE TYPE status_t AS ENUM('ordered', 'baking', 'delivering', 'yummy-in-my-tummy');

CREATE TABLE PizzaOrders
 (
   id   int,
   status   status_t,
   ordertime   timestamp
 ) PARTITION BY RANGE (ordertime);
 
CREATE TABLE orders_2022_06 PARTITION OF PizzaOrders 
  FOR VALUES FROM('2022-06-01') TO ('2022-07-01');
  
CREATE TABLE orders_2022_05 PARTITION OF PizzaOrders 
  FOR VALUES FROM('2022-05-01') TO ('2022-06-01');
  
CREATE TABLE orders_others PARTITION OF PizzaOrders DEFAULT;

Loading data:

INSERT INTO PizzaOrders VALUES 
(1, 'yummy-in-my-tummy', '2021-12-27 22:00:00'),
(2, 'yummy-in-my-tummy', '2022-05-15 13:00:00'),
(3, 'yummy-in-my-tummy', '2022-05-23 10:00:00'),
(4, 'yummy-in-my-tummy', '2022-06-23 19:00:00'),
(5, 'delivering', '2022-06-24 8:30:00'),
(6, 'baking', '2022-06-24 8:45:00'),
(7, 'baking', '2022-06-24 9:00:00'),
(8, 'ordered', '2022-06-24 10:00:00'); 

SELECT * from orders_2022_06 WHERE ordertime between '2022_06_20' and '2022_06_30';

 id |      status       |      ordertime      
----+-------------------+---------------------
  4 | yummy-in-my-tummy | 2022-06-23 19:00:00
  5 | delivering        | 2022-06-24 08:30:00
  6 | baking            | 2022-06-24 08:45:00
  7 | baking            | 2022-06-24 09:00:00
  8 | ordered           | 2022-06-24 10:00:00

@dmagda
Copy link
Author

dmagda commented Jul 1, 2022

Geo-Partitioned Cluster

mkdir ~/yb_docker_data

docker network create yugabytedb_network

docker run -d --name yugabytedb_node_us --net yugabytedb_network -p 7001:7000 -p 9000:9000 -p 5433:5433 \
  -v ~/yb_docker_data/node_us:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest bin/yugabyted start --listen=yugabytedb_node_us \
  --base_dir=/home/yugabyte/yb_data --daemon=false \
  --master_flags="placement_zone=A,placement_region=US,placement_cloud=CLOUD" \
  --tserver_flags="placement_zone=A,placement_region=US,placement_cloud=CLOUD"
  
docker run -d --name yugabytedb_node_eu --net yugabytedb_network \
  -v ~/yb_docker_data/node_eu:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest bin/yugabyted start --join=yugabytedb_node_us --listen=yugabytedb_node_eu \
  --base_dir=/home/yugabyte/yb_data --daemon=false \
  --master_flags="placement_zone=A,placement_region=EU,placement_cloud=CLOUD" \
  --tserver_flags="placement_zone=A,placement_region=EU,placement_cloud=CLOUD"
      
docker run -d --name yugabytedb_node_apac --net yugabytedb_network \
  -v ~/yb_docker_data/node_apac:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest bin/yugabyted start --join=yugabytedb_node_us --listen=yugabytedb_node_apac \
  --base_dir=/home/yugabyte/yb_data --daemon=false \
 --master_flags="placement_zone=A,placement_region=APAC,placement_cloud=CLOUD" \
  --tserver_flags="placement_zone=A,placement_region=APAC,placement_cloud=CLOUD"

Modify placement:

docker exec -i yugabytedb_node_us \
yb-admin -master_addresses yugabytedb_node_us:7100,yugabytedb_node_eu:7100,yugabytedb_node_apac:7100 \
modify_placement_info CLOUD.US.A,CLOUD.EU.A,CLOUD.APAC.A 3

Create tables spaces:

CREATE TABLESPACE us_tablespace WITH (
  replica_placement='{"num_replicas": 1, "placement_blocks":
  [{"cloud":"CLOUD","region":"US","zone":"A","min_num_replicas":1}]}'
);

CREATE TABLESPACE eu_tablespace WITH (
  replica_placement='{"num_replicas": 1, "placement_blocks":
  [{"cloud":"CLOUD","region":"EU","zone":"A","min_num_replicas":1}]}'
);

CREATE TABLESPACE apac_tablespace WITH (
  replica_placement='{"num_replicas": 1, "placement_blocks":
  [{"cloud":"CLOUD","region":"APAC","zone":"A","min_num_replicas":1}]}'
);

Create partitions:

CREATE TYPE status_t AS ENUM('ordered', 'baking', 'delivering', 'yummy-in-my-tummy');

CREATE TABLE PizzaOrders
 (
   order_id   int,
   order_status   status_t,
   order_time   timestamp,
   region text,
   PRIMARY KEY (order_id, region)
 ) PARTITION BY LIST (region);

CREATE TABLE Orders_US
    PARTITION OF PizzaOrders
    FOR VALUES IN ('US') TABLESPACE us_tablespace;

CREATE TABLE Orders_EU
    PARTITION OF PizzaOrders
    FOR VALUES IN ('EU') TABLESPACE eu_tablespace;

CREATE TABLE Orders_APAC
    PARTITION OF PizzaOrders
    FOR VALUES IN ('APAC') TABLESPACE apac_tablespace;

Load data in the US:

INSERT INTO PizzaOrders VALUES 
(1, 'yummy-in-my-tummy', '2021-12-27 22:00:00', 'US'),
(2, 'yummy-in-my-tummy', '2022-05-15 13:00:00', 'US'),
(6, 'baking', '2022-06-24 8:45:00', 'US'),
(7, 'baking', '2022-06-24 9:00:00', 'US'); 

Load data in ASIA and EU:

SET force_global_transaction = TRUE;

INSERT INTO PizzaOrders VALUES 
(3, 'yummy-in-my-tummy', '2022-05-23 10:00:00', 'EU'),
(4, 'yummy-in-my-tummy', '2022-06-23 19:00:00', 'APAC'),
(5, 'delivering', '2022-06-24 8:30:00', 'APAC'),
(8, 'ordered', '2022-06-24 10:00:00', 'EU'); 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment