- Setup Local Environment:
- Create Sample Schema
- Partition By Hash
- Partition By List
- Partition By Range
- Create Multilevel Partitions
Follow instructions below to experiment with partitioning in YugabyteDB.
- Start a Postgres instance:
docker run --name postgresql -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -v ~/postgresql_data/:/var/lib/postgresql/data -d postgres
- Connect with psql:
psql -h 127.0.0.1 --username=postgres
- Start a single-node cluster:
mkdir ~/yb_docker_data
docker network create yugabytedb_network
docker run -d --name yugabytedb_node1 --net yugabytedb_network \
-p 7001:7000 -p 9000:9000 -p 5433:5433 \
-v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:latest \
bin/yugabyted start --listen=yugabytedb_node1 \
--base_dir=/home/yugabyte/yb_data --daemon=false
docker run -d --name yugabytedb_node2 --net yugabytedb_network \
-v ~/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:latest \
bin/yugabyted start --join=yugabytedb_node1 --listen=yugabytedb_node2 \
--base_dir=/home/yugabyte/yb_data --daemon=false
docker run -d --name yugabytedb_node3 --net yugabytedb_network \
-v ~/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:latest \
bin/yugabyted start --join=yugabytedb_node1 --listen=yugabytedb_node3 \
--base_dir=/home/yugabyte/yb_data --daemon=false
-
Confirm the container runs normally: http://127.0.0.1:7001
-
Connect with psql:
psql -h 127.0.0.1 -p 5433 yugabyte -U yugabyte -w
- Create the schema:
CREATE TYPE status_t AS ENUM('ordered', 'baking', 'delivering', 'yummy-in-my-tummy');
DROP TABLE IF EXISTS PizzaOrders;
CREATE TABLE PizzaOrders
(
order_id int PRIMARY KEY,
order_status status_t,
order_time timestamp
);
- Create the
pizza_orders_data.sql
with the following content in your home directory:
INSERT INTO PizzaOrders VALUES
(1, 'yummy-in-my-tummy', '2021-12-27 22:00:00'),
(2, 'yummy-in-my-tummy', '2022-05-15 13:00:00'),
(3, 'yummy-in-my-tummy', '2022-05-23 10:00:00'),
(4, 'yummy-in-my-tummy', '2022-06-23 19:00:00'),
(5, 'delivering', '2022-06-24 8:30:00'),
(6, 'baking', '2022-06-24 8:45:00'),
(7, 'baking', '2022-06-24 9:00:00'),
(8, 'ordered', '2022-06-24 10:00:00');
- Load the data:
\i ~/pizza_orders_data.sql;
- Check the table structure:
\d+ PizzaOrders;
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
A hash partition is created by using modulus and remainder for each partition:
DROP TABLE PizzaOrders;
CREATE TABLE PizzaOrders
(
order_id int,
order_status status_t,
order_time timestamp
) PARTITION BY HASH (order_id);
CREATE TABLE orders_part1 PARTITION OF PizzaOrders
FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE orders_part2 PARTITION OF PizzaOrders
FOR VALUES WITH (modulus 2, remainder 1);
\i ~/pizza_orders_data.sql;
Check the table structure and records distribution across partitioned tables:
\d+ PizzaOrders;
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
Partition data by the order status.
- Reload the schema and create partitions:
DROP TABLE PizzaOrders;
CREATE TABLE PizzaOrders
(
order_id int,
order_status status_t,
order_time timestamp,
PRIMARY KEY (order_id, order_status)
) PARTITION BY LIST (order_status);
CREATE TABLE orders_in_progress PARTITION OF PizzaOrders
FOR VALUES IN('ordered','baking');
CREATE TABLE orders_in_delivery PARTITION OF PizzaOrders
FOR VALUES IN('delivering');
CREATE TABLE orders_completed PARTITION OF PizzaOrders
FOR VALUES IN('yummy-in-my-tummy');
\i ~/pizza_orders_data.sql;
- Check the table structure and records distribution across partitioned tables:
\d+ PizzaOrders;
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
Records are moved between partitions automatically if the value of a partitioning column is changed.
- Check the status and partition for the record with
order_id=6
:
SELECT tableoid::regclass,* from PizzaOrders
WHERE order_id=6 ORDER BY order_id;
- Change the status to
delivering
:
UPDATE PizzaOrders SET order_status='delivering' WHERE order_id=6;
- Confirm the record was moved to a different partition:
SELECT tableoid::regclass,* from PizzaOrders
WHERE order_id=6 ORDER BY order_id;
- Run this query and confirm that the scan happened only on
orders_completed
partition:
EXPLAIN ANALYZE SELECT * FROM PizzaOrders
WHERE order_status = 'yummy-in-my-tummy';
- Turn the partition pruning off and observe how the entire data set is scanned:
SET enable_partition_pruning = off;
EXPLAIN ANALYZE SELECT * FROM PizzaOrders
WHERE order_status = 'yummy-in-my-tummy';
- Enable the pruning back:
SET enable_partition_pruning = on;
- Apply functions and compare results:
EXPLAIN ANALYZE SELECT * FROM PizzaOrders
WHERE order_status::text = lower('yummy-in-my-tummy');
Partition records by range using order_time
as a partition column.
- Reload the schema and create partitions:
DROP TABLE PizzaOrders;
CREATE TABLE PizzaOrders
(
order_id int,
order_status status_t,
order_time timestamp,
PRIMARY KEY (order_id, order_time)
) PARTITION BY RANGE (order_time);
CREATE TABLE orders_2022_06 PARTITION OF PizzaOrders
FOR VALUES FROM('2022-06-01') TO ('2022-07-01');
CREATE TABLE orders_2022_05 PARTITION OF PizzaOrders
FOR VALUES FROM('2022-05-01') TO ('2022-06-01');
CREATE TABLE orders_others PARTITION OF PizzaOrders DEFAULT;
\i ~/pizza_orders_data.sql;
- Check the table structure and records distribution across partitioned tables:
\d+ PizzaOrders;
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
- Detach a partitioned table from the primary one:
ALTER TABLE PizzaOrders DETACH PARTITION orders_2022_05;
- Confrim you still can query the table is detached but you still can query it:
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
SELECT * FROM orders_2022_05;
- Insert the data from the detached partition:
INSERT INTO PizzaOrders (order_id,order_status,order_time)
SELECT detached.order_id, detached.order_status, detached.order_time
FROM orders_2022_05 as detached;
DROP TABLE orders_2022_05;
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
- Create a new partition:
CREATE TABLE orders_2022_07 PARTITION OF PizzaOrders
FOR VALUES FROM('2022-07-01') TO ('2022-08-01');
- Insert a few records:
INSERT INTO PizzaOrders VALUES
(9, 'ordered', '2022-07-02 10:00:00'),
(10, 'baking', '2022-07-02 9:50:00'),
(11, 'yummy-in-my-tummy', '2022-07-01 18:10:00');
- Confirm that data gets added to the new partition:
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
- Create multilevel/composite partitions:
DROP TABLE PizzaOrders;
CREATE TABLE PizzaOrders
(
order_id int,
order_status status_t,
order_time timestamp
) PARTITION BY RANGE (order_time);
CREATE TABLE orders_2022_06 PARTITION OF PizzaOrders
FOR VALUES FROM('2022-06-01') TO ('2022-07-01')
PARTITION BY LIST(order_status);
CREATE TABLE orders_in_progress PARTITION OF orders_2022_06
FOR VALUES IN('ordered','baking');
CREATE TABLE orders_in_delivery PARTITION OF orders_2022_06
FOR VALUES IN('delivering');
CREATE TABLE orders_completed PARTITION OF orders_2022_06
FOR VALUES IN('yummy-in-my-tummy');
CREATE TABLE orders_others PARTITION OF PizzaOrders DEFAULT;
\i ~/pizza_orders_data.sql;
- Check partitions and data distribution:
\d+ PizzaOrders;
\d+ orders_2022_06;
SELECT tableoid::regclass,* from PizzaOrders
ORDER BY order_id;
Second blog. Partition Maintenance.
Partition by RANGE:
Loading data: