-
Create two Postgres clusters, one on 5432 port (
node1
) and one on 5433 (node2
). Compile and installpostgres_fdw
extension on both. -
Set
enable_partitionwise_join
andenable_partitionwise_aggregate
toon
. -
Set
postgres_fdw.use_remote_estimate
totrue
. -
On
node2
:
\i init2.sql
- On
node1
:
\i init1.sql
\i load.sql
- Run
ANALYSE
on bothnode1
andnode2
.
Or simply run setup.sh script if all Postgres binaries are present in the PATH.
Then check some queries from the queries.sql.
At the time of 30.06.2020 PostgreSQL cannot efectively push-down queries to the specific node, when postgres_fdw
+ native partitioning and a filter by sharding key are used, e.g.:
postgres_node1=# EXPLAIN (
ANALYZE,
VERBOSE
)
SELECT
*
FROM
documents
INNER JOIN users ON documents.user_id = users.id
WHERE
documents.company_id = 5
AND users.company_id = 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=200.00..248.36 rows=1 width=100) (actual time=3.024..688.854 rows=20235 loops=1)
Output: documents.company_id, documents.id, documents.user_id, documents.created_at, documents.text, users.company_id, users.id, users.created_at, users.name
Join Filter: (documents.user_id = users.id)
Rows Removed by Join Filter: 2003265
-> Foreign Scan on public.users_node2 users (cost=100.00..124.33 rows=6 width=48) (actual time=1.280..1.697 rows=100 loops=1)
Output: users.company_id, users.id, users.created_at, users.name
Remote SQL: SELECT company_id, id, created_at, name FROM public.users_node2 WHERE ((company_id = 5))
-> Materialize (cost=100.00..123.59 rows=5 width=52) (actual time=0.017..3.366 rows=20235 loops=100)
Output: documents.company_id, documents.id, documents.user_id, documents.created_at, documents.text
-> Foreign Scan on public.documents_node2 documents (cost=100.00..123.56 rows=5 width=52) (actual time=1.668..131.552 rows=20235 loops=1)
Output: documents.company_id, documents.id, documents.user_id, documents.created_at, documents.text
Remote SQL: SELECT company_id, id, user_id, created_at, text FROM public.documents_node2 WHERE ((company_id = 5))
Planning Time: 0.378 ms
Execution Time: 692.072 ms
It uses two foreign scans and performs a final join locally. However, once one specifies a required partition in the query it is immediately pushed-down to the required node:
postgres_node1=# EXPLAIN (
ANALYZE,
VERBOSE
)
SELECT
*
FROM
documents_node2
INNER JOIN users_node2 ON documents_node2.user_id = users_node2.id
WHERE
documents_node2.company_id = 5
AND users_node2.company_id = 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..147.92 rows=1 width=100) (actual time=2.973..204.586 rows=20235 loops=1)
Output: documents_node2.company_id, documents_node2.id, documents_node2.user_id, documents_node2.created_at, documents_node2.text, users_node2.company_id, users_node2.id, users_node2.created_at, users_node2.name
Relations: (public.documents_node2) INNER JOIN (public.users_node2)
Remote SQL: SELECT r1.company_id, r1.id, r1.user_id, r1.created_at, r1.text, r2.company_id, r2.id, r2.created_at, r2.name FROM (public.documents_node2 r1 INNER JOIN public.users_node2 r2 ON (((r1.user_id = r2.id)) AND ((r2.company_id = 5)) AND ((r1.company_id = 5))))
Planning Time: 0.217 ms
Execution Time: 206.470 ms
You can see that execution time has dropped significantly — by more than 3 times.