MySQL's documentation states:
When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.
The question was asked: how is this deterministic / accurate? My hypothesis is that it's due to MySQL's use of a clustered index, as opposed to Postgres' heap storage. Given a monotonic PK such as an AUTO_INCREMENT (or perhaps any index), MySQL may be able to use it to guarantee determinism.
To test this, tables will be created in MySQL 8.3 and Postgres 15.6, and various queries run against them.
mysql> CREATE TABLE user (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, first_name VARCHAR(255) NOT NULL, last_name VARCHAR(255) NULL, email VARCHAR(254) NOT NULL, city VARCHAR(255) NOT NULL, country VARCHAR(255) NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP);
Query OK, 0 rows affected (0.02 sec)
postgres=# CREATE TABLE "user" (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, first_name VARCHAR(126) NOT NULL, last_name VARCHAR(126) NULL, email VARCHAR(254) NOT NULL, city VARCHAR(126) NOT NULL, country VARCHAR(126) NOT NULL, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE
-- used genSQL to generate 1,000,000 rows, then loaded into each
First, find the upper bound on the number of distinct values that may later be selected.
mysql> SELECT COUNT(DISTINCT first_name) FROM user;
+----------------------------+
| COUNT(DISTINCT first_name) |
+----------------------------+
|                       8425 |
+----------------------------+
1 row in set (1.10 sec)
postgres=# SELECT COUNT(DISTINCT first_name) FROM "user";
count
-------
8425
(1 row)
Time: 1838.531 ms (00:01.839)
mysql> EXPLAIN ANALYZE SELECT DISTINCT first_name FROM user LIMIT 5000\G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 5000 row(s) (cost=205672..205735 rows=5000) (actual time=6.86..7.67 rows=5000 loops=1)
    -> Table scan on <temporary> (cost=205672..218119 rows=995530) (actual time=6.85..7.4 rows=5000 loops=1)
        -> Temporary table with deduplication (cost=205672..205672 rows=995530) (actual time=6.85..6.85 rows=5000 loops=1)
            -> Limit table size: 5000 unique row(s)
            -> Table scan on user (cost=106119 rows=995530) (actual time=0.132..2.91 rows=7512 loops=1)
1 row in set (0.01 sec)
postgres=# EXPLAIN (ANALYZE, BUFFERS, COSTS) SELECT DISTINCT first_name FROM "user" LIMIT 5000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=21867.29..21917.29 rows=5000 width=9) (actual time=168.450..177.237 rows=5000 loops=1)
   Buffers: shared hit=13845
   ->  HashAggregate  (cost=21867.29..21951.66 rows=8437 width=9) (actual time=168.448..176.779 rows=5000 loops=1)
         Group Key: first_name
         Batches: 1  Memory Usage: 913kB
         Buffers: shared hit=13845
         ->  Gather  (cost=20053.33..21825.10 rows=16874 width=9) (actual time=159.108..170.100 rows=25275 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               Buffers: shared hit=13845
               ->  HashAggregate  (cost=19053.33..19137.70 rows=8437 width=9) (actual time=125.266..126.349 rows=8425 loops=3)
                     Group Key: first_name
                     Batches: 1  Memory Usage: 913kB
                     Buffers: shared hit=13845
                     Worker 0:  Batches: 1  Memory Usage: 913kB
                     Worker 1:  Batches: 1  Memory Usage: 913kB
                     ->  Parallel Seq Scan on "user"  (cost=0.00..18011.67 rows=416667 width=9) (actual time=0.020..28.529 rows=333333 loops=3)
                           Buffers: shared hit=13845
Planning Time: 0.132 ms
Execution Time: 177.709 ms
(20 rows)
Time: 178.603 ms
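As an aside, MySQL's plan above is single-threaded while Postgres launched two parallel workers. For a closer apples-to-apples comparison, parallelism can be disabled for the Postgres session; a minimal sketch (max_parallel_workers_per_gather is a standard session-scoped Postgres setting):

```sql
-- Disable parallel query for this session only, so the Postgres plan is
-- single-threaded like MySQL's, then re-run the EXPLAIN.
SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE, BUFFERS, COSTS) SELECT DISTINCT first_name FROM "user" LIMIT 5000;
```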
Create a temporary table to insert a subset of rows/columns into.
mysql> CREATE TEMPORARY TABLE user_name (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, first_name VARCHAR(255) NOT NULL);
Query OK, 0 rows affected (0.00 sec)
postgres=# CREATE TEMPORARY TABLE user_name (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, first_name VARCHAR(126) NOT NULL);
CREATE TABLE
Time: 6.562 ms
Technically, MySQL will execute an EXPLAIN ANALYZE INSERT, but it doesn't actually retain the inserted data, so it behaves more like a plain EXPLAIN. It's shown here for clarity anyway, followed by a normal INSERT.
mysql> EXPLAIN ANALYZE INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000\G
*************************** 1. row ***************************
EXPLAIN: -> Insert into user_name
    -> Limit: 5000 row(s) (cost=205672..205735 rows=5000) (actual time=14.8..15.5 rows=5000 loops=1)
        -> Table scan on <temporary> (cost=205672..218119 rows=995530) (actual time=14.8..15.2 rows=5000 loops=1)
            -> Temporary table with deduplication (cost=205672..205672 rows=995530) (actual time=14.8..14.8 rows=5000 loops=1)
                -> Limit table size: 5000 unique row(s)
                -> Table scan on user (cost=106119 rows=995530) (actual time=0.0207..9 rows=7512 loops=1)
1 row in set (0.02 sec)
mysql> INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000;
Query OK, 5000 rows affected (0.04 sec)
Records: 5000 Duplicates: 0 Warnings: 0
postgres=# EXPLAIN (ANALYZE, BUFFERS, COSTS) INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM "user" LIMIT 5000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Insert on user_name  (cost=26345.00..26457.50 rows=0 width=0) (actual time=324.459..324.467 rows=0 loops=1)
   Buffers: shared hit=13866 dirtied=3, local hit=19659 read=3 dirtied=45 written=43
   ->  Subquery Scan on "*SELECT*"  (cost=26345.00..26457.50 rows=5000 width=13) (actual time=310.261..315.059 rows=5000 loops=1)
         Buffers: shared hit=13853 dirtied=1, local hit=5000
         ->  Limit  (cost=26345.00..26395.00 rows=5000 width=9) (actual time=310.189..311.924 rows=5000 loops=1)
               Buffers: shared hit=13845
               ->  HashAggregate  (cost=26345.00..26429.37 rows=8437 width=9) (actual time=310.187..311.379 rows=5000 loops=1)
                     Group Key: "user".first_name
                     Batches: 1  Memory Usage: 913kB
                     Buffers: shared hit=13845
                     ->  Seq Scan on "user"  (cost=0.00..23845.00 rows=1000000 width=9) (actual time=0.008..67.769 rows=1000000 loops=1)
                           Buffers: shared hit=13845
 Planning:
   Buffers: shared hit=5 dirtied=3
Planning Time: 0.232 ms
Execution Time: 324.619 ms
(16 rows)
Verify the results.
mysql> SELECT COUNT(DISTINCT first_name) FROM user_name;
+----------------------------+
| COUNT(DISTINCT first_name) |
+----------------------------+
|                       5000 |
+----------------------------+
1 row in set (0.00 sec)
postgres=# SELECT COUNT(DISTINCT first_name) FROM user_name;
count
-------
5000
(1 row)
Time: 11.089 ms
Here, the hypothesis is tested (I think). Without an explicit ORDER BY, the query engine is free to return tuples however it sees fit.
Note that the IDs and names initially match, despite no ORDER BY being used for either the INSERT or the SELECT; this is consistent with MySQL's table scan following the clustered PK.
Further into the table, as expected, they diverge, since the source table contains duplicate names that DISTINCT skips.
mysql> SELECT u.id, u.first_name, un.id, un.first_name FROM user u JOIN user_name un ON u.id = un.id LIMIT 10;
+----+------------+----+------------+
| id | first_name | id | first_name |
+----+------------+----+------------+
| 1 | Hakeem | 1 | Hakeem |
| 2 | Lilith | 2 | Lilith |
| 3 | Alphonse | 3 | Alphonse |
| 4 | Dionne | 4 | Dionne |
| 5 | Gasparo | 5 | Gasparo |
| 6 | Dixie | 6 | Dixie |
| 7 | Valentijn | 7 | Valentijn |
| 8 | Guillema | 8 | Guillema |
| 9 | Elnore | 9 | Elnore |
| 10 | Shaine | 10 | Shaine |
+----+------------+----+------------+
10 rows in set (0.00 sec)
mysql> SELECT u.id, u.first_name, un.id, un.first_name FROM user u JOIN user_name un ON u.id = un.id LIMIT 10 OFFSET 999;
+------+------------+------+------------+
| id | first_name | id | first_name |
+------+------------+------+------------+
| 1000 | Weider | 1000 | Trev |
| 1001 | Cherianne | 1001 | Sanders |
| 1002 | Pierrette | 1002 | Eadie |
| 1003 | Alvan | 1003 | Kriste |
| 1004 | Mechelle | 1004 | Bernetta |
| 1005 | Tierney | 1005 | Maynard |
| 1006 | Anallese | 1006 | Sibella |
| 1007 | Godfree | 1007 | Clio |
| 1008 | Jessey | 1008 | Ashby |
| 1009 | Aland | 1009 | Lethia |
+------+------------+------+------------+
10 rows in set (0.00 sec)
In contrast, the IDs here do not match at any point.
postgres=# SELECT u.id, u.first_name, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id = un.id LIMIT 10;
id | first_name | id | first_name
----+-------------+----+------------------
1 | 'Hakeem' | 1 | 'Calli'
2 | 'Lilith' | 2 | 'Nikolaus'
3 | 'Alphonse' | 3 | 'Franklyn'
4 | 'Dionne' | 4 | 'Shelby'
5 | 'Gasparo' | 5 | 'Rolph'
6 | 'Dixie' | 6 | 'Alvie'
7 | 'Valentijn' | 7 | 'Pollyanna'
8 | 'Guillema' | 8 | 'Freemon'
9 | 'Elnore' | 9 | 'Sheila-Kathryn'
10 | 'Shaine' | 10 | 'Ashton'
(10 rows)
Time: 1.503 ms
postgres=# SELECT u.id, u.first_name, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id = un.id LIMIT 10 OFFSET 999;
id | first_name | id | first_name
------+-------------+------+------------
1000 | 'Weider' | 1000 | 'Darlene'
1001 | 'Cherianne' | 1001 | 'Bliss'
1002 | 'Pierrette' | 1002 | 'Matthew'
1003 | 'Alvan' | 1003 | 'Murdoch'
1004 | 'Mechelle' | 1004 | 'Aveline'
1005 | 'Tierney' | 1005 | 'Derward'
1006 | 'Anallese' | 1006 | 'Jone'
1007 | 'Godfree' | 1007 | 'Fin'
1008 | 'Jessey' | 1008 | 'Rodi'
1009 | 'Aland' | 1009 | 'Verge'
(10 rows)
Time: 2.464 ms
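If a deterministic subset is required in either engine, an explicit ORDER BY removes the ambiguity entirely; a minimal sketch (not part of the experiment above):

```sql
-- Sort the distinct names before limiting, so both engines must pick the
-- same 5000 values regardless of storage layout or plan choice.
INSERT INTO user_name (first_name)
SELECT DISTINCT first_name
FROM user            -- quote as "user" in Postgres
ORDER BY first_name
LIMIT 5000;
```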
Now, an index will be created on first_name to see whether it will be used for the insertion, and if so, what will differ.
mysql> CREATE INDEX user_first_name_idx ON user(first_name);
Query OK, 0 rows affected (3.34 sec)
Records: 0 Duplicates: 0 Warnings: 0
postgres=# CREATE INDEX user_first_name_idx ON "user" (first_name);
CREATE INDEX
Time: 1221.991 ms (00:01.222)
Again, EXPLAIN ANALYZE INSERT doesn't actually retain the data, but it's shown here to demonstrate the plan. As expected, the index is used, specifically as a covering index for a skip scan.
mysql> EXPLAIN ANALYZE INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000\G
*************************** 1. row ***************************
EXPLAIN: -> Insert into user_name
    -> Limit: 5000 row(s) (cost=14195..14259 rows=5000) (actual time=162..163 rows=5000 loops=1)
        -> Table scan on <temporary> (cost=14195..14305 rows=8603) (actual time=162..162 rows=5000 loops=1)
            -> Temporary table with deduplication (cost=14195..14195 rows=8603) (actual time=162..162 rows=5000 loops=1)
                -> Limit table size: 5000 unique row(s)
                -> Covering index skip scan for deduplication on user using user_first_name_idx (cost=13335 rows=8603) (actual time=0.745..155 rows=5000 loops=1)
1 row in set (0.16 sec)
mysql> INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000;
Query OK, 5000 rows affected (0.06 sec)
Records: 5000 Duplicates: 0 Warnings: 0
As with MySQL, the retrieval is an index-only scan.
postgres=# EXPLAIN (ANALYZE, BUFFERS, COSTS) INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM "user" LIMIT 5000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on user_name  (cost=0.42..11012.14 rows=0 width=0) (actual time=120.812..120.819 rows=0 loops=1)
   Buffers: shared hit=22 read=524, local hit=19659 read=3 dirtied=45 written=43
   ->  Subquery Scan on "*SELECT*"  (cost=0.42..11012.14 rows=5000 width=13) (actual time=0.111..109.896 rows=5000 loops=1)
         Buffers: shared hit=9 read=524, local hit=5000
         ->  Limit  (cost=0.42..10949.64 rows=5000 width=9) (actual time=0.075..105.797 rows=5000 loops=1)
               Buffers: shared hit=1 read=524
               ->  Unique  (cost=0.42..18476.12 rows=8437 width=9) (actual time=0.074..105.200 rows=5000 loops=1)
                     Buffers: shared hit=1 read=524
                     ->  Index Only Scan using user_first_name_idx on "user"  (cost=0.42..15976.12 rows=1000000 width=9) (actual time=0.072..57.860 rows=592640 loops=1)
                           Heap Fetches: 0
                           Buffers: shared hit=1 read=524
 Planning:
   Buffers: shared hit=16 read=1
Planning Time: 0.554 ms
Execution Time: 120.911 ms
(15 rows)
Time: 122.322 ms
Though no ORDER BY was used, the B+tree index was used for the INSERT queries.
The temporary table created from the SELECT DISTINCT that utilized the index is now ordered, again without any explicit ORDER BY.
mysql> SELECT u.id, u.first_name, un.id, un.first_name FROM user u JOIN user_name un ON u.id = un.id LIMIT 10;
+----+------------+----+------------+
| id | first_name | id | first_name |
+----+------------+----+------------+
| 1 | Hakeem | 1 | Aaren |
| 2 | Lilith | 2 | Aarika |
| 3 | Alphonse | 3 | Aaron |
| 4 | Dionne | 4 | Ab |
| 5 | Gasparo | 5 | Abagael |
| 6 | Dixie | 6 | Abagail |
| 7 | Valentijn | 7 | Abba |
| 8 | Guillema | 8 | Abbe |
| 9 | Elnore | 9 | Abbey |
| 10 | Shaine | 10 | Abbi |
+----+------------+----+------------+
10 rows in set (0.00 sec)
Postgres shows the same outcome as MySQL.
postgres=# SELECT u.id, u.first_name, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id = un.id UNION ALL SELECT u.id, u.first_name, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id = un.id LIMIT 10;
id | first_name | id | first_name
----+-------------+----+------------
1 | 'Hakeem' | 1 | 'Aaren'
2 | 'Lilith' | 2 | 'Aarika'
3 | 'Alphonse' | 3 | 'Aaron'
4 | 'Dionne' | 4 | 'Ab'
5 | 'Gasparo' | 5 | 'Abagael'
6 | 'Dixie' | 6 | 'Abagail'
7 | 'Valentijn' | 7 | 'Abba'
8 | 'Guillema' | 8 | 'Abbe'
9 | 'Elnore' | 9 | 'Abbey'
10 | 'Shaine' | 10 | 'Abbi'
(10 rows)
Time: 1.145 ms
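One way to double-check the implicit ordering is to compare each row's id against its rank under an explicit sort; a sketch that should work in both MySQL 8 and Postgres, since both support window functions:

```sql
-- Count rows whose insertion position differs from their alphabetical rank;
-- a result of 0 means the temporary table really is ordered by first_name.
SELECT COUNT(*) AS out_of_order
FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY first_name) AS rn
  FROM user_name
) t
WHERE t.id <> t.rn;
```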
The same index will be created on first_name as before, but in DESC order. The original index is first dropped, and the temporary tables truncated with their id counters reset.
mysql> DROP INDEX user_first_name_idx ON user;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> CREATE INDEX user_first_name_idx ON user(first_name DESC);
Query OK, 0 rows affected (2.94 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> TRUNCATE user_name;
Query OK, 0 rows affected (0.00 sec)
mysql> ALTER TABLE user_name AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
postgres=# DROP INDEX user_first_name_idx;
DROP INDEX
Time: 7.335 ms
postgres=# CREATE INDEX user_first_name_idx ON "user" (first_name DESC);
CREATE INDEX
Time: 1245.648 ms (00:01.246)
postgres=# TRUNCATE user_name RESTART IDENTITY;
TRUNCATE TABLE
Time: 4.899 ms
Same disclaimer as before. As the plan shows, the covering index access is again a skip scan (described in the MySQL documentation), this time over the descending index.
mysql> EXPLAIN ANALYZE INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000\G
*************************** 1. row ***************************
EXPLAIN: -> Insert into user_name
    -> Limit: 5000 row(s) (cost=13690..13754 rows=5000) (actual time=158..159 rows=5000 loops=1)
        -> Table scan on <temporary> (cost=13690..13796 rows=8297) (actual time=158..158 rows=5000 loops=1)
            -> Temporary table with deduplication (cost=13690..13690 rows=8297) (actual time=158..158 rows=5000 loops=1)
                -> Limit table size: 5000 unique row(s)
                -> Covering index skip scan for deduplication on user using user_first_name_idx (cost=12860 rows=8297) (actual time=0.637..151 rows=5000 loops=1)
1 row in set (0.16 sec)
mysql> INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000;
Query OK, 5000 rows affected (0.06 sec)
Records: 5000 Duplicates: 0 Warnings: 0
In Postgres, the index is used again, but it's scanned backwards. This becomes evident in the next SELECT.
postgres=# EXPLAIN (ANALYZE, BUFFERS, COSTS) INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM "user" LIMIT 5000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on user_name  (cost=0.42..11011.48 rows=0 width=0) (actual time=115.579..115.580 rows=0 loops=1)
   Buffers: shared hit=15 read=524, local hit=19659 read=3 dirtied=45 written=43
   ->  Subquery Scan on "*SELECT*"  (cost=0.42..11011.48 rows=5000 width=13) (actual time=0.077..105.200 rows=5000 loops=1)
         Buffers: shared hit=7 read=524, local hit=5000
         ->  Limit  (cost=0.42..10948.98 rows=5000 width=9) (actual time=0.047..101.408 rows=5000 loops=1)
               Buffers: shared hit=1 read=524
               ->  Unique  (cost=0.42..18475.03 rows=8437 width=9) (actual time=0.045..100.868 rows=5000 loops=1)
                     Buffers: shared hit=1 read=524
                     ->  Index Only Scan Backward using user_first_name_idx on "user"  (cost=0.42..15975.02 rows=1000000 width=9) (actual time=0.043..53.648 rows=592640 loops=1)
                           Heap Fetches: 0
                           Buffers: shared hit=1 read=524
 Planning:
   Buffers: shared hit=17 read=1
Planning Time: 0.451 ms
Execution Time: 115.644 ms
(15 rows)
Time: 116.808 ms
As before, the temporary table is ordered without an explicit ORDER BY, but in reverse, which is logically consistent with the previous findings.
mysql> SELECT u.id, u.first_name, un.id, un.first_name FROM user u JOIN user_name un ON u.id = un.id LIMIT 10;
+----+------------+----+------------+
| id | first_name | id | first_name |
+----+------------+----+------------+
| 1 | Hakeem | 1 | Zuzana |
| 2 | Lilith | 2 | Zulema |
| 3 | Alphonse | 3 | Zsazsa |
| 4 | Dionne | 4 | Zsa Zsa |
| 5 | Gasparo | 5 | Zorine |
| 6 | Dixie | 6 | Zorina |
| 7 | Valentijn | 7 | Zorana |
| 8 | Guillema | 8 | Zorah |
| 9 | Elnore | 9 | Zora |
| 10 | Shaine | 10 | Zonnya |
+----+------------+----+------------+
10 rows in set (0.00 sec)
In contrast, Postgres orders the temporary table the same as before (ascending), which follows from the backward index scan shown in the plan: walking a DESC index backwards yields ascending order.
postgres=# SELECT u.id, u.first_name, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id = un.id UNION ALL SELECT u.id, u.first_name, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id = un.id LIMIT 10;
id | first_name | id | first_name
----+-------------+----+------------
1 | 'Hakeem' | 1 | 'Aaren'
2 | 'Lilith' | 2 | 'Aarika'
3 | 'Alphonse' | 3 | 'Aaron'
4 | 'Dionne' | 4 | 'Ab'
5 | 'Gasparo' | 5 | 'Abagael'
6 | 'Dixie' | 6 | 'Abagail'
7 | 'Valentijn' | 7 | 'Abba'
8 | 'Guillema' | 8 | 'Abbe'
9 | 'Elnore' | 9 | 'Abbey'
10 | 'Shaine' | 10 | 'Abbi'
(10 rows)
Time: 0.993 ms
Finally, both tables were altered to use a common, externally generated UUIDv4 as their PK; the temporary tables were then re-created, and the data loaded as before.
mysql> CREATE TEMPORARY TABLE user_name (id CHAR(36) NOT NULL PRIMARY KEY, first_name VARCHAR(255) NOT NULL);
Query OK, 0 rows affected (0.00 sec)
mysql> EXPLAIN ANALYZE INSERT INTO user_name (first_name) SELECT DISTINCT first_name FROM user LIMIT 5000\G
*************************** 1. row ***************************
EXPLAIN: -> Insert into user_name
    -> Limit: 5000 row(s) (cost=207523..207585 rows=5000) (actual time=14.9..15.6 rows=5000 loops=1)
        -> Table scan on <temporary> (cost=207523..219912 rows=990923) (actual time=14.9..15.4 rows=5000 loops=1)
            -> Temporary table with deduplication (cost=207523..207523 rows=990923) (actual time=14.9..14.9 rows=5000 loops=1)
                -> Limit table size: 5000 unique row(s)
                -> Table scan on user (cost=108430 rows=990923) (actual time=0.0412..9.58 rows=7490 loops=1)
1 row in set (0.02 sec)
mysql> INSERT INTO user_name (id, first_name) SELECT DISTINCT id, first_name FROM user LIMIT 5000;
Query OK, 5000 rows affected (0.05 sec)
Records: 5000 Duplicates: 0 Warnings: 0
postgres=# EXPLAIN (ANALYZE, BUFFERS, COSTS) INSERT INTO user_name (id, first_name) SELECT DISTINCT id, first_name FROM "user" LIMIT 5000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Insert on user_name  (cost=79726.06..79894.42 rows=0 width=0) (actual time=623.308..623.316 rows=0 loops=1)
   Buffers: shared hit=29696, local hit=14822 read=3 dirtied=63 written=106, temp written=9212
   ->  Limit  (cost=79726.06..79844.42 rows=5000 width=25) (actual time=611.137..613.584 rows=5000 loops=1)
         Buffers: shared hit=29687, temp written=9212
         ->  HashAggregate  (cost=79726.06..103397.94 rows=1000000 width=25) (actual time=611.135..612.987 rows=5000 loops=1)
               Group Key: "user".id, "user".first_name
               Planned Partitions: 8  Batches: 9  Memory Usage: 16465kB  Disk Usage: 40200kB
               Buffers: shared hit=29687, temp written=9212
               ->  Seq Scan on "user"  (cost=0.00..39687.00 rows=1000000 width=25) (actual time=0.015..82.246 rows=1000000 loops=1)
                     Buffers: shared hit=29687
 Planning:
   Buffers: shared hit=5
Planning Time: 0.303 ms
Execution Time: 630.084 ms
(14 rows)
Time: 634.523 ms
Since the PK of the table is now non-k-sortable (UUIDv4), the INSERT statements will land on random pages. When these tables are joined on the original integer ID, the result is a mismatch.
mysql> SELECT u.id_old, u.id, u.first_name, un.int_id, un.id, un.first_name FROM user u JOIN user_name un ON u.id_old = un.int_id LIMIT 10;
+--------+--------------------------------------+------------+--------+--------------------------------------+------------+
| id_old | id | first_name | int_id | id | first_name |
+--------+--------------------------------------+------------+--------+--------------------------------------+------------+
| 1771 | 00134aea-213d-424f-9690-c9885cbeaef0 | Merilee | 1771 | 00789777-6649-49a4-b971-f51e3318d926 | Jessa |
| 1746 | 00285053-faca-42a5-9815-ef66a2418054 | Richmound | 1746 | 00771e86-c10a-4b41-808c-f70751483783 | Rebekkah |
| 1234 | 0056de3c-93da-4902-9a00-def17b5ec984 | Carmel | 1234 | 0054f3ac-e65d-4ac1-a000-dc4af8eb0690 | Madelle |
| 2501 | 006f816a-ae63-482a-b54a-ce7eae0c23e6 | Rubin | 2501 | 00a7dc16-00de-4ee7-bfc0-f6dad1e321ac | Renell |
| 1537 | 007945f8-ce69-4271-947a-5f2bc23e4b08 | Ilka | 1537 | 006ac523-c311-4691-8f3f-414d8f4ec2c4 | Coreen |
| 598 | 007ea14c-f531-4e72-b299-df12e3b964a1 | Maurise | 598 | 0027de56-0c57-462c-955b-86fd6dfee7a8 | Reece |
| 993 | 0099d1d2-7c93-4538-b543-c92f363a12be | Koralle | 993 | 0043c94f-1b39-4fee-99c2-e54e60d3e947 | Tarra |
| 1399 | 00a9d76c-4952-4964-9b1b-3cc5b27a0d68 | Josefa | 1399 | 0060b418-9bf9-4c1e-aee0-33ffc32350da | Maddie |
| 823 | 012672cd-f1ee-43de-bf57-ef82e94b452c | Joachim | 823 | 0037da11-b5c5-4106-a5c4-446049c1b9ba | Ezekiel |
| 303 | 013b8e69-a28b-408b-9065-51018b1f8e81 | Aloisia | 303 | 00132006-ce3e-4ddf-ae02-789a8676d39f | Ellwood |
+--------+--------------------------------------+------------+--------+--------------------------------------+------------+
10 rows in set (0.00 sec)
Postgres doesn't cluster on PKs, but the end effect is the same.
postgres=# SELECT u.id_old, u.id, u.first_name, un.int_id, un.id, un.first_name FROM "user" u JOIN user_name un ON u.id_old = un.int_id LIMIT 10;
id_old | id | first_name | int_id | id | first_name
--------+--------------------------------------+---------------+--------+--------------------------------------+--------------
30 | 1754efd1-4281-49af-b97d-ad3af586c9a0 | 'Harrison' | 30 | 875e381d-09b7-488a-9c43-a04e0a0b6c98 | 'Susi'
31 | 0b2dda7c-ea91-4c16-8b17-d3360cf6372b | 'Joanne' | 31 | de93b4a7-2f82-4926-ac7e-77e1d82dab96 | 'Beulah'
32 | c2269eb6-a3f9-4d0e-b2c3-eb02f3bd665f | 'Abdul' | 32 | ba4d5bba-6ae6-4059-a25d-dd7dbcd6de57 | 'Beauregard'
33 | 095d1bea-85c0-4249-a27a-9d746e8f29da | 'Almire' | 33 | 7133a496-dc1e-48c7-8510-84a7a0ed7ec9 | 'Janela'
34 | d64f6e46-5e91-4700-9db5-42cfd1b4e4d3 | 'Pansie' | 34 | a7905df1-a425-42df-8ea5-9ef0c2e5920c | 'Maudie'
35 | 8d15b385-af9f-4e59-9cbb-45a221cf4201 | 'Claus' | 35 | 1e461efe-d66d-4054-81e4-38701a1f2fbf | 'Kliment'
36 | 0504523d-f318-445f-b3e0-80e1b639c6ec | 'Cairistiona' | 36 | 3601ee73-aa81-492f-91aa-a0a29c488f35 | 'Randene'
37 | f8d756b8-b10e-4ed4-a5c9-8c58a15e8cb3 | 'Derrik' | 37 | 38dec2f5-b6b2-4868-847d-df4856fd7c4a | 'Gayelord'
38 | f7602ff4-2eac-495a-94f9-449222196523 | 'Roderich' | 38 | b74d51b8-cdd5-4c60-abd3-2d9852cea92a | 'Ad'
39 | 85da0655-661a-4ce2-b7dd-bd7405fda53e | 'Rubia' | 39 | 15037856-26b0-49fb-a848-aac60eef799c | 'Adam'
(10 rows)
Time: 1.912 ms
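If the original correspondence still matters after the UUID migration, the retained integer column can pin it down; a hypothetical sketch, assuming id_old was kept on the source table as in the joins above:

```sql
-- Because id is unique, DISTINCT adds nothing here (and Postgres would
-- reject DISTINCT with an ORDER BY column not in the select list);
-- ordering by the old monotonic key before limiting makes the chosen
-- subset deterministic in both engines.
INSERT INTO user_name (id, first_name)
SELECT id, first_name
FROM user            -- quote as "user" in Postgres
ORDER BY id_old
LIMIT 5000;
```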