Last active
August 22, 2022 05:25
-
-
Save reata/fd5e8cc15f55c3ec1ba0a3d7511d7991 to your computer and use it in GitHub Desktop.
SELECT DISTINCT和GROUP BY的去重效果相同吗?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
SELECT DISTINCT | |
*/ | |
WITH foo (pk1, pk2) AS | |
(SELECT 1, 'a' | |
UNION ALL | |
SELECT 1, 'b' | |
UNION ALL | |
SELECT 2, 'a' | |
UNION ALL | |
SELECT 2, NULL) | |
SELECT DISTINCT pk1, pk2 | |
FROM foo; | |
-- 1,a | |
-- 1,b | |
-- 2,a | |
-- 2,<null> | |
/* | |
GROUP BY | |
*/ | |
WITH foo (pk1, pk2) AS | |
(SELECT 1, 'a' | |
UNION ALL | |
SELECT 1, 'b' | |
UNION ALL | |
SELECT 2, 'a' | |
UNION ALL | |
SELECT 2, NULL) | |
SELECT pk1, pk2 | |
FROM foo | |
GROUP BY 1, 2; | |
-- 1,a | |
-- 1,b | |
-- 2,a | |
-- 2,<null> | |
/* | |
SELECT COUNT(DISTINCT), the NULL row won't count | |
*/ | |
WITH foo (pk1, pk2) AS | |
(SELECT 1, 'a' | |
UNION ALL | |
SELECT 1, 'b' | |
UNION ALL | |
SELECT 2, 'a' | |
UNION ALL | |
SELECT 2, NULL) | |
SELECT count(*), count(DISTINCT pk1, pk2), count(DISTINCT pk2) | |
FROM foo; | |
-- 4,3,2 | |
/* | |
SELECT COUNT(*) FROM subquery with GROUP BY, | |
*/ | |
WITH foo (pk1, pk2) AS | |
(SELECT 1, 'a' | |
UNION ALL | |
SELECT 1, 'b' | |
UNION ALL | |
SELECT 2, 'a' | |
UNION ALL | |
SELECT 2, NULL) | |
SELECT count(*) | |
FROM (SELECT pk1, pk2 | |
FROM foo | |
GROUP BY 1, 2) as p; | |
-- 4 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
SELECT DISTINCT和GROUP BY的去重效果是等价的。
需要特别注意的是,如果要对去重的字段组合做计数操作,比如我想通过比较COUNT(*)和COUNT(DISTINCT pk1, pk2)相等来确定表的业务主键pk1, pk2没有重复。此时要特别注意,是否允许业务主键为NULL?在我们的例子中,pk2是可以为NULL的,这时COUNT(DISTINCT ...)不会计入任意一个字段为NULL的行。