Skip to content

Instantly share code, notes, and snippets.

@reata
Last active August 22, 2022 05:25
Show Gist options
  • Save reata/fd5e8cc15f55c3ec1ba0a3d7511d7991 to your computer and use it in GitHub Desktop.
Save reata/fd5e8cc15f55c3ec1ba0a3d7511d7991 to your computer and use it in GitHub Desktop.
SELECT DISTINCT和GROUP BY的去重效果相同吗?
/*
SELECT DISTINCT
*/
WITH foo (pk1, pk2) AS
(SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, NULL)
SELECT DISTINCT pk1, pk2
FROM foo;
-- 1,a
-- 1,b
-- 2,a
-- 2,<null>
/*
GROUP BY
*/
WITH foo (pk1, pk2) AS
(SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, NULL)
SELECT pk1, pk2
FROM foo
GROUP BY 1, 2;
-- 1,a
-- 1,b
-- 2,a
-- 2,<null>
/*
SELECT COUNT(DISTINCT), the NULL row won't count
*/
WITH foo (pk1, pk2) AS
(SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, NULL)
SELECT count(*), count(DISTINCT pk1, pk2), count(DISTINCT pk2)
FROM foo;
-- 4,3,2
/*
SELECT COUNT(*) FROM subquery with GROUP BY,
*/
WITH foo (pk1, pk2) AS
(SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, NULL)
SELECT count(*)
FROM (SELECT pk1, pk2
FROM foo
GROUP BY 1, 2) as p;
-- 4
@reata
Copy link
Author

reata commented Aug 22, 2022

SELECT DISTINCT和GROUP BY的去重效果是等价的。

需要特别注意的是,如果要对去重的字段组合做计数操作,比如我想通过比较COUNT(*)和COUNT(DISTINCT pk1, pk2)相等来确定表的业务主键pk1, pk2没有重复。此时要特别注意,是否允许业务主键为NULL?在我们的例子中,pk2是可以为NULL的,这时COUNT(DISTINCT ...)不会计入任意一个字段为NULL的行。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment