Created
February 29, 2024 19:33
-
-
Save zsiv/bffea1a7709076ce076e6017bd603331 to your computer and use it in GitHub Desktop.
BrentOzar.com Code Example for 2/29/24
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
Top 10 "helpful" locations | |
Assumptions: | |
- > 100 users in location | |
- no null/blank location | |
- most helpful calculated by highest overall average answer score (subjective) | |
*/ | |
WITH AnswerData AS ( | |
SELECT u.Id, u.[Location], AVG(p.Score) AS 'AvgAnswerScore' | |
FROM dbo.Users u INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId | |
WHERE p.PostTypeID = 2 | |
AND u.[Location] IS NOT NULL | |
AND u.[Location] <> '' | |
GROUP BY u.Id, u.[Location] | |
) | |
SELECT TOP 10 [Location], COUNT(*) AS [Population], AVG(AvgAnswerScore) AS AvgAnswerRep | |
FROM AnswerData | |
GROUP BY [Location] | |
HAVING COUNT(*) >= 100 | |
ORDER BY AvgAnswerRep DESC | |
/* | |
Top 10 "help needed" locations | |
Assumptions: | |
- > 100 users in location | |
- no null/blank location | |
- most help needed calculated by subtracting the total answers given in a location from the number of questions asked | |
(subjective, case could be made to average over individual user deltas?) | |
*/ | |
WITH HelpData AS ( | |
SELECT u.Id, u.[Location], COUNT(CASE WHEN p.PostTypeId = 1 THEN 1 END) AS QuestionCount, | |
COUNT(CASE WHEN p.PostTypeId = 2 THEN 1 END) AS AnswerCount | |
FROM dbo.Users u INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId | |
WHERE u.[Location] IS NOT NULL | |
AND u.[Location] <> '' | |
GROUP BY u.Id, u.[Location] | |
) | |
SELECT TOP 10 [Location], COUNT(*) AS [Population], (SUM(QuestionCount) - SUM(AnswerCount)) AS AnswerDelta | |
FROM HelpData | |
GROUP BY [Location] | |
HAVING COUNT(*) >= 100 | |
ORDER BY AnswerDelta DESC |
@zsiv Here's hoping you can some day point out this issue as the cause why one of your users' queries behaves in ever so slightly unexpected ways.
Both our intuitions were to approach this from the point of individual user. And AVG(AVG()) pleased Brent.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@ahlstart Just finished reading up on Simpson's Paradox - interesting stuff. I guess it comes down to something as simple as an average (or average of averages in this case) potentially losing nuance of meaning in the results when all the other relevant statistical forces at play aren't considered. Maybe this is why I'm a DBA and not a mathematician :)