Skip to content

Instantly share code, notes, and snippets.

@zsiv
Created February 29, 2024 19:33
Show Gist options
  • Save zsiv/bffea1a7709076ce076e6017bd603331 to your computer and use it in GitHub Desktop.
Save zsiv/bffea1a7709076ce076e6017bd603331 to your computer and use it in GitHub Desktop.
BrentOzar.com Code Example for 2/29/24
/*
Top 10 "helpful" locations
Assumptions:
- > 100 users in location
- no null/blank location
- most helpful calculated by highest overall average answer score (subjective)
*/
WITH AnswerData AS (
SELECT u.Id, u.[Location], AVG(p.Score) AS 'AvgAnswerScore'
FROM dbo.Users u INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId
WHERE p.PostTypeID = 2
AND u.[Location] IS NOT NULL
AND u.[Location] <> ''
GROUP BY u.Id, u.[Location]
)
SELECT TOP 10 [Location], COUNT(*) AS [Population], AVG(AvgAnswerScore) AS AvgAnswerRep
FROM AnswerData
GROUP BY [Location]
HAVING COUNT(*) >= 100
ORDER BY AvgAnswerRep DESC
/*
Top 10 "help needed" locations
Assumptions:
- > 100 users in location
- no null/blank location
- most help needed calculated by subtracting the total answers given in a location from the number of questions asked
(subjective, case could be made to average over individual user deltas?)
*/
WITH HelpData AS (
SELECT u.Id, u.[Location], COUNT(CASE WHEN p.PostTypeId = 1 THEN 1 END) AS QuestionCount,
COUNT(CASE WHEN p.PostTypeId = 2 THEN 1 END) AS AnswerCount
FROM dbo.Users u INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId
WHERE u.[Location] IS NOT NULL
AND u.[Location] <> ''
GROUP BY u.Id, u.[Location]
)
SELECT TOP 10 [Location], COUNT(*) AS [Population], (SUM(QuestionCount) - SUM(AnswerCount)) AS AnswerDelta
FROM HelpData
GROUP BY [Location]
HAVING COUNT(*) >= 100
ORDER BY AnswerDelta DESC
@zsiv
Copy link
Author

zsiv commented Mar 2, 2024

@ahlstart Just finished reading up on Simpson's Paradox - interesting stuff. I guess it comes down to something as simple as an average (or average of averages in this case) potentially losing nuance of meaning in the results when all the other relevant statistical forces at play aren't considered. Maybe this is why I'm a DBA and not a mathematician :)

@ahlstart
Copy link

ahlstart commented Mar 2, 2024

@zsiv Here's hoping you can some day point out this issue as the cause why one of your users' queries behaves in ever so slightly unexpected ways.
Both our intuitions were to approach this from the point of individual user. And AVG(AVG()) pleased Brent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment