Skip to content

Instantly share code, notes, and snippets.

@arboledasaa
Created February 29, 2024 19:19
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save arboledasaa/80d360ffdca4ba026344f8a99c808b0d to your computer and use it in GitHub Desktop.
Save arboledasaa/80d360ffdca4ba026344f8a99c808b0d to your computer and use it in GitHub Desktop.
I’m not worried about performance for the sake of this exercise.
--In the Users table, the Location column’s datatype is NVARCHAR(100).
--People can type in whatever they want, which leads to a lot of oddball one-off locations.
--We should filter for a minimum number of people involved, especially since we’re considering paid advertising.
--Let’s filter for only locations with at least 100 people in them.
--There are null and empty locations. Those should probably be excluded.
SET STATISTICS IO ON;
BEGIN TRY
DROP TABLE #locations;
END TRY
BEGIN CATCH SELECT 1 END CATCH;
Select [Location]
into #locations
from [dbo].[Users]
where [Location] is not null
and [Location] <> ''
group by [Location]
having count(*) >=100;
/*
Missing Index Details from SQLQuery6.sql - BVQ647-ZBOOK14G.StackOverflow2013 (DS\BVQ647 (51))
The Query Processor estimates that implementing the following index could improve the query cost by 90.8365%.
*/
/*
USE [StackOverflow2013]
GO
CREATE NONCLUSTERED INDEX Location
ON [dbo].[Users] ([Location])
GO
*/
/*
are we going to keep asking this question? if so maybe create the index.
We are as we are going to keep asking who is source and sink.
before
logical reads 45184
after
logical reads 3030
*/
--Part of your solution is going to involve looking at the Posts table, which stores questions and answers (and other things.) A few columns to consider:
--Posts.PostTypeId = when 1, the Post is a Question. When 2, it’s an Answer.
--Posts.OwnerUserId = the Users.Id who owns the question or answer.
--Posts.Score = the quality of the question or answer.
--For this exercise, I’m not expecting a specific “right” or “wrong” answer – instead, for this one, you’re probably going to have a good time sharing your answer in the comments, and comparing your answer to that of others. Feel free to put your queries in a Github Gist, and include that link in your comments. I’ll circle back in a week and write about different ways to solve it. Have fun!
--They can be separate queries, and I’m not worried about performance for the sake of this exercise.
--First, what are the top 10 Locations populated with users who seem to be really helpful, meaning, they write really good answers?
--maybe this?
select top(10) [dbo].[Users].[Location], avg(Posts.Score) as avgScore
from [dbo].[Posts]
join [dbo].[Users] on [Posts].OwnerUserId = [Users].Id
join #locations on [dbo].[Users].[Location] = #locations.Location
where Posts.PostTypeId = 2
group by [dbo].[Users].[Location]
order by avgScore desc;
--Second, what are the top 10 Locations where people seem to need the most help?
--Meaning, they ask a lot of questions, but they do not seem to be answering those of their neighbors?
with x as(
select [dbo].[Users].[Location],
sum(case when Posts.PostTypeId = 1 then 1 else 0 end) as questions,
sum(case when Posts.PostTypeId = 2 then 1 else 0 end) as answers
from [dbo].[Posts]
join [dbo].[Users] on [Posts].OwnerUserId = [Users].Id
join #locations on [dbo].[Users].[Location] = #locations.Location
group by [dbo].[Users].[Location]
)
select top(10) x.Location
from x
order by questions/(answers+0.1) desc --maybe add weights here?
@BrentOzar
Copy link

Just FYI, here's my edited version of your query to show more columns for sniff testing:

--In the Users table, the Location column’s datatype is NVARCHAR(100). 
--People can type in whatever they want, which leads to a lot of oddball one-off locations. 
--We should filter for a minimum number of people involved, especially since we’re considering paid advertising. 
--Let’s filter for only locations with at least 100 people in them.
--There are null and empty locations. Those should probably be excluded.

SET STATISTICS IO ON;

BEGIN TRY
DROP TABLE #locations;
END TRY
BEGIN CATCH SELECT 1 END CATCH;

Select [Location]
into #locations
from [dbo].[Users]
where [Location] is not null
and [Location] <> ''
group by [Location]
having count(*) >=100;

/*
Missing Index Details from SQLQuery6.sql - BVQ647-ZBOOK14G.StackOverflow2013 (DS\BVQ647 (51))
The Query Processor estimates that implementing the following index could improve the query cost by 90.8365%.
*/

/*
USE [StackOverflow2013]
GO
CREATE NONCLUSTERED INDEX Location
ON [dbo].[Users] ([Location])

GO
*/

/*
are we going to keep asking this question? if so maybe create the index.
We are as we are going to keep asking who is source and sink.
before
logical reads 45184
after
logical reads 3030
*/

--Part of your solution is going to involve looking at the Posts table, which stores questions and answers (and other things.) A few columns to consider:
--Posts.PostTypeId = when 1, the Post is a Question. When 2, it’s an Answer.
--Posts.OwnerUserId = the Users.Id who owns the question or answer.
--Posts.Score = the quality of the question or answer.
--For this exercise, I’m not expecting a specific “right” or “wrong” answer – instead, for this one, you’re probably going to have a good time sharing your answer in the comments, and comparing your answer to that of others. Feel free to put your queries in a Github Gist, and include that link in your comments. I’ll circle back in a week and write about different ways to solve it. Have fun!

--They can be separate queries, and I’m not worried about performance for the sake of this exercise.

--First, what are the top 10 Locations populated with users who seem to be really helpful, meaning, they write really good answers?

--maybe this?
select top(10) [dbo].[Users].[Location], avg(Posts.Score * 1.0) as avgScore,
	COUNT(DISTINCT Users.Id) AS Population, COUNT(DISTINCT Posts.Id) AS Answers
from [dbo].[Posts]
join [dbo].[Users] on [Posts].OwnerUserId = [Users].Id
join #locations on  [dbo].[Users].[Location] = #locations.Location
where Posts.PostTypeId = 2
group by [dbo].[Users].[Location]
order by avgScore desc;

--Second, what are the top 10 Locations where people seem to need the most help? 
--Meaning, they ask a lot of questions, but they do not seem to be answering those of their neighbors?
with x as(
select [dbo].[Users].[Location], 
sum(case when Posts.PostTypeId = 1 then 1 else 0 end) as questions,
sum(case when Posts.PostTypeId = 2 then 1 else 0 end) as answers,
COUNT(DISTINCT Users.id) AS Population
from [dbo].[Posts]
join [dbo].[Users] on [Posts].OwnerUserId = [Users].Id
join #locations on  [dbo].[Users].[Location] = #locations.Location
group by [dbo].[Users].[Location]
)
select top(10) x.*
from x
order by questions/(answers+0.1) desc --maybe add weights here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment