SQL practice with SFPD incidents and simple aggregation (2 of 4)
These are the answers to:
This exercise set has 5 problems, not including two example problems (i.e. 1x and 5x).
Email me at email@example.com with the subject:
padjo-2017 homework **sql-ex-2** **your_sunet_id**
With a Google Doc (attached or linked-to) in which each query has its own page.
The file to use, as before, is this (zipped) SQLite file of SFPD incidents from 2012 through 2016:
The SFPD data doesn't lend itself to much quantification, as the latitude and longitude columns are the only numerical columns, thus sum, average, max/min aren't very interesting aggregate methods.
So for this exercise, we just care about the COUNT function. When COUNT, or any aggregate function, is used without a GROUP BY clause, the result is a single value.
Pretend you had a spreadsheet and you used the
=SUM() function across every row, i.e.
=SUM(A1:A9999). You would expect the function to deliver a single value.
The spreadsheet equivalent of "GROUP BY" is to do things using pivot tables. But for these SQL exercises, it's a singular count (i.e. not a pivot table, i.e. not using GROUP BY).
Besides the COUNT() function in the SELECT clause, all the other SQL syntax remains the same.
Check out this past lesson, Aggregate functions in SQL. Specifically, focus on the The COUNT section and especially, the Counting without grouping section. (Ignore the GROUP BY stuff for now)
1x. (EXAMPLE) Just do a count
The syntax for calculating the count of rows in a result set is simply:
Believe it or not, even without reference to the database table (i.e.
incidents), this has a valid result:
This can be interpreted as something like, The number of rows in the result for our useless queries is: 1.
1. Total number of incidents
SELECT COUNT(*) with a
FROM clause will count the total number of rows in the table.
The expected answer expects a column name of total_count, so your query should include the proper alias clause.
2. Total number of incidents categorized as "ASSAULT"
Same as previous question, except there is a WHERE clause and a different alias.
(I'm just giving you this answer)
SELECT COUNT(*) AS total_assaults FROM incidents WHERE category = 'ASSAULT';
3. Total number of incidents having something to do with marijuana
Assume "having something to do with marijuana" means that the literal text 'marijuana' shows up in the "description" field, i.e. you will want to use the LIKE and wildcards.
Read: Fuzzy matching in SQL, particularly Matching patterns with LIKE
4. Total number of incidents taking place in 2014
5x. (EXAMPLE) Total number of incidents that took place in the "TENDERLOIN" district that were either "UNFOUNDED" or "NOT PROSECUTED"
Nothing different from the previous exercises, just a slightly more complicated logic in the WHERE clause.
It is very easy to mix up the logic here. For example, this is NOT the logic that we want for finding both kinds of "resolution":
resolution = 'UNFOUNDED' AND resolution = 'NOT PROSECUTED'
That is mutually exclusive (do the logic in your head). We want to use OR:
resolution = 'UNFOUNDED' OR resolution = 'NOT PROSECUTED'
Now look at the
"pd_district" = 'TENDERLOIN' logic separately:
pd_district = 'TENDERLOIN'
How do we combine the two conditional clauses?
(the exerpt below is obviously just an excerpt, not valid syntax)
pd_district = 'TENDERLOIN' AND resolution = 'UNFOUNDED' OR resolution = 'NOT PROSECUTED'
The above logic returns results in which:
- The district is "TENDERLOIN" and the resolution is "UNFOUNDED"
- The resolution is "NOT PROSECUTED"
The results of #2 include rows from districts that are not the TENDERLOIND
Using parentheses to group logic
Using parentheses, as we do in arithmetic, we can enforce the order of the logic:
pd_district = 'TENDERLOIN' AND (resolution = 'UNFOUNDED' OR resolution = 'NOT PROSECUTED')
Unlike the previous bad example, there are only 2 separate conditions that must both be met. The latter condition, dealing with the resolution, can have multiple choices. But in the end, all results are in the 'TENDERLOIN' district.
SELECT COUNT(*) AS total_count FROM incidents WHERE pd_district = 'TENDERLOIN' AND (resolution = 'UNFOUNDED' OR resolution = 'NOT PROSECUTED');
5. Total number of incidents taking place in 2016 in either the "TENDERLOIN" or "MISSION" districts
SELECT COUNT(*) AS total_count FROM incidents WHERE SUBSTR(datetime, 1, 4) = '2016' AND pd_district IN ('TENDERLOIN', 'MISSION');