SQL practice with SFPD Incidents; Select/Sort/Filter (1 of 4)
Introduction
This is an overview lesson that covers all of the syntax needed to find and sort records in a database before getting into the GROUP BY clause.
Think of these exercises as covering database basics before getting into the aggregation functionality of GROUP BY. It's just like wanting to cover the many features of spreadsheets before getting into pivot tables and aggregation.
Directions
This exercise set has 6 problems, not including two example problems (i.e. 1x and 4x).
Email me at dun@stanford.edu with the subject:
padjo-2017 homework **sql-ex-1** **your_sunet_id**
With a Google Doc (attached or linked-to) in which each query has its own page.
The file to use, as before, is this (zipped) SQLite file of SFPD incidents from 2012 through 2016:
http://stash.padjo.org/data/sfpd-incidents-2012-2016.sqlite.zip
Related reading
From the 2014 syllabus (more complete, maybe not as concise):
- The SELECT statement and FROM clause
- Limiting and ordering SQL results
- Conditional filtering with WHERE
- Fuzzy matching in SQL
From the 2015 syllabus -- more broken down than 2014, but might have some incomplete spots:
- Getting started with SQLite Browser
- SELECTing rows FROM data tables
- LIMIT and ORDER BY in SQL queries
- Using the WHERE clause to filter data in SQL
- More Boolean Expressions to Filter SQL Queries
- Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
- Functions for transforming text and numbers in SQL
- Aliasing Columns and Tables in SQL Databases
It is an overwhelming number of concepts and syntax, but mastery isn't expected, just familiarity for now.
Exercises
1x. (EXAMPLE) Show just the datetime column from the very first incident in the database
"Very first incident" in the database basically means, whatever the first row is in the data table without an additional sorting of our own. Use a LIMIT clause to make sure the results only include a single row.
Related:
Expected result
datetime |
---|
2012-01-01 00:01:00 |
Hints
If this is your first SQL of the day, you can try these queries first:
-
Select 'hello'
SELECT 'hello';
-
Select 'hello' for every row that is in "incidents"
SELECT 'hello' FROM incidents;
-
Select all the columns for all the rows in "incidents" (may be a lot for computer to process)
SELECT * FROM incidents;
-
Select all the columns for the first 5 rows
SELECT * FROM incidents; LIMIT 5;
Query
SELECT datetime
FROM incidents
LIMIT 1;
1. Find selected columns from the very first incident in the database
Many of the columns in this database aren't needed for the exercises. So write a query that pulls only these fields, in this order:
- incident_number
- datetime
- category
- description
- resolution
- pd_district
Expected result
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
130,261,396 | 2012-01-01 00:01:00 | OTHER OFFENSES | FALSE PERSONATION | BAYVIEW |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM incidents
LIMIT 1;
2. Find the very last incident (chronologically) in the database
Related:
Expected results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
170000198 | 2016-12-31 23:58:00 | DRIVING UNDER THE INFLUENCE | DRIVING WHILE UNDER THE INFLUENCE OF DRUGS | ARREST, BOOKED | MISSION |
{:.table-sql} |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
ORDER BY
datetime DESC
LIMIT 1;
3. Find the first five crimes (chronologically) categorized as "VEHICLE THEFT"
Related:
Expected results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
120001613 | 2012-01-01 02:30:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | INGLESIDE | |
120002257 | 2012-01-01 03:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | SOUTHERN | |
120135751 | 2012-01-01 08:00:00 | VEHICLE THEFT | STOLEN MOTORCYCLE | INGLESIDE | |
120003368 | 2012-01-01 09:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | INGLESIDE | |
120498349 | 2012-01-01 12:00:00 | VEHICLE THEFT | STOLEN MOTORCYCLE | INGLESIDE |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
category = 'VEHICLE THEFT'
ORDER BY
datetime ASC
LIMIT 5;
4x. (EXAMPLE) Find the last incident that happened on the first day of 2012
Related:
- Conditional filtering with WHERE
- USING the WHERE clause to filter data in SQL
- Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
Expected results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
120002837 | 2012-01-01 23:05:00 | VANDALISM | MALICIOUS MISCHIEF, VANDALISM OF VEHICLES | INGLESIDE | |
{:.table-sql} |
Hints
The key to this exercise is to realize that the "datetime" field is a text field, as far as SQLite is concerned. Which means we can use text functions to extract dates and do comparisons.
Try:
SELECT datetime
FROM incidents
LIMIT 5;
Then:
SELECT datetime,
SUBSTR(datetime, 1, 4)
FROM incidents
LIMIT 5;
datetime | SUBSTR(datetime, 1, 4) |
---|---|
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
And whatever you'd like:
SELECT datetime,
SUBSTR(datetime, 1, 4),
SUBSTR(datetime, 1, 7)
FROM incidents
LIMIT 5;
datetime | SUBSTR(datetime, 1, 4) | SUBSTR(datetime, 1, 7) |
---|---|---|
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
{:.table-sql} |
What might be hard to figure out is how to set the condition to limit records in which the "datetime" value (which is not just a date but a time) is "equal" to the first day of January 2012.
This won't work:
SELECT
datetime
FROM incidents
WHERE datetime = '2012-01-01'
LIMIT 1;
Though using comparisons might work:
SELECT
datetime
FROM incidents
WHERE datetime > '2012-01-01'
AND datetime < '2012-01-02'
LIMIT 1;
There's all kind of variety:
SELECT
datetime
FROM incidents
WHERE datetime
BETWEEN '2012-01-01' AND '2012-01-02'
LIMIT 1;
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
SUBSTR(datetime, 1, 10) = '2012-01-01'
ORDER BY
datetime DESC
LIMIT 1;
4. Find the first 5 incidents of the year 2014
Expected results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
140592131 | 2014-01-01 00:01:00 | SEX OFFENSES, FORCIBLE | SODOMY | ARREST, BOOKED | MISSION |
140882346 | 2014-01-01 00:01:00 | FRAUD | FRAUDULENT USE OF AUTOMATED TELLER CARD | NORTHERN | |
140309936 | 2014-01-01 00:01:00 | OTHER OFFENSES | FALSE PERSONATION TO RECEIVE MONEY OR PROPERTY | UNFOUNDED | RICHMOND |
140471381 | 2014-01-01 00:01:00 | FRAUD | CREDIT CARD, THEFT BY USE OF | CENTRAL | |
140057743 | 2014-01-01 00:01:00 | SUSPICIOUS OCC | SUSPICIOUS OCCURRENCE | NORTHERN |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
SUBSTR(datetime, 1, 4) = '2014'
ORDER BY
datetime ASC
LIMIT 5;
5. Find the first 5 chronological vehicle thefts that had a resolution other than "NONE"
Expected results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
120005342 | 2012-01-03 06:42:00 | VEHICLE THEFT | ATTEMPTED STOLEN VEHICLE | ARREST, BOOKED | MISSION |
120011355 | 2012-01-04 21:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | UNFOUNDED | SOUTHERN |
120016656 | 2012-01-05 16:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | ARREST, BOOKED | RICHMOND |
120017604 | 2012-01-06 10:30:00 | VEHICLE THEFT | STOLEN TRUCK | ARREST, BOOKED | INGLESIDE |
120017165 | 2012-01-06 21:45:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | UNFOUNDED | INGLESIDE |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
category = 'VEHICLE THEFT'
AND resolution != 'NONE'
ORDER BY
datetime ASC
LIMIT 5;
6. Find the last 5 incidents of the year 2014 that had something to do with cocaine
Related: Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
Expected Results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
141098374 | 2014-12-30 18:55:00 | DRUG/NARCOTIC | POSSESSION OF BASE/ROCK COCAINE FOR SALE | ARREST, BOOKED | TENDERLOIN |
141096942 | 2014-12-30 11:30:00 | DRUG/NARCOTIC | POSSESSION OF BASE/ROCK COCAINE FOR SALE | ARREST, BOOKED | SOUTHERN |
141094366 | 2014-12-29 15:10:00 | DRUG/NARCOTIC | POSSESSION OF BASE/ROCK COCAINE | TENDERLOIN | |
141094366 | 2014-12-29 15:10:00 | DRUG/NARCOTIC | SALE OF BASE/ROCK COCAINE | TENDERLOIN | |
141092928 | 2014-12-29 03:15:00 | DRUG/NARCOTIC | POSSESSION OF COCAINE | INGLESIDE |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
description LIKE '%cocaine%'
AND SUBSTR(datetime, 1, 4) = '2014'
ORDER BY
datetime DESC
LIMIT 5;
7. Find the last 5 (chronological) "DRUG/NARCOTIC" crimes of the year 2014 that was not specifically about cocaine
Assume that "not specifically about cocaine" means that the word 'cocaine' did not show up in the description.
Expected Results
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
141101610 | 2014-12-31 21:51:00 | DRUG/NARCOTIC | POSSESSION OF METH-AMPHETAMINE | ARREST, BOOKED | RICHMOND |
141101444 | 2014-12-31 20:37:00 | DRUG/NARCOTIC | POSSESSION OF MARIJUANA | ARREST, BOOKED | MISSION |
141099429 | 2014-12-31 08:08:00 | DRUG/NARCOTIC | POSSESSION OF CONTROLLED SUBSTANCE | ARREST, BOOKED | SOUTHERN |
141099429 | 2014-12-31 08:08:00 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA | ARREST, BOOKED | SOUTHERN |
141099350 | 2014-12-31 03:16:00 | DRUG/NARCOTIC | POSSESSION OF MARIJUANA | TARAVAL |
Query
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
description NOT LIKE '%cocaine%'
AND category = 'DRUG/NARCOTIC'
AND SUBSTR(datetime, 1, 4) = '2014'
ORDER BY
datetime DESC
LIMIT 5;