This is an overview lesson that covers all of the syntax needed to find and sort records in a database before getting into the GROUP BY clause.
Think of these exercises as covering database basics before getting into the aggregation functionality of GROUP BY. It's just like wanting to cover the many features of spreadsheets before getting into pivot tables and aggregation.
This exercise set has 6 problems, not including two example problems (i.e. 1x and 4x).
Email me at dun@stanford.edu with the subject:
padjo-2017 homework **sql-ex-1** **your_sunet_id**
With a Google Doc (attached or linked-to) in which each query has its own page.
The file to use, as before, is this (zipped) SQLite file of SFPD incidents from 2012 through 2016:
http://stash.padjo.org/data/sfpd-incidents-2012-2016.sqlite.zip
From the 2014 syllabus (more complete, maybe not as concise):
- The SELECT statement and FROM clause
- Limiting and ordering SQL results
- Conditional filtering with WHERE
- Fuzzy matching in SQL
From the 2015 syllabus -- more broken down than 2014, but might have some incomplete spots:
- Getting started with SQLite Browser
- SELECTing rows FROM data tables
- LIMIT and ORDER BY in SQL queries
- Using the WHERE clause to filter data in SQL
- More Boolean Expressions to Filter SQL Queries
- Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
- Functions for transforming text and numbers in SQL
- Aliasing Columns and Tables in SQL Databases
It is an overwhelming number of concepts and syntax, but mastery isn't expected, just familiarity for now.
"Very first incident" in the database basically means, whatever the first row is in the data table without an additional sorting of our own. Use a LIMIT clause to make sure the results only include a single row.
Related:
datetime |
---|
2012-01-01 00:01:00 |
If this is your first SQL of the day, you can try these queries first:
-
Select 'hello'
SELECT 'hello';
-
Select 'hello' for every row that is in "incidents"
SELECT 'hello' FROM incidents;
-
Select all the columns for all the rows in "incidents" (may be a lot for computer to process)
SELECT * FROM incidents;
-
Select all the columns for the first 5 rows
SELECT * FROM incidents; LIMIT 5;
SELECT datetime
FROM incidents
LIMIT 1;
Many of the columns in this database aren't needed for the exercises. So write a query that pulls only these fields, in this order:
- incident_number
- datetime
- category
- description
- resolution
- pd_district
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
130,261,396 | 2012-01-01 00:01:00 | OTHER OFFENSES | FALSE PERSONATION | BAYVIEW |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM incidents
LIMIT 1;
Related:
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
170000198 | 2016-12-31 23:58:00 | DRIVING UNDER THE INFLUENCE | DRIVING WHILE UNDER THE INFLUENCE OF DRUGS | ARREST, BOOKED | MISSION |
{:.table-sql} |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
ORDER BY
datetime DESC
LIMIT 1;
Related:
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
120001613 | 2012-01-01 02:30:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | INGLESIDE | |
120002257 | 2012-01-01 03:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | SOUTHERN | |
120135751 | 2012-01-01 08:00:00 | VEHICLE THEFT | STOLEN MOTORCYCLE | INGLESIDE | |
120003368 | 2012-01-01 09:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | INGLESIDE | |
120498349 | 2012-01-01 12:00:00 | VEHICLE THEFT | STOLEN MOTORCYCLE | INGLESIDE |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
category = 'VEHICLE THEFT'
ORDER BY
datetime ASC
LIMIT 5;
Related:
- Conditional filtering with WHERE
- USING the WHERE clause to filter data in SQL
- Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
120002837 | 2012-01-01 23:05:00 | VANDALISM | MALICIOUS MISCHIEF, VANDALISM OF VEHICLES | INGLESIDE | |
{:.table-sql} |
The key to this exercise is to realize that the "datetime" field is a text field, as far as SQLite is concerned. Which means we can use text functions to extract dates and do comparisons.
Try:
SELECT datetime
FROM incidents
LIMIT 5;
Then:
SELECT datetime,
SUBSTR(datetime, 1, 4)
FROM incidents
LIMIT 5;
datetime | SUBSTR(datetime, 1, 4) |
---|---|
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
2012-01-01 00:01:00 | 2012 |
And whatever you'd like:
SELECT datetime,
SUBSTR(datetime, 1, 4),
SUBSTR(datetime, 1, 7)
FROM incidents
LIMIT 5;
datetime | SUBSTR(datetime, 1, 4) | SUBSTR(datetime, 1, 7) |
---|---|---|
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
2012-01-01 00:01:00 | 2012 | 2012-01 |
{:.table-sql} |
What might be hard to figure out is how to set the condition to limit records in which the "datetime" value (which is not just a date but a time) is "equal" to the first day of January 2012.
This won't work:
SELECT
datetime
FROM incidents
WHERE datetime = '2012-01-01'
LIMIT 1;
Though using comparisons might work:
SELECT
datetime
FROM incidents
WHERE datetime > '2012-01-01'
AND datetime < '2012-01-02'
LIMIT 1;
There's all kind of variety:
SELECT
datetime
FROM incidents
WHERE datetime
BETWEEN '2012-01-01' AND '2012-01-02'
LIMIT 1;
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
SUBSTR(datetime, 1, 10) = '2012-01-01'
ORDER BY
datetime DESC
LIMIT 1;
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
140592131 | 2014-01-01 00:01:00 | SEX OFFENSES, FORCIBLE | SODOMY | ARREST, BOOKED | MISSION |
140882346 | 2014-01-01 00:01:00 | FRAUD | FRAUDULENT USE OF AUTOMATED TELLER CARD | NORTHERN | |
140309936 | 2014-01-01 00:01:00 | OTHER OFFENSES | FALSE PERSONATION TO RECEIVE MONEY OR PROPERTY | UNFOUNDED | RICHMOND |
140471381 | 2014-01-01 00:01:00 | FRAUD | CREDIT CARD, THEFT BY USE OF | CENTRAL | |
140057743 | 2014-01-01 00:01:00 | SUSPICIOUS OCC | SUSPICIOUS OCCURRENCE | NORTHERN |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
SUBSTR(datetime, 1, 4) = '2014'
ORDER BY
datetime ASC
LIMIT 5;
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
120005342 | 2012-01-03 06:42:00 | VEHICLE THEFT | ATTEMPTED STOLEN VEHICLE | ARREST, BOOKED | MISSION |
120011355 | 2012-01-04 21:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | UNFOUNDED | SOUTHERN |
120016656 | 2012-01-05 16:00:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | ARREST, BOOKED | RICHMOND |
120017604 | 2012-01-06 10:30:00 | VEHICLE THEFT | STOLEN TRUCK | ARREST, BOOKED | INGLESIDE |
120017165 | 2012-01-06 21:45:00 | VEHICLE THEFT | STOLEN AUTOMOBILE | UNFOUNDED | INGLESIDE |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
category = 'VEHICLE THEFT'
AND resolution != 'NONE'
ORDER BY
datetime ASC
LIMIT 5;
Related: Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
141098374 | 2014-12-30 18:55:00 | DRUG/NARCOTIC | POSSESSION OF BASE/ROCK COCAINE FOR SALE | ARREST, BOOKED | TENDERLOIN |
141096942 | 2014-12-30 11:30:00 | DRUG/NARCOTIC | POSSESSION OF BASE/ROCK COCAINE FOR SALE | ARREST, BOOKED | SOUTHERN |
141094366 | 2014-12-29 15:10:00 | DRUG/NARCOTIC | POSSESSION OF BASE/ROCK COCAINE | TENDERLOIN | |
141094366 | 2014-12-29 15:10:00 | DRUG/NARCOTIC | SALE OF BASE/ROCK COCAINE | TENDERLOIN | |
141092928 | 2014-12-29 03:15:00 | DRUG/NARCOTIC | POSSESSION OF COCAINE | INGLESIDE |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
description LIKE '%cocaine%'
AND SUBSTR(datetime, 1, 4) = '2014'
ORDER BY
datetime DESC
LIMIT 5;
7. Find the last 5 (chronological) "DRUG/NARCOTIC" crimes of the year 2014 that was not specifically about cocaine
Assume that "not specifically about cocaine" means that the word 'cocaine' did not show up in the description.
incident_number | datetime | category | description | resolution | pd_district |
---|---|---|---|---|---|
141101610 | 2014-12-31 21:51:00 | DRUG/NARCOTIC | POSSESSION OF METH-AMPHETAMINE | ARREST, BOOKED | RICHMOND |
141101444 | 2014-12-31 20:37:00 | DRUG/NARCOTIC | POSSESSION OF MARIJUANA | ARREST, BOOKED | MISSION |
141099429 | 2014-12-31 08:08:00 | DRUG/NARCOTIC | POSSESSION OF CONTROLLED SUBSTANCE | ARREST, BOOKED | SOUTHERN |
141099429 | 2014-12-31 08:08:00 | DRUG/NARCOTIC | POSSESSION OF NARCOTICS PARAPHERNALIA | ARREST, BOOKED | SOUTHERN |
141099350 | 2014-12-31 03:16:00 | DRUG/NARCOTIC | POSSESSION OF MARIJUANA | TARAVAL |
SELECT
incident_number,
datetime,
category,
description,
resolution,
pd_district
FROM
incidents
WHERE
description NOT LIKE '%cocaine%'
AND category = 'DRUG/NARCOTIC'
AND SUBSTR(datetime, 1, 4) = '2014'
ORDER BY
datetime DESC
LIMIT 5;