Skip to content

Instantly share code, notes, and snippets.

@dannguyen
Last active October 26, 2017 15:02
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save dannguyen/d06f8fb41722dd777c70acf83b1e03ad to your computer and use it in GitHub Desktop.

SQL practice with SFPD Incidents; Select/Sort/Filter (1 of 4)

Introduction

This is an overview lesson that covers all of the syntax needed to find and sort records in a database before getting into the GROUP BY clause.

Think of these exercises as covering database basics before getting into the aggregation functionality of GROUP BY. It's just like wanting to cover the many features of spreadsheets before getting into pivot tables and aggregation.

Directions

This exercise set has 6 problems, not including two example problems (i.e. 1x and 4x).

Email me at dun@stanford.edu with the subject:

padjo-2017 homework **sql-ex-1** **your_sunet_id**

With a Google Doc (attached or linked-to) in which each query has its own page.

The file to use, as before, is this (zipped) SQLite file of SFPD incidents from 2012 through 2016:

http://stash.padjo.org/data/sfpd-incidents-2012-2016.sqlite.zip

Related reading

From the 2014 syllabus (more complete, maybe not as concise):

From the 2015 syllabus -- more broken down than 2014, but might have some incomplete spots:

It is an overwhelming number of concepts and syntax, but mastery isn't expected, just familiarity for now.


Exercises

1x. (EXAMPLE) Show just the datetime column from the very first incident in the database

"Very first incident" in the database basically means, whatever the first row is in the data table without an additional sorting of our own. Use a LIMIT clause to make sure the results only include a single row.

Related:

Expected result

datetime
2012-01-01 00:01:00

Hints

If this is your first SQL of the day, you can try these queries first:

  • Select 'hello'

    SELECT 'hello';
    
  • Select 'hello' for every row that is in "incidents"

    SELECT 'hello'
    FROM incidents;
    
  • Select all the columns for all the rows in "incidents" (may be a lot for computer to process)

    SELECT *
    FROM incidents;
    
  • Select all the columns for the first 5 rows

    SELECT *
    FROM incidents;
    LIMIT 5;
    

Query

SELECT datetime
FROM incidents
LIMIT 1;

1. Find selected columns from the very first incident in the database

Many of the columns in this database aren't needed for the exercises. So write a query that pulls only these fields, in this order:

  • incident_number
  • datetime
  • category
  • description
  • resolution
  • pd_district

Expected result

incident_number datetime category description resolution pd_district
130,261,396 2012-01-01 00:01:00 OTHER OFFENSES FALSE PERSONATION BAYVIEW

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM incidents
LIMIT 1;

2. Find the very last incident (chronologically) in the database

Related:

Expected results

incident_number datetime category description resolution pd_district
170000198 2016-12-31 23:58:00 DRIVING UNDER THE INFLUENCE DRIVING WHILE UNDER THE INFLUENCE OF DRUGS ARREST, BOOKED MISSION
{:.table-sql}

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
ORDER BY 
  datetime DESC
LIMIT 1;

3. Find the first five crimes (chronologically) categorized as "VEHICLE THEFT"

Related:

Expected results

incident_number datetime category description resolution pd_district
120001613 2012-01-01 02:30:00 VEHICLE THEFT STOLEN AUTOMOBILE INGLESIDE
120002257 2012-01-01 03:00:00 VEHICLE THEFT STOLEN AUTOMOBILE SOUTHERN
120135751 2012-01-01 08:00:00 VEHICLE THEFT STOLEN MOTORCYCLE INGLESIDE
120003368 2012-01-01 09:00:00 VEHICLE THEFT STOLEN AUTOMOBILE INGLESIDE
120498349 2012-01-01 12:00:00 VEHICLE THEFT STOLEN MOTORCYCLE INGLESIDE

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
WHERE 
  category = 'VEHICLE THEFT'
ORDER BY 
  datetime ASC
LIMIT 5;

4x. (EXAMPLE) Find the last incident that happened on the first day of 2012

Related:

Expected results

incident_number datetime category description resolution pd_district
120002837 2012-01-01 23:05:00 VANDALISM MALICIOUS MISCHIEF, VANDALISM OF VEHICLES INGLESIDE
{:.table-sql}

Hints

The key to this exercise is to realize that the "datetime" field is a text field, as far as SQLite is concerned. Which means we can use text functions to extract dates and do comparisons.

Try:

SELECT datetime
FROM incidents
LIMIT 5;

Then:

SELECT datetime,
  SUBSTR(datetime, 1, 4)
FROM incidents
LIMIT 5;
datetime SUBSTR(datetime, 1, 4)
2012-01-01 00:01:00 2012
2012-01-01 00:01:00 2012
2012-01-01 00:01:00 2012
2012-01-01 00:01:00 2012
2012-01-01 00:01:00 2012

And whatever you'd like:

SELECT datetime,
  SUBSTR(datetime, 1, 4),
  SUBSTR(datetime, 1, 7)
FROM incidents
LIMIT 5;
datetime SUBSTR(datetime, 1, 4) SUBSTR(datetime, 1, 7)
2012-01-01 00:01:00 2012 2012-01
2012-01-01 00:01:00 2012 2012-01
2012-01-01 00:01:00 2012 2012-01
2012-01-01 00:01:00 2012 2012-01
2012-01-01 00:01:00 2012 2012-01
{:.table-sql}

What might be hard to figure out is how to set the condition to limit records in which the "datetime" value (which is not just a date but a time) is "equal" to the first day of January 2012.

This won't work:

SELECT 
  datetime
FROM incidents
WHERE datetime = '2012-01-01'
LIMIT 1;

Though using comparisons might work:

SELECT 
  datetime
FROM incidents
WHERE datetime > '2012-01-01' 
  AND datetime < '2012-01-02'
LIMIT 1;

There's all kind of variety:

SELECT 
  datetime
FROM incidents
WHERE datetime 
  BETWEEN '2012-01-01' AND '2012-01-02'
LIMIT 1;

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
WHERE 
  SUBSTR(datetime, 1, 10) = '2012-01-01'
ORDER BY 
  datetime DESC
LIMIT 1;

4. Find the first 5 incidents of the year 2014

Expected results

incident_number datetime category description resolution pd_district
140592131 2014-01-01 00:01:00 SEX OFFENSES, FORCIBLE SODOMY ARREST, BOOKED MISSION
140882346 2014-01-01 00:01:00 FRAUD FRAUDULENT USE OF AUTOMATED TELLER CARD NORTHERN
140309936 2014-01-01 00:01:00 OTHER OFFENSES FALSE PERSONATION TO RECEIVE MONEY OR PROPERTY UNFOUNDED RICHMOND
140471381 2014-01-01 00:01:00 FRAUD CREDIT CARD, THEFT BY USE OF CENTRAL
140057743 2014-01-01 00:01:00 SUSPICIOUS OCC SUSPICIOUS OCCURRENCE NORTHERN

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
WHERE 
  SUBSTR(datetime, 1, 4) = '2014'
ORDER BY 
  datetime ASC
LIMIT 5;

5. Find the first 5 chronological vehicle thefts that had a resolution other than "NONE"

Expected results

incident_number datetime category description resolution pd_district
120005342 2012-01-03 06:42:00 VEHICLE THEFT ATTEMPTED STOLEN VEHICLE ARREST, BOOKED MISSION
120011355 2012-01-04 21:00:00 VEHICLE THEFT STOLEN AUTOMOBILE UNFOUNDED SOUTHERN
120016656 2012-01-05 16:00:00 VEHICLE THEFT STOLEN AUTOMOBILE ARREST, BOOKED RICHMOND
120017604 2012-01-06 10:30:00 VEHICLE THEFT STOLEN TRUCK ARREST, BOOKED INGLESIDE
120017165 2012-01-06 21:45:00 VEHICLE THEFT STOLEN AUTOMOBILE UNFOUNDED INGLESIDE

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
WHERE 
  category = 'VEHICLE THEFT'
  AND resolution != 'NONE'
ORDER BY 
  datetime ASC
LIMIT 5;

6. Find the last 5 incidents of the year 2014 that had something to do with cocaine

Related: Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL

Expected Results

incident_number datetime category description resolution pd_district
141098374 2014-12-30 18:55:00 DRUG/NARCOTIC POSSESSION OF BASE/ROCK COCAINE FOR SALE ARREST, BOOKED TENDERLOIN
141096942 2014-12-30 11:30:00 DRUG/NARCOTIC POSSESSION OF BASE/ROCK COCAINE FOR SALE ARREST, BOOKED SOUTHERN
141094366 2014-12-29 15:10:00 DRUG/NARCOTIC POSSESSION OF BASE/ROCK COCAINE TENDERLOIN
141094366 2014-12-29 15:10:00 DRUG/NARCOTIC SALE OF BASE/ROCK COCAINE TENDERLOIN
141092928 2014-12-29 03:15:00 DRUG/NARCOTIC POSSESSION OF COCAINE INGLESIDE

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
WHERE 
  description LIKE '%cocaine%'
  AND SUBSTR(datetime, 1, 4) = '2014'
ORDER BY 
  datetime DESC
LIMIT 5;

7. Find the last 5 (chronological) "DRUG/NARCOTIC" crimes of the year 2014 that was not specifically about cocaine

Assume that "not specifically about cocaine" means that the word 'cocaine' did not show up in the description.

Expected Results

incident_number datetime category description resolution pd_district
141101610 2014-12-31 21:51:00 DRUG/NARCOTIC POSSESSION OF METH-AMPHETAMINE ARREST, BOOKED RICHMOND
141101444 2014-12-31 20:37:00 DRUG/NARCOTIC POSSESSION OF MARIJUANA ARREST, BOOKED MISSION
141099429 2014-12-31 08:08:00 DRUG/NARCOTIC POSSESSION OF CONTROLLED SUBSTANCE ARREST, BOOKED SOUTHERN
141099429 2014-12-31 08:08:00 DRUG/NARCOTIC POSSESSION OF NARCOTICS PARAPHERNALIA ARREST, BOOKED SOUTHERN
141099350 2014-12-31 03:16:00 DRUG/NARCOTIC POSSESSION OF MARIJUANA TARAVAL

Query

SELECT 
  incident_number,
  datetime,
  category,
  description,
  resolution,
  pd_district
FROM 
  incidents
WHERE 
  description NOT LIKE '%cocaine%'
  AND category = 'DRUG/NARCOTIC'
  AND SUBSTR(datetime, 1, 4) = '2014'
ORDER BY 
  datetime DESC
LIMIT 5;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment