Instantly share code, notes, and snippets.

# SQL practice with SFPD incidents and simple aggregation (2 of 4)

https://gist.github.com/dannguyen/7e4fcfc6b91dd7749a23560912ee4e1e#file-02-counting-md

## Directions

This exercise set has 5 problems, not including two example problems (i.e. 1x and 5x).

Email me at dun@stanford.edu with the subject:

``````padjo-2017 homework **sql-ex-2** **your_sunet_id**
``````

With a Google Doc (attached or linked-to) in which each query has its own page.

The file to use, as before, is this (zipped) SQLite file of SFPD incidents from 2012 through 2016:

## Introduction

The SFPD data doesn't lend itself to much quantification, as the latitude and longitude columns are the only numerical columns, thus sum, average, max/min aren't very interesting aggregate methods.

So for this exercise, we just care about the COUNT function. When COUNT, or any aggregate function, is used without a GROUP BY clause, the result is a single value.

Pretend you had a spreadsheet and you used the `=COUNTA()` or `=SUM()` function across every row, i.e. `=COUNTA(A1:A9999` and `=SUM(A1:A9999)`. You would expect the function to deliver a single value.

The spreadsheet equivalent of "GROUP BY" is to do things using pivot tables. But for these SQL exercises, it's a singular count (i.e. not a pivot table, i.e. not using GROUP BY).

Besides the COUNT() function in the SELECT clause, all the other SQL syntax remains the same.

Check out this past lesson, Aggregate functions in SQL. Specifically, focus on the The COUNT section and especially, the Counting without grouping section. (Ignore the GROUP BY stuff for now)

## Exercises

### 1x. (EXAMPLE) Just do a count

The syntax for calculating the count of rows in a result set is simply:

`SELECT COUNT(*); `

Believe it or not, even without reference to the database table (i.e. `incidents`), this has a valid result:

COUNT(*)
1

This can be interpreted as something like, The number of rows in the result for our useless queries is: 1.

### 1. Total number of incidents

Using `SELECT COUNT(*)` with a `FROM` clause will count the total number of rows in the table.

total_count
751047

#### Hints

The expected answer expects a column name of total_count, so your query should include the proper alias clause.

### 2. Total number of incidents categorized as "ASSAULT"

Same as previous question, except there is a WHERE clause and a different alias.

total_assaults
63874

#### Query

(I'm just giving you this answer)

```SELECT
COUNT(*) AS total_assaults
FROM
incidents
WHERE
category = 'ASSAULT';```

### 3. Total number of incidents having something to do with marijuana

Assume "having something to do with marijuana" means that the literal text 'marijuana' shows up in the "description" field, i.e. you will want to use the LIKE and wildcards.

Read: Fuzzy matching in SQL, particularly Matching patterns with LIKE

total_count
5589

total_count
150128

### 5x. (EXAMPLE) Total number of incidents that took place in the "TENDERLOIN" district that were either "UNFOUNDED" or "NOT PROSECUTED"

Nothing different from the previous exercises, just a slightly more complicated logic in the WHERE clause.

total_count
504

#### Hints

It is very easy to mix up the logic here. For example, this is NOT the logic that we want for finding both kinds of "resolution":

```  resolution = 'UNFOUNDED'
AND resolution = 'NOT PROSECUTED'```

That is mutually exclusive (do the logic in your head). We want to use OR:

```  resolution = 'UNFOUNDED'
OR resolution = 'NOT PROSECUTED'```

Now look at the `"pd_district" = 'TENDERLOIN'` logic separately:

`pd_district = 'TENDERLOIN'`

How do we combine the two conditional clauses?

##### Without parentheses

(the exerpt below is obviously just an excerpt, not valid syntax)

```
pd_district = 'TENDERLOIN'
AND resolution = 'UNFOUNDED'
OR resolution = 'NOT PROSECUTED'```

The above logic returns results in which:

1. The district is "TENDERLOIN" and the resolution is "UNFOUNDED"
2. The resolution is "NOT PROSECUTED"

The results of #2 include rows from districts that are not the TENDERLOIND

##### Using parentheses to group logic

Using parentheses, as we do in arithmetic, we can enforce the order of the logic:

```
pd_district = 'TENDERLOIN'
AND
(resolution = 'UNFOUNDED'
OR resolution = 'NOT PROSECUTED')```

Unlike the previous bad example, there are only 2 separate conditions that must both be met. The latter condition, dealing with the resolution, can have multiple choices. But in the end, all results are in the 'TENDERLOIN' district.

#### Query

```SELECT
COUNT(*) AS total_count
FROM
incidents
WHERE
pd_district = 'TENDERLOIN'
AND (resolution = 'UNFOUNDED'
OR resolution = 'NOT PROSECUTED');```

### 5. Total number of incidents taking place in 2016 in either the "TENDERLOIN" or "MISSION" districts

total_count
29488

#### Query

```SELECT
COUNT(*) AS total_count
FROM
incidents
WHERE
SUBSTR(datetime, 1, 4) = '2016'
AND
pd_district IN ('TENDERLOIN', 'MISSION');```