Erin: Ex 1. the instruction 1 is great. it would be good to offer a second instruction that's a tad bit challenging that the first, something like:
SELECT Role, COUNT(*) FROM Employees GROUP BT Region
Ex 2: Good
Ex 3: The followin instruction is fine:
Let's modify the Case Statement above to return NULL
when the elevation does not meet the given conditions. I'm not sure users will know how to do it based on the example.
Is it possible to clarify instructions so users know exactly what to do?
Ex 4: Can we rewrite this, so that users see an example of COUNT(CASE WHEN). They can then use the example to write their own query.
Ex 5: Again, show example, have users implement query using the model given.
Ex 6: Again, show example, have users implement query using the model given.
Erin, I also wanted to ask you if there is a way of classifying or naming the type of aggregate functions you teach in this lesson. "Advanded Aggregates" is a temporary title that seems ambiguous. If you have a better name, that would work too!
Aggregate functions compute a single result from a set of multiple input values. You can think of aggregate data as data collected from multiple rows at a time.
In this lesson, we'll continue learning about aggregate functions by focusing on conditionals, sums, and combining aggregate functions.
If you are new to aggregate functions, we recommend you do this course first.
instructions
- The
count
function is an aggregate function, since it aggregates data from multiple rows.
Count the number of rows in the employees table.
SELECT COUNT(*) FROM employees
- [Erin, here, insert an instruction that is a bit more difficult than the first]
While working with databases, it's common to have empty or unknown "cells" in data tables. For example, in our first employees
table, we might also be keeping track of when employees leave the company in a column called left_at
:
employee_id | first_name | last_name | started_at | left_at |
---|---|---|---|---|
1 | Lloyd | Tabb | 2012-01-09 13:24:18 +0000 | ∅ |
2 | Ben | Porterfield | 2012-01-09 13:24:18 +0000 | ∅ |
3 | Marc | Randolph | 2012-01-09 13:24:18 +0000 | ∅ |
| ||||
In this table, none of our employees have left the company yet, so we'll leave those cells empty as NULL . |
||||
| ||||
What do we do when we need to test whether a value is or is not null? We use the special keywords IS NULL or IS NOT NULL in there WHERE clause (= NULL does not work). |
||||
| ||||
-- | ||||
| ||||
instructions: | ||||
|
- Count the number of rows from the rows from the employees table, where
left_at
is not null.
answer:
Run SELECT COUNT(*) FROM employees WHERE left_at IS NOT NULL
Almost every programming language has a way to represent "if, then, else" logic. In SQL, we represent this logic with the CASE
statement, as follows:
SELECT
CASE
WHEN elevation < 500 THEN 'Low'
WHEN elevation BETWEEN 500 AND 1999 THEN 'Medium'
WHEN elevation >= 2000 THEN 'High'
ELSE 'Unknown'
END AS elevation_tier
, COUNT(1)
FROM airports
GROUP BY 1
In the above statement, END
is required to terminate the statement, but ELSE
is optional. If ELSE
is not included, the result will be NULL
. Also notice the shorthand method of referencing columns to use in GROUP BY
, so we don't have to rewrite the entire Case Statement.
Let's modify the Case Statement above to return NULL
when the elevation does not meet the given conditions.
instructions:
- Run
SELECT
CASE
WHEN elevation < 500 THEN 'Low'
WHEN elevation BETWEEN 500 AND 1999 THEN 'Medium'
WHEN elevation >= 2000 THEN 'High'
END AS elevation_tier
, COUNT(1)
FROM airports
GROUP BY 1
Sometimes you want to look at an entire result set, but want to implement conditions on certain aggregates.
For instance, maybe you want to identify the total amount of airports as well as the total amount of airports with high elevation in the same result set. We can accomplish this by putting a CASE WHEN
statement in the aggregate.
instructions:
- [Explain to users what they need to do]
Run the query SELECT state, COUNT(*) AS count_airports, COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END) as count_high_elevation_aiports FROM airports GROUP BY state
We can do that same thing for other aggregates like SUM(). For instance, if we wanted to sum the total flight distance and compare that to the sum of flight distance from a particular airline by origin airport, we could run the following query:
instructions:
- Run the query
SELECT origin_code, sum(distance) as total_flight_distance, sum(CASE WHEN carrier_name = 'United' THEN distance ELSE 0 END) as total_united_flight_distance FROM flights GROUP BY origin_code
===
Oftentimes we'd like to combine aggregates, to create percentages or ratios.
In the instance of the last query, we might want to find out the percent of flight distance that is from united by origin airport. We can do this simply by using the mathematical operators we need in SQL:
instructions:
- Run the query
SELECT origin_code, 100*(sum(CASE WHEN carrier_name = 'United' THEN distance ELSE 0 END)/sum(distance)) as percentage_flight_distance_from_united FROM flights GROUP BY origin_code
===
===