Skip to content

Instantly share code, notes, and snippets.

@BandanaKM
Created March 16, 2016 17:57
Show Gist options
  • Save BandanaKM/c98cb487e2efbdc7f17f to your computer and use it in GitHub Desktop.
Save BandanaKM/c98cb487e2efbdc7f17f to your computer and use it in GitHub Desktop.
4.advanced-aggregates.md

SQL

Erin: Ex 1. the instruction 1 is great. it would be good to offer a second instruction that's a tad bit challenging that the first, something like:

SELECT Role, COUNT(*) FROM Employees GROUP BT Region

Ex 2: Good

Ex 3: The followin instruction is fine:

Let's modify the Case Statement above to return NULL when the elevation does not meet the given conditions. I'm not sure users will know how to do it based on the example.

Is it possible to clarify instructions so users know exactly what to do?

Ex 4: Can we rewrite this, so that users see an example of COUNT(CASE WHEN). They can then use the example to write their own query.

Ex 5: Again, show example, have users implement query using the model given.

Ex 6: Again, show example, have users implement query using the model given.

Erin, I also wanted to ask you if there is a way of classifying or naming the type of aggregate functions you teach in this lesson. "Advanded Aggregates" is a temporary title that seems ambiguous. If you have a better name, that would work too!

Lesson 3: Advanced Aggregate Functions

Ex 1: Advanced Aggregate Functions

Aggregate functions compute a single result from a set of multiple input values. You can think of aggregate data as data collected from multiple rows at a time.

In this lesson, we'll continue learning about aggregate functions by focusing on conditionals, sums, and combining aggregate functions.

If you are new to aggregate functions, we recommend you do this course first.


instructions

  1. The count function is an aggregate function, since it aggregates data from multiple rows.

Count the number of rows in the employees table.

SELECT COUNT(*) FROM employees

  1. [Erin, here, insert an instruction that is a bit more difficult than the first]

​​

Ex 2: NULL

​ While working with databases, it's common to have empty or unknown "cells" in data tables. For example, in our first employees table, we might also be keeping track of when employees leave the company in a column called left_at: ​

employee_id first_name last_name started_at left_at
1 Lloyd Tabb 2012-01-09 13:24:18 +0000
2 Ben Porterfield 2012-01-09 13:24:18 +0000
3 Marc Randolph 2012-01-09 13:24:18 +0000
In this table, none of our employees have left the company yet, so we'll leave those cells empty as NULL.
What do we do when we need to test whether a value is or is not null? We use the special keywords IS NULL or IS NOT NULL in there WHERE clause (= NULL does not work).
--
instructions:
  1. Count the number of rows from the rows from the employees table, where left_at is not null.

answer: Run SELECT COUNT(*) FROM employees WHERE left_at IS NOT NULL

​ ​

Ex 3: CASE WHEN

​ Almost every programming language has a way to represent "if, then, else" logic. In SQL, we represent this logic with the CASE statement, as follows: ​

SELECT
	CASE
		WHEN elevation < 500 THEN 'Low'
		WHEN elevation BETWEEN 500 AND 1999 THEN 'Medium'
		WHEN elevation >= 2000 THEN 'High'
		ELSE 'Unknown'
	END AS elevation_tier
	, COUNT(1)
FROM airports
GROUP BY 1

​ In the above statement, END is required to terminate the statement, but ELSE is optional. If ELSE is not included, the result will be NULL. Also notice the shorthand method of referencing columns to use in GROUP BY, so we don't have to rewrite the entire Case Statement. ​ Let's modify the Case Statement above to return NULL when the elevation does not meet the given conditions. ​

​ instructions: ​

  1. Run
SELECT
	CASE
		WHEN elevation < 500 THEN 'Low'
		WHEN elevation BETWEEN 500 AND 1999 THEN 'Medium'
		WHEN elevation >= 2000 THEN 'High'
	END AS elevation_tier
	, COUNT(1)
FROM airports
GROUP BY 1

​ ​

Ex 4: COUNT(CASE WHEN )

​ Sometimes you want to look at an entire result set, but want to implement conditions on certain aggregates.

For instance, maybe you want to identify the total amount of airports as well as the total amount of airports with high elevation in the same result set. We can accomplish this by putting a CASE WHEN statement in the aggregate. ​ instructions: ​ ​

  1. [Explain to users what they need to do]

Run the query SELECT state, COUNT(*) AS count_airports, COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END) as count_high_elevation_aiports FROM airports GROUP BY state

​ ​ ​

Ex 5: SUM(CASE WHEN )

​ We can do that same thing for other aggregates like SUM(). For instance, if we wanted to sum the total flight distance and compare that to the sum of flight distance from a particular airline by origin airport, we could run the following query:

​ instructions: ​

  1. Run the query SELECT origin_code, sum(distance) as total_flight_distance, sum(CASE WHEN carrier_name = 'United' THEN distance ELSE 0 END) as total_united_flight_distance FROM flights GROUP BY origin_code ​ ​ === ​ ​ ​

Ex 6: Combining aggregates

​ Oftentimes we'd like to combine aggregates, to create percentages or ratios.

In the instance of the last query, we might want to find out the percent of flight distance that is from united by origin airport. We can do this simply by using the mathematical operators we need in SQL: ​

instructions: ​

  1. Run the query SELECT origin_code, 100*(sum(CASE WHEN carrier_name = 'United' THEN distance ELSE 0 END)/sum(distance)) as percentage_flight_distance_from_united FROM flights GROUP BY origin_code ​ ​ === ​ ​

Ex 6: Combining aggregates II

===

Ex 7: Summary:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment