Skip to content

Instantly share code, notes, and snippets.

@BandanaKM
Created March 16, 2016 17:51
Show Gist options
  • Save BandanaKM/f2517b55c2a9168a516f to your computer and use it in GitHub Desktop.
Save BandanaKM/f2517b55c2a9168a516f to your computer and use it in GitHub Desktop.
sql-looker markdown files

SQL

Erin,

Could you add the following information?

  • Ex 1: Could you explain what table transformation is, at the very beginning? New SQL users may not have a clear idea of it.
  • Ex 3: I added one more exercise focused on non-correlated subqueries to let the idea settle in. In the instructions, could you add one more query that users could run on the dataset?
  • Ex 5: We may need to clarify this for users, but this is good for now.
  • Ex 6: Could you add one more example of a correlated subquery so users can practice?
  • Finally, could you add the answer query to all the instructions below?

Data is often available in a raw format at a granular level. Oftentimes we need to summarize or aggregate this data to answer questions we’d like to find from the data. SQL is a language that can be used to accomplish this table transformation. In this course, we’ll examine multiple ways to transform data in tables via SQL to get a desired result set.

Ex 1: Subqueries

While working with databases, we often need multiple steps to complete a transformation. Subqueries, sometimes referred to as inner queries or nested queries, are often used to complete that first step of an SQL transformation.

In this lesson, we'll look aviation data and transform it with SQL using subqueries. At a high level, when using subqueries, we first run an inner query first. Then, we run an outer query on the inner query’s result set to perform a multiple step transformation.

If you are new to SQL, we recommend you do this course first.

Instructions:

  1. We'll be working with an airport database. Let's begin by looking at the data tables. First, select 10 rows of the flights table.

  2. Next, select 10 rows of the airports table.


Ex 2: Subqueries: Non-Correlated I

Nice work! We'll begin our exploration of subqueries.

Imagine that we are the head of air traffic control. In our airplanes database, we'd like to know which flights had an origin airport with an elevation greater than 2000 feet. We can do this with a subquery:

The inner query, or subquery, first finds the airports with elevation greater than 2000 from the airports table:

SELECT code 
  FROM airports 
  WHERE elevation > 2000

Next, we take the result set of the the inner query and use it to filter on the flights table, to find the flight detail that meets the elevation criteria.

SELECT * 
FROM flights 
WHERE origin in (
	SELECT code 
	FROM airports 
	WHERE elevation > 2000)

The subquery is thus nested within another query, in order to

Instruction:

  1. Using the same pattern, find flight information about flights where the destination elevation is less than 2000 feet.
SELECT * 
FROM flights 
WHERE origin in (
  SELECT code 
  FROM airports 
  WHERE elevation < 2000)

===

Ex 3: Subqueries: Non-Correlated II

Great! The query we just ran is just one kind of subquery - what we’ll refer to as a non-correlated subquery. A non-correlated subquery is a subquery that can be run independently of the outer query and as we saw, can be used to complete a multi-step transformation.

Let's try one more non-correlated subquery. Perhaps we'd like to look at a selection of flights whose origin ariport is a seaplane base. The facility type of an aiport is located in the fac_type field of the aiports table.

Instructions:

  1. Using the same pattern, find flight information about flights where the origin airport is a seaplane base (denoted by the facility type SEAPLANE_BASE)

===

Ex 4: Subqueries: Non-Correlated III

The non-correlated subquery examples we've used so far utilized two tables (flights and airports), but we can also perform transformations on a single table. For instance, sometimes we need to aggregate in multiple steps - like taking an average of a count.

Imagine you’d like to know how many flights there are on average, for Fridays in of a given month from the flights table. First, we’d need to calculate the number of flights per day, and then we’d need to calculate the average based on the daily flight count for each day of the week. We can do this all in one step using a subquery:

SELECT a.dep_month,
       a.dep_day_of_week,
       AVG(a.flight_count) AS average_flights
  FROM (
        SELECT dep_month,
   	       dep_day_of_week,
               dep_date,
               COUNT(*) AS flight_count
          FROM flights
         GROUP BY 1,2,3
       ) a
 GROUP BY 1,2
 ORDER BY 1,2

The inner query provides the count of flights by day, and the outer query uses the inner query’s result set to compute the average by day of week of a given month.

instructions:

  1. Using a subquery, find the average total distance flown by day of week and month.

===

Ex 5: Correlated Subqueries I

Nice work! We can also write correlated subqueries in SQL. In a correlated subquery the subquery can not be run independently of the outer query. The order of operations is important to understand in a correlated subquery:

  1. A row is processed in the outer query.
  2. Then, for that particular row in the outer query, the subquery is executed. This means that for each row processed by the outer query, the subquery will also be processed for that row. In this example, we will find the list of all flights whose distance is above average for their carrier.
SELECT id
FROM flights AS f
WHERE distance > (
 SELECT AVG(distance)
 FROM flights
 WHERE carrier = f.carrier);

In the above query the inner query has to be re-executed for each flight. Correlated subqueries may appear elsewhere besides the WHERE clause, they can also appear in the SELECT.

instructions:

  1. Find the flights whose distance is below average for their carrier.

===

Ex 6: Correlated Subqueries II

Something also that could be interesting would be to order our flights by giving them a sequence number based on time, by carrier. For instance, assuming flight_id increments with each additional flight, we could use the following query to start to view a sample of carriers, flight ids, and sequence numbers:

SELECT carrier, id,
    (SELECT COUNT(*)
FROM flights f
WHERE f.id < flights.id
AND f.carrier=flights.carrier) + 1
 AS flight_sequence_number
FROM flights

instructions:

  1. Using the same pattern, find the origin aiport, flight id, and sequence number based on origin airport.

==

Ex 7: Generalizations

educational text

  • Table Transformation:
  • Subquery: A subquery is query nested within another SQL query, often added within the WHERE clause.
  • Non-Correlated Subqueries: A subquery that can be run independently of the outer query, and can be used to complete a multi-step transformation.
  • Correlated Subqueries: A subquery that can not be run independently of the outer query.

=========================

Additional Notes:

  1. Subqueries
  2. Non-Correlated Subqueries
  3. Non-Correlated Subqueries
  4. Non-Correlated Subqueries
  5. Correlated Subqueries
  6. Correlated Subqueries
  7. Summary

By the end of the lesson, a user should be able to write a correlated, or non correlated subquery.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment