Skip to content

Instantly share code, notes, and snippets.

@rrabinovitch
Last active August 11, 2020 11:14
Show Gist options
  • Save rrabinovitch/bb9c55b6a115d58ac0e3e952009cc475 to your computer and use it in GitHub Desktop.
Save rrabinovitch/bb9c55b6a115d58ac0e3e952009cc475 to your computer and use it in GitHub Desktop.

Intermediate SQL Challenge I

Goals

By the end of this lesson, you will know/be able to:

  • Understand INNER JOINS and OUTER JOINS
  • Understand Aggregate Functions
  • Introduce Subqueries

Structure

Lecture (tutorial)

Setup

From your terminal, run psql.

If you get an error that says something like Database username "YOUR_NAME" does not exist. you will need to create a database that shares the username. Run createdb "YOUR_NAME" and re-run psql.

Create a database to use as a playground: CREATE DATABASE intermediate_sql;

Close the current connection and connect to the DB we just created. \c intermediate_sql;

Create an items table: CREATE TABLE items(id SERIAL, name TEXT, revenue INT, course TEXT);

From above: What does SERIAL do?

Run SELECT * FROM items; to make sure it was successful.

Let's insert some data:

INSERT INTO items (name, revenue, course)
VALUES ('lobster mac n cheese', 1200, 'side'),
       ('veggie lasagna', 1000, 'main'),
       ('striped bass', 500, 'main'),
       ('arugula salad', 1100, 'salad');

Aggregate Functions

  • SELECT sum(column_name) FROM table_name;

  • SELECT avg(column_name) FROM table_name;

  • SELECT max(column_name) FROM table_name;

  • SELECT min(column_name) FROM table_name;
  • SELECT count(column_name) FROM table_name;

Write queries for the following:
  1. What's the total revenue for all items? SELECT sum(revenue) FROM items;
  2. What's the average revenue for all items? SELECT avg(revenue) FROM items;
  3. What's the minimum revenue for all items? SELECT min(revenue) FROM items;
  4. What's the maximum revenue for all items? SELECT max(revenue) FROM items;
  5. What the count for items with a name? SELECT count(name) FROM items;

Let's create an item that has all NULL values: INSERT into items (name, revenue, course) VALUES (NULL, NULL, NULL);

Typically you count records in a table by counting on the id column, like SELECT COUNT(id) FROM items;. However, it's not necessary for a table to have an id column. What else can you pass to count and still get 5 as your result? SELECT count(*) FROM items;

Building on Aggregate Functions

Now, combine multiple functions by returning both the minimum and maximum value from the revenue column: SELECT max(revenue), min(revenue) from items;

How can we get the revenue based on the course?

SELECT course, sum(revenue) FROM items GROUP BY course;

Write queries for the following:
  1. Return all main courses. Hint: What ActiveRecord method would you use to get this? SELECT * FROM items WHERE course='main';
  2. Return only the names of the main courses. SELECT name FROM items WHERE course='main';
  3. Return the min and max value for the main courses. SELECT min(revenue), max(revenue) FROM items WHERE course='main';
  4. What's the total revenue for all main courses? SELECT sum(revenue) FROM items WHERE course='main';

INNER JOINS

Now to the fun stuff. If you're a visual learner, you'll probably want to keep this article as you explore the concepts below. We're going to need multiple tables and to ensure we are on the same page, let's drop our table and populate our database with new data to experiment with.

DROP TABLE items;

Create some tables...

CREATE TABLE seasons(id SERIAL, name TEXT);
CREATE TABLE items(id SERIAL, name TEXT, revenue INT, season_id INT);
CREATE TABLE categories(id SERIAL, name TEXT);
CREATE TABLE item_categories(item_id INT, category_id INT);

Insert some data...

INSERT INTO seasons (name)
VALUES ('summer'),
       ('autumn'),
       ('winter'),
       ('spring');
INSERT INTO items (name, revenue, season_id)
VALUES ('lobster mac n cheese', 1200, 3),
       ('veggie lasagna', 1000, 1),
       ('striped bass', 500, 1),
       ('burger', 2000, 1),
       ('grilled cheese', 800, 4),
       ('hot dog', 1000, 1),
       ('arugula salad', 1100, 2);
INSERT INTO categories (name)
VALUES ('side'),
       ('dinner'),
       ('lunch'),
       ('vegetarian');
INSERT INTO item_categories (item_id, category_id)
VALUES (1, 1),
       (2, 2),
       (2, 4),
       (3, 2),
       (4, 3),
       (5, 3),
       (5, 4),
       (7, 1),
       (7, 2),
       (7, 3),
       (7, 4);

For our first query, we are going to grab each item and its season using an INNER JOIN.

SELECT * FROM items
INNER JOIN seasons
ON items.season_id = seasons.id;
id |         name         | revenue | season_id | id |  name
---+----------------------+---------+-----------+----+--------
 1 | lobster mac n cheese |    1200 |         3 |  3 | winter
 2 | veggie lasagna       |    1000 |         1 |  1 | summer
 3 | striped bass         |     500 |         1 |  1 | summer
 4 | burger               |    2000 |         1 |  1 | summer
 5 | grilled cheese       |     800 |         4 |  4 | spring
 6 | hot dog              |    1000 |         1 |  1 | summer
 7 | arugula salad        |    1100 |         2 |  2 | autumn
(7 rows)

This is useful, but we probably don't need all of the information from both tables.

  • Can you get it to display only the name for the item and the name for the season? SELECT items.name, seasons.name FROM items INNER JOIN seasons ON items.season_id = seasons.id;
  • Having two columns with the same name is confusing. Can you customize each heading using AS? SELECT items.name AS item, seasons.name AS season FROM items INNER JOIN seasons ON items.season_id = seasons.id;

It should look like this:

item_name            | season_name
---------------------+-------------
burger               | summer
veggie lasagna       | summer
striped bass         | summer
hot dog              | summer
arugula salad        | autumn
lobster mac n cheese | winter
grilled cheese       | spring
(7 rows)

Now let's combine multiple INNER JOINs to pull data from three tables items, categories and item_categories.

  • Write a query that pulls all the category names for arugula salad. Hint: Use multiple INNER JOINs and a WHERE clause. SELECT items.name, categories.name FROM items INNER JOIN item_categories ON items.id = item_categories.item_id INNER JOIN categories ON item_categories.category_id = categories.id WHERE items.name='arugala salad';

Can you get your return value to look like this?

name          |    name
--------------+------------
arugula salad | side
arugula salad | dinner
arugula salad | lunch
arugula salad | vegetarian
(4 rows)

Can you change the column headings? SELECT items.name AS item, categories.name AS category FROM items INNER JOIN item_categories ON items.id = item_categories.item_id INNER JOIN categories ON item_categories.category_id = categories.id WHERE items.name='arugala salad';

item_name     | category_name
--------------+---------------
arugula salad | side
arugula salad | dinner
arugula salad | lunch
arugula salad | vegetarian
(4 rows)

OUTER JOINS

To illustrate a LEFT OUTER JOIN we'll add a few records without a season_id.

INSERT INTO items (name, revenue, season_id)
VALUES ('italian beef', 600, NULL),
       ('cole slaw', 150, NULL),
       ('ice cream sandwich', 700, NULL);

Notice the result when we run an INNER JOIN on items and seasons.

SELECT i.name items, s.name seasons
FROM items i
INNER JOIN seasons s
ON i.season_id = s.id;

Bonus: This query uses aliases for items (i) and seasons (s) to make it cleaner. Notice that it's not necessary to use AS to name the column headings.

items                | seasons
---------------------+---------
hot dog              | summer
veggie lasagna       | summer
striped bass         | summer
burger               | summer
arugula salad        | autumn
lobster mac n cheese | winter
grilled cheese       | spring
(7 rows)

We don't see any of the new items that have NULL values for season_id.

A LEFT OUTER JOIN will return all records from the left table (items) and return matching records from the right table (seasons). Update the previous query and the return value and you should see something like this:

SELECT *
FROM items i
LEFT OUTER JOIN seasons s
ON i.season_id = s.id;
id  |         name        | revenue | season_id | id |  name
----+---------------------+---------+-----------+----+--------
 6 | hot dog              |    1000 |         1 |  1 | summer
 2 | veggie lasagna       |    1000 |         1 |  1 | summer
 3 | striped bass         |     500 |         1 |  1 | summer
 4 | burger               |    2000 |         1 |  1 | summer
 7 | arugula salad        |    1100 |         2 |  2 | autumn
 1 | lobster mac n cheese |    1200 |         3 |  3 | winter
 5 | grilled cheese       |     800 |         4 |  4 | spring
 8 | italian beef         |     600 |           |    |
 9 | cole slaw            |     150 |           |    |
10 | ice cream sandwich   |     700 |           |    |
(10 rows)

What do you think a RIGHT OUTER JOIN will do?

  • Write a query to test your guess.
  • Insert data into the right table that will not get returned on an INNER JOIN.

Subqueries

Sometimes you want to run a query based on the result of another query. Enter subqueries. Let's say I want to return all items with above average revenue. Two things need to happen:

  1. Calculate the average revenue.
  2. Write a WHERE clause that returns the items that have a revenue greater than that average.

Maybe something like this: SELECT * FROM items WHERE revenue > AVG(revenue);

Good try, but that didn't work.

Subqueries need to be wrapped in parentheses. We can build more complex queries by using the result of another query. Try using the following structure:

SELECT * FROM items
WHERE revenue > (Insert your query that calculates the avg inside these parentheses);

SELECT * FROM items WHERE revenue > (SELECT AVG(revenue) FROM items);

The result should look like so...

id |         name         | revenue | season_id
----+----------------------+---------+-----------
 1 | lobster mac n cheese |    1200 |         3
 2 | veggie lasagna       |    1000 |         1
 4 | burger               |    2000 |         1
 6 | hot dog              |    1000 |         1
 7 | arugula salad        |    1100 |         2
(5 rows)
  1. Without looking at the previous solution, write a WHERE clause that returns the items that have a revenue less than the average revenue. SELECT * FROM items where revenue < (SELECT AVG(revenue) FROM items);

Additional Challenges

  • Write a query that returns the sum of all items that have a category of dinner. SELECT SUM(items.revenue) FROM items INNER JOIN item_categories ON items.id = item_categories.item_id INNER JOIN categories ON categories.id = item_categories.category_id WHERE categories.name='dinner';
  • Write a query that returns the sum of all items for each category. The end result should look like this:
name       | sum
-----------+------
dinner     | 2600
vegetarian | 2900
lunch      | 3900
side       | 2300
(4 rows)

SELECT categories.name, SUM(revenue) FROM items INNER JOIN item_categories ON items.id = item_categories.item_id INNER JOIN categories ON categories.id = item_categories.category_id GROUP BY categories.name;

Possible Solutions

Some of these apply directly to challenges above. Most of them will need to be modified to acheive the challenge.

SELECT count(*) FROM items;
SELECT * FROM items WHERE course = 'main';
SELECT name FROM items WHERE course = 'main';
SELECT max(revenue), min(revenue) from items WHERE course = 'main';
SELECT sum(revenue) from items WHERE course = 'main';
SELECT * FROM items
WHERE revenue >
(SELECT AVG(revenue) FROM items);
SELECT SUM(i.revenue)
FROM items i
INNER JOIN item_categories ic
ON i.id = ic.item_id
INNER JOIN categories c
ON c.id = ic.category_id
WHERE c.name = 'dinner';
SELECT c.name, SUM(i.revenue)
FROM categories c
INNER JOIN item_categories ic
ON c.id = ic.category_id
INNER JOIN items i
ON i.id = ic.item_id
GROUP BY c.name;

Intermediate SQL Challenge II

  • Run psql
  • You may need to run CREATE DATABASE intermediate_sql;
  • To exit this shell, you can run CTRL d

Create tables

Let's create some tables in the database.

CREATE TABLE students(id SERIAL, name TEXT);
CREATE TABLE classes(id SERIAL, name TEXT, teacher_id INT);
CREATE TABLE teachers(id SERIAL, name TEXT, room_number INT);
CREATE TABLE enrollments(id SERIAL, student_id INT, class_id INT, grade INT);

Add data

Let's insert some students.

INSERT INTO students (name)
VALUES ('Penelope'),
       ('Peter'),
       ('Pepe'),
       ('Parth'),
       ('Priscilla'),
       ('Pablo'),
       ('Puja'),
       ('Patricia'),
       ('Piper'),
       ('Paula'),
       ('Pamela'),
       ('Paige'),
       ('Peggy'),
       ('Pedro'),
       ('Phoebe'),
       ('Pajak'),
       ('Parker'),
       ('Priyal'),
       ('Paxton'),
       ('Patrick');

Let's add some teachers.

INSERT INTO teachers (name, room_number)
VALUES ('Phillips', 456),
       ('Vandergrift', 120),
       ('Mauch', 101),
       ('Patel', 320),
       ('Marquez', 560),
       ('Boykin', 200),
       ('Phlop', 333),
       ('Pendergrass', 222),
       ('Palomo', 323),
       ('Altshuler', 543),
       ('Aleman', 187),
       ('Ashley', 432),
       ('Bonacci', 399),
       ('Brazukas', 287),
       ('Brockington', 299),
       ('Brizuela', 376),
       ('Burkhart', 199),
       ('Choi', 463),
       ('Shah', 354),
       ('Dimaggio', 251);

Let's add some classes.

INSERT INTO classes (name, teacher_id)
VALUES ('Cooking Pasta', 1),
       ('Yoga', 1),
       ('How to Guitar', 2),
       ('Gym', 3),
       ('Football', 4),
       ('Calculus', 5),
       ('Fruit', 6),
       ('Social Studies', 7),
       ('English', 8),
       ('Programming', 9),
       ('Singing', 10),
       ('Fashion', 11);

Lastly, let's add some enrollments!

INSERT INTO enrollments (student_id, class_id, grade)
VALUES (1, 1, 60),
       (2, 2, 70),
       (2, 4, 100),
       (3, 2, 74),
       (4, 3, 82),
       (5, 3, 45),
       (5, 4, 50),
       (7, 11, 62),
       (7, 10, 76),
       (7, 9, 81),
       (7, 8, 91),
       (8, 8, 84),
       (9, 8, 88),
       (9, 7, 83),
       (10, 7, 93),
       (10, 5, 95),
       (11, 5, 95),
       (11, 11, 80),
       (11, 6, 95),
       (11, 1, 94),
       (11, 2, 60),
       (12, 6, 55),
       (13, 7, 97),
       (14, 10, 86),
       (15, 9, 77),
       (15, 4, 93),
       (15, 1, 73),
       (16, 2, 79),
       (16, 6, 73),
       (17, 7, 86),
       (17, 8, 91),
       (17, 9, 93),
       (18, 10, 94),
       (19, 4, 84),
       (20, 1, 85),
       (20, 11, 89),
       (20, 3, 98);

Practice!!

  • List all the students and their classes
SELECT students.name, classes.name FROM students
INNER JOIN enrollments
       ON students.id = enrollments.student_id
INNER JOIN classes
       ON enrollments.class_id = classes.id;
  • List all the students and their classes and rename the columns to "student" and "class"
SELECT students.name AS student, classes.name AS class FROM students
INNER JOIN enrollments
       ON students.id = enrollments.student_id
INNER JOIN classes
       ON enrollments.class_id = classes.id;
  • List all the students and their average grade
SELECT students.name AS student, AVG(enrollments.grade) AS avg_grade FROM students
INNER JOIN enrollments
       ON students.id = enrollments.student_id
INNER JOIN classes
       ON enrollments.class_id = classes.id
GROUP BY students.name;
  • List all the students and a count of how many classes they are currently enrolled in
INNER JOIN enrollments
       ON students.id = enrollments.student_id
INNER JOIN classes
       ON enrollments.class_id = classes.id
GROUP BY students.name;
  • List all the students and their class count IF they are in more than 2 classes
SELECT students.name AS student, COUNT(classes) AS num_classes FROM students
INNER JOIN enrollments
       ON students.id = enrollments.student_id
INNER JOIN classes
       ON enrollments.class_id = classes.id
GROUP BY students.name
HAVING COUNT(classes) > 2;
  • List all the teachers for each student
SELECT students.name AS student, teachers.name AS teacher FROM teachers
INNER JOIN classes
       ON teachers.id=classes.teacher_id
INNER JOIN enrollments
       ON classes.id=enrollments.class_id
INNER JOIN students
       ON students.id=enrollments.student_id;
  • List all the teachers for each student grouped by each student
SELECT students.name AS student, teachers.name AS teacher FROM students
INNER JOIN enrollments
       ON students.id=enrollments.student_id
INNER JOIN classes
       ON classes.id=enrollments.class_id
INNER JOIN teachers
       ON teachers.id=classes.teacher_id
ORDER BY students.name;
  • Find the average grade for a each class
SELECT classes.name AS class, AVG(grade) AS avg_grade FROM enrollments
INNER JOIN classes
       ON enrollments.class_id=classes.id
GROUP BY classes.name;
  • List students' name and their grade IF their grade is lower than the average.
SELECT students.name AS student, enrollments.grade FROM students
INNER JOIN enrollments
       ON students.id=enrollments.student_id
WHERE grade < (SELECT AVG(grade) FROM enrollments);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment