Set Operations
Erin,
Could you add the following information?
- Ex 2: Simple fix, but since the data is from an ecommerce store, could we change the first paragraph in exercise 2 to reflect an ecommerce example.
- Ex 3: Missing the example of what the user should do
- Finally, Would you be able to add the answer queries that you have in the instructions? We'll write test code for the user to submit this answer.
===
Let's continue exploring table transformation through the union
clause. Unions allow us to utilize information from multiple tables in our queries. In this lesson, we’ll utilize data from an ecommerce store. Let’s explore the available data we’ll be using.
Instructions:
-
In our database, we have products tables that contain metadata about each product in the store. To see a sample, select 10 rows from the
new_products
table. -
There are order items tables that detail the items that users have ordered from the store. To explore the order items data, select 10 rows from
order_items
.
===
Sometimes, in order to answer certain questions based on data, we need to merge two tables together and then query the merged result. Perhaps we have two tables that contain information about products in an ecommerce store that we would like to combune.
There are two ways of doing this:
- Merge the rows, called a join.
- Merge the columns, called a union.
We'll focus on unions here. Union combines the result of two or more SELECT statements, using the following syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values.
Suppose we are a growing ecommerce store and recently acquired another store to diversify our offering. The product data still exists in two separate tables: a legacy_products
table and a new_products
table. To get the complete list of distinct product names from both tables, we can perform the following union.
SELECT item_name FROM legacy_products
UNION
SELECT item_name FROM new_products
--
instructions:
- Find the distinct list of brands from the legacy products and new_products tables.
===
Great job! What if we wanted to allow duplicate values? We can do this by using the the ALL
keyword with UNION
, with the following syntax:
SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
In our ecommerce store, if we learned that we had records from historic order items in an additional table, we could use the following query to combine the tables for a complete analysis of sale price:
SELECT id, sale_price FROM order_items
UNION ALL
SELECT id, sale_price FROM order_items_historic
Then we can perform an analysis on top of the combined result set, like finding the total count of order items.
SELECT count(*) FROM (
SELECT id, sale_price FROM order_items
UNION ALL
SELECT id, sale_price FROM order_items_historic) as a
--
instructions: Using the same pattern, utilize a subquery to find the average sale price over both order_items
and order_items
historic tables.
===
INTERSECT is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement. This means that it returns only common rows returned by the two SELECT statements.
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2
For instance, we might want to know what brands in our newly acquired store are also in our legacy store. We can do so using the following query:
SELECT brand FROM new_products
INTERSECT
SELECT brand FROM legacy_products
instructions:
- Find what categories are in both in the newly acquired store and the legacy store.
===
EXCEPT is constructed in the same way, but returns distinct rows from the first SELECT statement that aren’t output by the second SELECT statement.
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2
Suppose we want to see if there are any categories that are in the new_products table that aren't in the legacy_products table. We can use an EXCEPT query to perform this analysis:
SELECT category FROM new_products
EXCEPT
SELECT category FROM legacy_products
--
instructions:
- Conversely, see if there are any categories that are in the legacy_products table that aren't in the new_products table.
===
- Set Operations
UNION
: allows us to utilize information from multiple tables in our queries.UNION ALL
: allows us to utilize information from multiple tables in our queries, including duplicate values.INTERSECT
is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement.EXCEPT
returns distinct rows from the first SELECT statement that aren’t output by the second SELECT statement