Skip to content

Instantly share code, notes, and snippets.

@BandanaKM
Created March 16, 2016 17:55
Show Gist options
  • Save BandanaKM/5434172bdca8096c7729 to your computer and use it in GitHub Desktop.
Save BandanaKM/5434172bdca8096c7729 to your computer and use it in GitHub Desktop.
2-set-operations.md

Set Operations

Erin,

Could you add the following information?

  • Ex 2: Simple fix, but since the data is from an ecommerce store, could we change the first paragraph in exercise 2 to reflect an ecommerce example.
  • Ex 3: Missing the example of what the user should do
  • Finally, Would you be able to add the answer queries that you have in the instructions? We'll write test code for the user to submit this answer.

===

Ex 1: Set Operations

Let's continue exploring table transformation through the union clause. Unions allow us to utilize information from multiple tables in our queries. In this lesson, we’ll utilize data from an ecommerce store. Let’s explore the available data we’ll be using.

Instructions:

  1. In our database, we have products tables that contain metadata about each product in the store. To see a sample, select 10 rows from the new_products table.

  2. There are order items tables that detail the items that users have ordered from the store. To explore the order items data, select 10 rows from order_items.

===

Ex 2: Union

Sometimes, in order to answer certain questions based on data, we need to merge two tables together and then query the merged result. Perhaps we have two tables that contain information about products in an ecommerce store that we would like to combune.

There are two ways of doing this:

  1. Merge the rows, called a join.
  2. Merge the columns, called a union.

We'll focus on unions here. Union combines the result of two or more SELECT statements, using the following syntax:

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

Each SELECT statement within the UNION must have the same number of columns with similar data types. The columns in each SELECT statement must be in the same order. By default, the UNION operator selects only distinct values.

Suppose we are a growing ecommerce store and recently acquired another store to diversify our offering. The product data still exists in two separate tables: a legacy_products table and a new_products table. To get the complete list of distinct product names from both tables, we can perform the following union.

SELECT item_name FROM legacy_products
UNION 
SELECT item_name FROM new_products

--

instructions:

  1. Find the distinct list of brands from the legacy products and new_products tables.

===

Ex 3: Union All

Great job! What if we wanted to allow duplicate values? We can do this by using the the ALL keyword with UNION, with the following syntax:

SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

In our ecommerce store, if we learned that we had records from historic order items in an additional table, we could use the following query to combine the tables for a complete analysis of sale price:

SELECT id, sale_price FROM order_items
UNION ALL
SELECT id, sale_price FROM order_items_historic

Then we can perform an analysis on top of the combined result set, like finding the total count of order items.

SELECT count(*) FROM (
  SELECT id, sale_price FROM order_items
  UNION ALL
  SELECT id, sale_price FROM order_items_historic) as a

--

instructions: Using the same pattern, utilize a subquery to find the average sale price over both order_items and order_items historic tables.

===

Ex 4: Intersect

INTERSECT is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement. This means that it returns only common rows returned by the two SELECT statements.

SELECT column_name(s) FROM table1

INTERSECT

SELECT column_name(s) FROM table2

For instance, we might want to know what brands in our newly acquired store are also in our legacy store. We can do so using the following query:

SELECT brand FROM new_products

INTERSECT

SELECT brand FROM legacy_products

instructions:

  1. Find what categories are in both in the newly acquired store and the legacy store.

===

Ex 5: Except

EXCEPT is constructed in the same way, but returns distinct rows from the first SELECT statement that aren’t output by the second SELECT statement.

SELECT column_name(s) FROM table1

EXCEPT

SELECT column_name(s) FROM table2

Suppose we want to see if there are any categories that are in the new_products table that aren't in the legacy_products table. We can use an EXCEPT query to perform this analysis:

SELECT category FROM new_products

EXCEPT

SELECT category FROM legacy_products

--

instructions:

  1. Conversely, see if there are any categories that are in the legacy_products table that aren't in the new_products table.

===

Ex 6: Summary

  • Set Operations
  • UNION: allows us to utilize information from multiple tables in our queries.
  • UNION ALL: allows us to utilize information from multiple tables in our queries, including duplicate values.
  • INTERSECT is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement.
  • EXCEPT returns distinct rows from the first SELECT statement that aren’t output by the second SELECT statement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment