Skip to content

Instantly share code, notes, and snippets.

@namyarxam
Created May 10, 2016 17:15
Show Gist options
  • Save namyarxam/b9dc5e2c90ca6983e2626a4c7a6a2b06 to your computer and use it in GitHub Desktop.
Save namyarxam/b9dc5e2c90ca6983e2626a4c7a6a2b06 to your computer and use it in GitHub Desktop.
My PSQL Join Guide from General Assembly

Summary

On the most basic level, the way we learned to query through a table in one of our databases is pretty simple.

The SQL SELECT statement allows us to search through a table and return a custom results table with whatever columns/groupings/operations we specified in our query.

SELECT * FROM movies WHERE rottentomatoes > 85 AND lead_actor like 'Kevin Spacey';

Unfortunately, sometimes the queries need an added level of complexity and a simple statement like the one above wont do the trick. Here are two scenarios where we need to add some intricacy to our SQL statement in order to perform a sufficient query:

  1. We want to query for consistencies between two separate tables that are connected through a reference key.
  2. We want to query for AND consolidate ALL the references some 'parent' table has to all its children: whether its through a foreign key table reference or through an 'xref' table of multiple item references.

Hopefully by the end of this wiki page, you should have a better understanding on what the above two statements mean and how to implement them in a single SQL query statement.

JOINS

SQL JOIN's are used to combine rows from 2+ tables based on a common field between them.

Below is an image that does an excellent job representing all of the different SQL joins and how to implement them (credit to www.codeproject.com).

SQL

"Cool graphic, but what does all of this mean? I still don't get it!"

I didn't get this at first either, but looking at a real example helped me understand exactly what the different statements mean.

Here are two separate tables in our poorly structured database:

Table A: ga_students_names

ida last first
1 Seminara Jagermeister
3 Hart Colossus
4 Petrov Petunia

Table B: ages_residencies_of_ga_students

idb age residency
1 59 Battery Park
2 12 Lenox Hill
4 59 GA 4th Floor

INNER JOIN

  • Gives the intersection of the two tables, or the rows they have common where Table A reference = Table B reference.
We can take this JOIN apart a bit more in-depth than the rest to ensure it makes perfect sense:
SELECT gsn.first, gsn.last, arga.age, arga.residency
FROM ga_students_names as gsn
INNER JOIN ages_residencies_of_ga_students as arga
ON gsn.ida = arga.idb;

This query will return all of the matching items in the SELECT statement where ida and idb are present in BOTH tables.

first last age residency
Jagermeister Seminara 59 Battery Park
Petunia Petrov 59 GA 4th Floor

The best way to vizualize exactly how JOINs work is to see what would happen if we alter the SELECT statement to a * query:

SELECT *
FROM ga_students_names as gsn
INNER JOIN ages_residencies_of_ga_students as arga
ON gsn.ida = arga.idb;

Which will return the table

ida last first idb age residency
1 Seminara Jagermeister 1 59 Battery Park
4 Petrov Petunia 4 59 GA 4th Floor

As you can see, the statement INHERINTLY joins the rows with shared ID's, so all you have to do is:

  1. specify the columns you want
  2. refer to each desired column by calling its table (object-like syntax)
  3. specify the order that you want the items in!

LEFT JOIN

  • LEFT: Gives ALL selected rows from the LEFT table in the ON statement and any common selected rows from the RIGHT table.

LEFT and RIGHT joins solely differ from each other by the ordering of the syntax in the ON section of the query statement.

For these statements, unlike INNER JOIN, the ORDER MATTERS! Lets take the same query above and change INNER to LEFT and see the differences:

SELECT gsn.first, gsn.last, arga.age, arga.residency
FROM ga_students_names as gsn
LEFT JOIN ages_residencies_of_ga_students as arga
ON gsn.ida = arga.idb;

(left: gsn.id) = (right: gsn.idb)

first last age residency
Jagermeister Seminara 59 Battery Park
Petunia Petrov 59 GA 4th Floor
Colossus Hart
As you can see, LEFT JOIN operates in two steps:
  1. Ignore the join statement, print all lines in the LEFT table.
  2. Enter RIGHT table:
  • if the ON statement is matched at any row, append those rows with data
  • if not, append NULL to the rest of the columns

RIGHT JOIN

  • RIGHT: Gives ALL selected rows from the RIGHT table in the ON statement and any common selected rows from the LEFT table.
SELECT gsn.first, gsn.last, arga.age, arga.residency
FROM ga_students_names as gsn
RIGHT JOIN ages_residencies_of_ga_students as arga
ON gsn.ida = arga.idb;

(left: gsn.id) = (right: gsn.idb)

first last age residency
Jagermeister Seminara 59 Battery Park
Petunia Petrov 59 GA 4th Floor
12 Lenox Hill
As you can see, RIGHT JOIN operates the same way as LEFT JOIN, but reverses the order of the tables in the ON statement.

FULL JOIN

  • A lot less useful than the above, simply returns a table that is a full representation of every line of both table and its null elements.

For this example, the outcome would be:

first last age residency
Jagermeister Seminara 59 Battery Park
Petunia Petrov 59 GA 4th Floor
12 Lenox Hill
Colossus Hart

CHAINING

We can add a lot of functionality to our join statements by doing several things:

  • modifying the ON statement with AND clauses in between
    • ex: ON tableA.id = tableB.id and tableA.age > 50
  • any other select query manipulations
    • SUM, PROD, GROUP BY, ORDER BY ...
Once tables are JOINED, it makes it easy to query for things that have common references in two or more tables!

In one of our class examples we had three tables in our 'carmen' database: country, city, and countrylanguage.

Each had a reference key (code, countrycode, countrycode) that allowed us to JOIN the different tables!

We had to find the percentage of people in the country where the city 'Oldenberg' is located that speak the country's official language. Now it doesnt seem that hard, does it?

Example #1

First, we need to find out what we want to display in our table (and what we want to display it AS).

In the select statement we can just do a little math and call from our tables, knowing that we will JOIN them later:

> SELECT ROUND(country.population*(countrylanguage.percentage/100)) as native_speakers, countrylanguage.language as language

Then we need to specify a table we are going to use in the JOIN.

Pick a table you want to initiate the SELECT from. This table will be part of the ON statement:

> FROM country

Pick one of the two tables we want to JOIN on, and pick your JOIN ROUTE (LEFT works great here!):

> LEFT JOIN city

Add the ON statement, initiating the JOIN:

> ON country.code = city.countrycode

CHAINING TIME! It's easy, just add another JOIN / ON statement. :

> LEFT JOIN countrylanguage

> ON city.country

Last step - specify the question parameters in a WHERE statement from ANY of the linked tables!

WHERE city.name LIKE 'Oldenberg' and countrylanguage.isofficial


Many-to-many 'xref' table JOINs (burg-o-rama)

Sometimes chaining join statements wont be enough to get the outcome that you want.

We might have one ID that corresponds to several ID's from another table in a cross-reference (xref) table.

How can we output a table that shows one ID, and ALL of its other ID references in another column?

Example # 2

You are tasked to create a database for burger orders.

  • Each burger has 1 meat type.
  • Each burger has 1 bun type.
  • Each burger can have ANY amount of cheeses.
  • Each burger can have ANY amount of toppings.

Our ERD should look something like this:

ERD

How can we write 1 query statement to pull from orders, bread, meat, and also shows ALL of the corresponding cheeses and toppings in the SAME row?

We have to use an aggregate function called array_agg() in our statement.

The array_agg function is surprisingly easy to implement, just use it in your select statement in the table that has multiple many-to-one values (cheese, toppings).

Lets do this! The key in this scenario is to first pick a table you need to aggregate - well pick cheese first (remember any value not in the statement must be renamed as omething else in the statement).

What do we want to output?

SELECT
  o.orderid,
  meat.type as meat_type,
  bun.type as bun_type,
  array_agg(c.type) as cheeses
  from orders o

Lets do an INNER JOIN on the meat, because there will always be a matching meat:

INNER JOIN meat
ON o.meatid = meat.meatid

LEFT JOIN orders-buns, there might be null values for buns:

LEFT JOIN bun
ON o.bunid = bun.bunid

We only care about the joined cheeses/orders, so lets INNER JOIN the xref table:

INNER JOIN order_cheese oc
ON o.orderid = oc.orderid

Lastly we need to connect the cheese table and our cheese xref, we want to include null cheese values:

LEFT JOIN cheese c
ON c.cheeseid = oc.cheeseid
GROUP BY (o.orderid, meat.type, bun.type)
ORDER BY o.orderid

This full query will output an aggregated table for the cheeses:

orderid meat_type bun_type cheeses
1 Bison Brioche {Muenster}
2 Angus Focacchia {Swiss,Blue}

How do we add ANOTHER row of aggregated results?

This is where it gets complicated, but if you followed along this far it's not that bad.

THE FINAL QUERY

sql

We need to make a select statement that includes toppings, but our entire FROM table must be the result of the full query above.

  1. We need to rename the entire first query and use that query as our FROM table (essentially nesting our first table inside the search for our second one)
  2. Once we call from that table, all we need to do is sort by the table itself, and voila!
select
orderid,
meat_type,
bun_type,
cheeses,
array_agg(t.type) as toppings
from (

  select
  o.orderid,
  meat.type as meat_type,
  bun.type as bun_type,
  array_agg(c.type) as cheeses
  from orders o

  -- get the meats using meatid on both sides
  inner join meat on o.meatid = meat.meatid

  -- get the buns using bunid on both sides
  left join bun on o.bunid = bun.bunid

  -- use an inner join here because we only care about OUR orders/cheeses
  inner join order_cheese oc on o.orderid = oc.orderid

  -- left join here because we want to capture orders w/o cheese
  left join cheese c on c.cheeseid = oc.cheeseid

  group by (o.orderid, meat.type, bun.type)
  order by o.orderid

) as justCheese

-- use an inner join here because we only care about OUR orders/cheeses
inner join order_topping ot using(orderid)

-- left join here because we want to capture orders w/o topping
left join toppings t using (toppingid)

group by orderid, meat_type, bun_type, cheeses
order by justCheese.orderid

This will result in this beatiful table, which can be called and reference in your query's results.rows!

final

You just learned how to do a 2-layer JOIN with a nested quadruple and double join!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment