Skip to content

Instantly share code, notes, and snippets.

@SuryaSankar
Last active March 18, 2021 16:37
Show Gist options
  • Save SuryaSankar/abccfaf1c385ccbede2d07601ad2bcc1 to your computer and use it in GitHub Desktop.
Save SuryaSankar/abccfaf1c385ccbede2d07601ad2bcc1 to your computer and use it in GitHub Desktop.
Example to understand joins

Let's try to understand joins by trying to verbalize what 'question' each join is trying to answer.

We will try to frame the explanation in terms of a real world example. We will build a Jobs DB with two tables - job and candidate. The job table will have a city field representing where the job location is at. Similarly candidate table also has a city field representing the city which the candidate prefers. Our goal is to match candidates to jobs based on the city. We will try to answer different questions with different joins and subqueries.

This seemed like a better example than other examples based on primary keys because this particular example does the matching on a column which can allow duplicate values. So we can understand the general case of how joins work. The case where the join is on a primary key column or a col with unique constraint - will just be a special case of this general case.

Creation of tables

CREATE TABLE candidate (id serial PRIMARY KEY, email VARCHAR (255) UNIQUE NOT NULL, city VARCHAR (255) );
CREATE TABLE job (id serial PRIMARY KEY, designation VARCHAR (255), company VARCHAR (255), city VARCHAR (255) );

Inserting data

We will keep Chennai and Bengaluru as common cities between the two tables and then add some extra entries to both tables with some other cities.

INSERT INTO candidate (email, city) VALUES 
('surya@gmail.com', 'Chennai'),  
('sankar@gmail.com', 'Bengaluru'), 
('alex@gmail.com', 'Chennai'), 
('bob@gmail.com', 'Pune'), 
('vicky@gmail.com', 'Hyderabad'), 
('sid@gmail.com', 'Bengaluru'), 
('vinod@gmail.com', 'Pune'), 
('kamal@gmail.com', 'Trivandrum'), 
('ajay@gmail.com', 'Trivandrum'), 
('arul@gmail.com', 'Chennai'), 
('akbar@gmail.com', 'Lucknow'), 
('manish@gmail.com', 'Indore');
INSERT INTO job (designation, company, city) VALUES 
('SDE 1', 'Amazon', 'Chennai'), 
('SDE 2', 'Amazon', 'Bengaluru'), 
('SDE3', 'Amazon', 'Seattle'), 
('SDE', 'Microsoft', 'Bengaluru'), 
('Engineering Manager', 'FunnelIQ', 'Chennai'), 
('SDE', 'Grab', 'Singapore'), 
('Product Manager', 'Uber', 'Bengaluru'), 
('Engineering Lead', 'Adobe', 'Gurgaon'), 
('Principal Engineer', 'Grab', 'Bengaluru'), 
('TPM', 'Amazon', 'Chennai'), 
('SDE2', 'Grab', 'Singapore'), 
('TPM', 'Amazon', 'Seattle'), 
('SDE1', 'Adobe', 'Gurgaon');

The data we are going to experiment with

Let's first review the data we are going to play around with.

SELECT * FROM candidate;
id email city
1 surya@gmail.com Chennai
2 sankar@gmail.com Bengaluru
3 alex@gmail.com Chennai
4 bob@gmail.com Pune
5 vicky@gmail.com Hyderabad
6 sid@gmail.com Bengaluru
7 vinod@gmail.com Pune
8 kamal@gmail.com Trivandrum
9 ajay@gmail.com Trivandrum
10 arul@gmail.com Chennai
11 akbar@gmail.com Lucknow
12 manish@gmail.com Indore

(12 rows)

SELECT * FROM job;
id designation company city
1 SDE 1 Amazon Chennai
2 SDE 2 Amazon Bengaluru
3 SDE3 Amazon Seattle
4 SDE Microsoft Bengaluru
5 Engineering Manager FunnelIQ Chennai
6 SDE Grab Singapore
7 Product Manager Uber Bengaluru
8 Engineering Lead Adobe Gurgaon
9 Principal Engineer Grab Bengaluru
10 TPM Amazon Chennai
11 SDE2 Grab Singapore
12 TPM Amazon Seattle
13 SDE1 Adobe Gurgaon

(13 rows)

SUBQUERIES

We will first try to answer a few simple queries

Candidates in cities which have some job available

This can be done with a subquery itself as we require fields from the candidate table only, with only the constraint being specified on the job table. So a subquery on the WHERE clause is enough.

SELECT candidate.id, candidate.email, candidate.city AS candidate_city 
FROM candidate WHERE candidate.city IN (SELECT city FROM job);
id email candidate_city
1 surya@gmail.com Chennai
2 sankar@gmail.com Bengaluru
3 alex@gmail.com Chennai
6 sid@gmail.com Bengaluru
10 arul@gmail.com Chennai

(5 rows)

Candidates in cities without any available jobs

SELECT candidate.id, candidate.email, candidate.city AS candidate_city 
FROM candidate WHERE candidate.city NOT IN (SELECT city FROM job);
id email candidate_city
4 bob@gmail.com Pune
5 vicky@gmail.com Hyderabad
7 vinod@gmail.com Pune
8 kamal@gmail.com Trivandrum
9 ajay@gmail.com Trivandrum
11 akbar@gmail.com Lucknow
12 manish@gmail.com Indore

(7 rows)

Jobs in cities with some available candidates

SELECT job.id, job.designation, job.company, job.city AS job_city 
FROM job WHERE job.city IN (SELECT city FROM candidate);
id designation company job_city
1 SDE 1 Amazon Chennai
2 SDE 2 Amazon Bengaluru
4 SDE Microsoft Bengaluru
5 Engineering Manager FunnelIQ Chennai
7 Product Manager Uber Bengaluru
9 Principal Engineer Grab Bengaluru
10 TPM Amazon Chennai

(7 rows)

Jobs in cities without any available candidates

SELECT job.id, job.designation, job.company, job.city AS job_city 
FROM job WHERE job.city NOT IN (SELECT city FROM candidate);
id designation company job_city
3 SDE3 Amazon Seattle
6 SDE Grab Singapore
8 Engineering Lead Adobe Gurgaon
11 SDE2 Grab Singapore
12 TPM Amazon Seattle
13 SDE1 Adobe Gurgaon

(6 rows)

Subqueries are enough if the fields we need are in one table only. But if we want to combine the fields from multiple tables, joins become necessary.

INNER JOIN

Let's say we are building an interface where we want to show all possible matching pairs of candidates and jobs. In this case the query to answer is "List all pairs of matching jobs and candidates which have the same city"

Inner Join is meant to answer this. The data has only Chennai and Bengaluru as the common cities. With 3 candidates in Chennai matched to 3 jobs in Chennai and 2 candidates in Bengaluru matched to 4 jobs there. So the total number of matching pairs would be 3 * 3 + 4 * 2 = 17

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, candidate.id AS candidate_id, 
candidate.email AS candidate_email, candidate.city AS candidate_city 
FROM job INNER JOIN candidate ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai

(17 rows)

Inner Join is symmetric. So the order doesn't matter. We can do INNER JOIN from candidate to job and see that it returns the same result set.

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, 
candidate.id AS candidate_id, candidate.email AS candidate_email, candidate.city AS candidate_city 
FROM candidate INNER JOIN job ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai

(17 rows)

LEFT OUTER JOIN

After showing the pairs of jobs and matching candidates, lets say we also want to show the jobs which don't have any matching candidate. This is answered by Left Outer Join.

The query statement is - "Pair jobs with matching candidates and list all pairs. If there is no matching candidate for a job, then pair it with an empty row on the candidate side and list that also"

This can also be phrased as - "Pairs of matching jobs and candidates which have the same city + Jobs in cities without any available candidates"

Now in the above expression, the first part is the same as the INNER JOIN and the second part is the same as the "NOT IN" subquery we wrote for "Jobs in cities without any available candidates"

So the total rows would be the sum of the rows for both those queries = 17 + 6 = 23

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, 
candidate.id AS candidate_id, candidate.email AS candidate_email, 
candidate.city AS candidate_city FROM job 
LEFT OUTER JOIN candidate ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai
11 SDE2 Grab Singapore
6 SDE Grab Singapore
13 SDE1 Adobe Gurgaon
8 Engineering Lead Adobe Gurgaon
12 TPM Amazon Seattle
3 SDE3 Amazon Seattle

(23 rows)

LEFT OUTER JOIN is not symmetric. If we change the order of the join, the meaning of the query changes.

It becomes, "Pair candidates with matching jobs and list all pairs. If there is no matching job for a candidate, then pair it with an empty row on the job side and list that also"

Or in other words - "Pairs of matching jobs and candidates which have the same city + Candidates in cities without any available jobs"

Again we can see that this is equivalent to adding the result of INNER JOIN with the result of the "NOT IN" subquery where we found the "Candidates in cities without any available jobs". So the total rows in the resultset would be 17 + 7 = 24

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, 
candidate.id AS candidate_id, candidate.email AS candidate_email, 
candidate.city AS candidate_city FROM candidate LEFT OUTER JOIN job ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
4 bob@gmail.com Pune
5 vicky@gmail.com Hyderabad
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
7 vinod@gmail.com Pune
8 kamal@gmail.com Trivandrum
9 ajay@gmail.com Trivandrum
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai
11 akbar@gmail.com Lucknow
12 manish@gmail.com Indore

(24 rows)

RIGHT OUTER JOIN

Right Outer Join is just the mirror image of LEFT OUTER JOIN. The statement "FROM job RIGHT OUTER JOIN candidate" is exactly equivalent to "FROM candidate LEFT OUTER JOIN job", which is basically the query we discussed just before. So whenever we need to visualize RIGHT OUTER JOIN, we can just instead visualize it as LEFT OUTER JOIN with the order of the join reversed.

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, candidate.id AS candidate_id, 
candidate.email AS candidate_email, candidate.city AS candidate_city 
FROM job RIGHT OUTER JOIN candidate ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
4 bob@gmail.com Pune
5 vicky@gmail.com Hyderabad
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
7 vinod@gmail.com Pune
8 kamal@gmail.com Trivandrum
9 ajay@gmail.com Trivandrum
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai
11 akbar@gmail.com Lucknow
12 manish@gmail.com Indore

(24 rows)

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, 
candidate.id AS candidate_id, candidate.email AS candidate_email, candidate.city AS candidate_city 
FROM candidate RIGHT OUTER JOIN job ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai
11 SDE2 Grab Singapore
6 SDE Grab Singapore
13 SDE1 Adobe Gurgaon
8 Engineering Lead Adobe Gurgaon
12 TPM Amazon Seattle
3 SDE3 Amazon Seattle

(23 rows)

FULL JOIN

Finally we come to FULL JOIN. We can use this when the requirement combines both LEFT and RIGHT outer joins. That is we want a query that will "Pair candidates with matching jobs and list all pairs. If there is no matching job for a candidate, then pair it with an empty row on the job side and list that also. Similarly, ff there is no matching candidate for a job, then pair it with an empty row on the candidate side and list that also"

In other words - "Candidate and job matching pairs, matched on city + Jobs in cities without any available candidates + Candidates in cities without any available jobs"

We can see that this is basically the same as adding the INNER JOIN result to the results from the two "NOT IN" subqueries we wrote. So the final count would be 17 + 7 + 6 = 30

SELECT job.id AS job_id, job.designation, job.company, job.city AS job_city, candidate.id AS candidate_id, 
candidate.email AS candidate_email, candidate.city AS candidate_city 
FROM candidate FULL JOIN job ON job.city = candidate.city;
job_id designation company job_city candidate_id candidate_email candidate_city
10 TPM Amazon Chennai 1 surya@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 1 surya@gmail.com Chennai
1 SDE 1 Amazon Chennai 1 surya@gmail.com Chennai
9 Principal Engineer Grab Bengaluru 2 sankar@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 2 sankar@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 2 sankar@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 2 sankar@gmail.com Bengaluru
10 TPM Amazon Chennai 3 alex@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 3 alex@gmail.com Chennai
1 SDE 1 Amazon Chennai 3 alex@gmail.com Chennai
4 bob@gmail.com Pune
5 vicky@gmail.com Hyderabad
9 Principal Engineer Grab Bengaluru 6 sid@gmail.com Bengaluru
7 Product Manager Uber Bengaluru 6 sid@gmail.com Bengaluru
4 SDE Microsoft Bengaluru 6 sid@gmail.com Bengaluru
2 SDE 2 Amazon Bengaluru 6 sid@gmail.com Bengaluru
7 vinod@gmail.com Pune
8 kamal@gmail.com Trivandrum
9 ajay@gmail.com Trivandrum
10 TPM Amazon Chennai 10 arul@gmail.com Chennai
5 Engineering Manager FunnelIQ Chennai 10 arul@gmail.com Chennai
1 SDE 1 Amazon Chennai 10 arul@gmail.com Chennai
11 akbar@gmail.com Lucknow
12 manish@gmail.com Indore
11 SDE2 Grab Singapore
6 SDE Grab Singapore
13 SDE1 Adobe Gurgaon
8 Engineering Lead Adobe Gurgaon
12 TPM Amazon Seattle
3 SDE3 Amazon Seattle

(30 rows)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment