Skip to content

Instantly share code, notes, and snippets.

@dannguyen
Last active October 24, 2017 21:14
Show Gist options
  • Save dannguyen/9ba20f9bdace87f103db0ba9c829a32b to your computer and use it in GitHub Desktop.
Save dannguyen/9ba20f9bdace87f103db0ba9c829a32b to your computer and use it in GitHub Desktop.
quickie writeup for padjo-2017 on why SQL and inner oins

SQL and the bigger picture of joins

Revisiting this 2014 database of Congressional member data and Twitter profile data:

Twitter and Congress are two different worlds/information systems:

Someone had to manually look up twitter screen names and Congressmembers bioguide_id numbers in a lookup table the (social_accounts) table.

Simple questions are time consuming to answer:

  • Which congressmembers are on Twitter
  • Which congressmembers have the most followers

Here's how SQL helps.

The members table

SELECT 
  members.current_role,
  members.party,
  members.state,
  members.first_name,
  members.last_name

FROM members;

A "lookup" table of Congress "bioguide IDs"

With known social media accounts

SELECT *
FROM social_accounts;

Data on Twitter profiles, from Twitter

SELECT *
FROM twitter_profiles;

The JOIN

SELECT 
  members.current_role,
  members.party,
  members.state,
  members.first_name,
  members.last_name

FROM members
INNER JOIN social_accounts
  ON members.bioguide_id = social_accounts.bioguide_id;
  • need to explicitly point out the column we want to see
  • Use aliases to reduce visual clutter
SELECT 
  s.twitter_screen_name,
  m.current_role,
  m.party,
  m.state,
  m.first_name,
  m.last_name

FROM members AS m
INNER JOIN social_accounts AS s
  ON m.bioguide_id = s.bioguide_id;

Triple join!

SELECT 
  s.twitter_screen_name,
  m.current_role,
  m.party,
  m.state,
  m.first_name,
  m.last_name

FROM members AS m
INNER JOIN social_accounts AS s
  ON m.bioguide_id = s.bioguide_id
INNER JOIN twitter_profiles AS tp
  ON tp.screen_name = s.twitter_Screen_name;

Add twitter metrics:

SELECT 
  s.twitter_screen_name,
  tp.followers_count,
  tp.statuses_count,
  m.current_role,
  m.party,
  m.state,
  m.first_name,
  m.last_name
FROM members AS m
INNER JOIN social_accounts AS s
  ON m.bioguide_id = s.bioguide_id
INNER JOIN twitter_profiles AS tp
  ON tp.screen_name = s.twitter_Screen_name
ORDER BY tp.followers_count DESC;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment