Skip to content

Instantly share code, notes, and snippets.

@danman01
Last active August 23, 2018 14:00
Show Gist options
  • Save danman01/adafdfceb7743c8365e18eb90bbdc965 to your computer and use it in GitHub Desktop.
Save danman01/adafdfceb7743c8365e18eb90bbdc965 to your computer and use it in GitHub Desktop.

SQL Notes

In this workshop, we will be learning and using SQL to access data in a relational database

Objectives:

  • Explain what a database is
  • Compare and contrast relational db vs NoSQL db
  • Write SQL commands to get data out of a database
  • Write SQL to put data into a database

Prerequisites

  • familiarity with computing basics
  • none

Installation and Preparation

  • Install pgAdmin v3: https://www.pgadmin.org/download/

  • Load this SQL file into pgAdmin to start filling in notes on our queries: www.ga.co/kirschnersql

  • When you have PgAdmin downloaded:

  • Please connect to the region specific server:

  • host: analyticsga-east2.c20gkj5cvu3l.us-east-1.rds.amazonaws.com

  • port: 5432

  • username: analytics_student

  • password: analyticsga

What is a database

  • A file to store data that allows for efficient read and write operations to the file
    • Stored on disk or in memory
    • Intended for use by applications with large amounts of data and complex relationships
    • Handles reading and writing using Structured Query Language (SQL)
    • Why not excel? CSV? Notepad?

How do you access the data?

  • SQL (Structured Query Language)
    • ex: SELECT * FROM 'Developers';
    • Where do you run SQL?
    • What do you get back?
    • How do you use these results?

Who uses SQL?

  • Back-end developers
  • Data scientists
  • Data analysts
  • Data-driven marketers
  • Data-driven account managers

Relational database vs NoSQL

Planning your database

Planning the database structure is arguably the most important skill associated with created and maintaining databases. Once you have a sound plan for the various tables you will need, the columns and data types that the tables will hold, and how tables are related, you can then proceed to create your database structure and finally add data to it.

Here is the simple imdb Entity Relationship Diagram (ERD) that we viewed during the workshop.

SQL Queries that we ran

Completed Queries:

--Hi Everyone!! I hope you're excited to learn about SQL!


--Get familiar with the data
SELECT * FROM products LIMIT 100;
SELECT * FROM sales LIMIT 100;
SELECT * FROM stores LIMIT 100;
SELECT * FROM counties LIMIT 100;


--LET'S START WITH THE BASCIS

--Let's try to get unique category names.
SELECT DISTINCT category_name FROM products;

--Let's get the unique category names where category_name is not null
SELECT DISTINCT category_name 
FROM products 
WHERE category_name is not null;



--FILTERING--WE DO

--Which items come in packs larger than 12?
SELECT item_no, item_description, pack
FROM products
WHERE pack > 12
ORDER BY pack;



--Which items have a case cost of less than $70?
SELECT item_no, item_description, case_cost
FROM products
WHERE case_cost < 70
ORDER BY case_cost DESC;

--Which items come in packs larger than 12 AND have a case_cost of less than $70?
SELECT item_no, item_description, pack, case_cost
FROM products
WHERE pack > 12
AND case_cost < 70
ORDER BY case_cost DESC, pack;


--Which categories have a proof of 85 or more?
SELECT category_name, proof::INTEGER
FROM products
WHERE proof::INTEGER >= 85
ORDER BY proof::INTEGER DESC;

SELECT *
FROM products
LIMIT 100;

--Which items are in the scotch whiskies category OR are over 65 proof?
SELECT item_no, item_description, category_name, proof::INTEGER
FROM products
WHERE category_name ILIKE '%scotch whisk%';
--OR proof::INTEGER > 65;


--FILTERING--YOU DO

--Which items are over 90 proof?
SELECT item_no, item_description, proof::INTEGER
FROM products
WHERE proof::INTEGER > 90
ORDER BY proof::INTEGER;--765


SELECT item_description, cast(proof as integer)
FROM products
WHERE cast(proof as integer) > 90
ORDER BY cast(proof as integer);

--Which items have a case cost of less than $60?
SELECT item_no, item_description, case_cost
FROM products
WHERE case_cost < 60
ORDER BY case_cost DESC;--2826


--Which items are in either Single Malt Scotches or Canadian Whiskies categories?
SELECT category_name, item_description, item_no
FROM products
WHERE category_name ILIKE '%SINGLE MALT SCOTCH%'
OR category_name ILIKE '%CANADIAN%'
ORDER BY category_name;

SELECT DISTINCT category_name
FROM products
WHERE category_name ILIKE '%CANADIAN%';


--Which items are in the whiskey (or whiskies) category? Think back to LIKE ‘%SOMETHING%’.
SELECT item_no, item_description, category_name
FROM products
WHERE category_name LIKE '%WHISK%';--1606

SELECT DISTINCT category_name, pack
FROM products
WHERE category_name LIKE '%WHISK%';



--What is the most expensive purchase of Svedka? Hint: look at the sales table- description and total columns. 
SELECT *
FROM sales
LIMIT 100;

SELECT description, total
FROM sales
WHERE description LIKE '%Svedka%'
ORDER BY total DESC
LIMIT 1;



----------------------------------------------

SELECT COUNT(*)
FROM sales;


--GROUP BY & HAVING

--How would I find out how many items are available per vendor name?
SELECT * FROM products LIMIT 10;

SELECT vendor_name, COUNT(item_no)
FROM products
GROUP BY vendor_name
ORDER BY COUNT(item_no) DESC;

SELECT vendor_name, COUNT(item_no)
FROM products
GROUP BY vendor_name
ORDER BY 2 DESC;

--What if we wanted to know which categories have an average proof higher than 75?
SELECT category_name, avg(proof::INTEGER)
FROM products
GROUP BY category_name
HAVING avg(proof::INTEGER) > 75;

--CAST SYNTAX
select category_name, cast(proof as integer)
from products
limit 5;

--How many items does each category have? 
SELECT category_name AS category, count(item_no) AS bananas
FROM products
GROUP BY category
ORDER BY count(item_no) DESC;

SELECT category_name, count(item_no)
FROM products
GROUP BY category_name
ORDER BY count(item_no) DESC;

--Which categories have more than 100 items? Make sure to have a column with the count of items in each category too.
SELECT category_name, count(item_no)
FROM products
WHERE category_name is not null
GROUP BY category_name
HAVING count(item_no) > 100;

--What is the average bottle size per category of whiskey?
SELECT category_name, round(avg(bottle_size),2)
FROM products
WHERE category_name ILIKE '%whisk%'
GROUP BY category_name;

select * from products limit 10;

--What is the average state_btl_cost per vendor? Hint: use CAST function to convert money to numeric, use HAVING clause in sales table
SELECT vendor, round(avg(state_btl_cost::NUMERIC),2)
FROM sales
GROUP BY vendor
HAVING round(avg(state_btl_cost::NUMERIC),2) > 50
ORDER BY round(avg(state_btl_cost::NUMERIC),2) DESC
LIMIT 100;


--MATHEMATICAL OPERATIONSs


--The per bottle price for each item:
SELECT * FROM sales LIMIT 5;

SELECT description, total / bottle_qty AS PerBottlePrice
FROM sales
LIMIT 100;

--Add column named “difference” that calculates the difference between shelf price and bottle price for each product. 
SELECT * FROM products LIMIT 5;

SELECT *, shelf_price - cast(bottle_price as numeric) AS difference
FROM products
LIMIT 100;


--JOINS--WE DO


--Let's find the top 5 store sales with its store name.
-- stores table: store (integer), name (text)
-- sales table: store (integer), total
SELECT * FROM stores LIMIT 5;
SELECT * FROM sales LIMIT 5;


SELECT stores.name, sum(sales.total)
FROM sales
LEFT JOIN stores
ON sales.store = stores.store
GROUP BY stores.name
ORDER BY sum(sales.total) DESC
LIMIT 5;

SELECT b.name, sum(a.total)
FROM sales as a
LEFT JOIN stores as b
ON a.store = b.store
GROUP BY b.name
ORDER BY sum(a.total) DESC
LIMIT 5;

--CASE STATEMENTS--WE DO
--CASE syntax
SELECT something,
	CASE
		WHEN condition THEN value
		WHEN condition THEN value
		[...]
		[ELSE value]
	END AS casecolname
FROM table;

--We want to classify the counties of Iowa as either small, medium, or large. 
--For the purposes of this example, say large is over 200,000
--medium is between 40,000 and 200,000
--small is lower than 40,000
SELECT * FROM counties LIMIT 5;

SELECT county, population,
	CASE
		WHEN population > 200000 THEN 'large'
		WHEN population > 40000 THEN 'medium'
		ELSE 'small'
	END AS county_size
FROM counties;


--Subqueries--try to follow along with the instructor
SELECT county_size, count(county)
FROM (SELECT county, population,
	CASE
		WHEN population > 200000 THEN 'large'
		WHEN population > 40000 THEN 'medium'
		ELSE 'small'
	END AS county_size
FROM counties) AS temp
GROUP BY county_size
ORDER BY county_size DESC;

Additional Resources

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment