Skip to content

Instantly share code, notes, and snippets.

@ckashby
Created June 10, 2024 01:32
Show Gist options
  • Save ckashby/4d121939bd106e120515a462d968cefa to your computer and use it in GitHub Desktop.
Save ckashby/4d121939bd106e120515a462d968cefa to your computer and use it in GitHub Desktop.
# ## Aggregating Statistics with GROUP BY
# 45) What is the average age of `pets` by species?
pd.read_sql("SELECT AVG(age) FROM pets GROUP BY species;", conn)
# dogs 6.5, cat 4.3, lobster 3
# 46) Repeat the previous problem but make sure the species label is also displayed!
pd.read_sql("SELECT species, AVG(age) FROM pets GROUP BY species;", conn)
# Assume this behavior is always being asked of you any time you use `GROUP BY`.
# 47) What is the count, mean, minimum, and maximum age by species in `pets`?
pd.read_sql("SELECT species, COUNT(age), AVG(age), MIN(age), MAX(age) FROM pets GROUP BY species;", conn)
# 48) Show the mean salaries of each job title in `employees`.
pd.read_sql("SELECT job, AVG(salary) FROM employees GROUP BY job;", conn)
# 49) Show the mean salaries in New Zealand dollars of each job title in `employees`.
# * _NOTE:_ 1 USD = 1.65 NZD.
pd.read_sql("SELECT job, AVG(salary) * 1.65 AS 'New Zealand Dollars' FROM employees GROUP BY job;", conn)
# 50) Show the mean, min, and max salaries of each job title in `employees`, as well as the numbers of employees in each category.
pd.read_sql("SELECT job, AVG(salary), MIN(salary), MAX(salary), COUNT(*) FROM employees GROUP BY job;", conn)
# 51) Show the mean salaries of each job title in `employees` sorted descending by salary.
pd.read_sql("SELECT job, AVG(salary) FROM employees GROUP BY job ORDER BY AVG(salary) DESC;", conn)
# 52) What are the top 5 most common first names among `employees`?
pd.read_sql("SELECT firstname, COUNT(*) FROM employees GROUP BY firstname ORDER BY COUNT(*) DESC LIMIT 5;", conn)
# 53) Show all first names which have exactly 2 occurrences in `employees`.
pd.read_sql("SELECT firstname, COUNT(*) FROM employees GROUP BY firstname HAVING COUNT(*) = 2;", conn)
# 54) Take a look at the `transactions` table to get a idea of what it contains.
# Note that a transaction may span multiple rows if different items are purchased as part of the same order.
# The employee who made the order is also given by their ID.
pd.read_sql("SELECT * FROM transactions;", conn)
# 55) Show the top 5 largest orders (and their respective customer) in terms of the numbers of items purchased in that order.
pd.read_sql("SELECT customer, SUM(quantity) FROM transactions GROUP BY customer ORDER BY SUM(quantity) DESC LIMIT 5;", conn)
# 56) Show the total cost of each transaction.
# * _Hint:_ The `unit_price` column is the price of _one_ item. The customer may have purchased multiple.
# * _Hint2:_ Note that transactions here span multiple rows if different items are purchased.
pd.read_sql("SELECT order_id, SUM(quantity * unit_price) FROM transactions GROUP BY order_id;", conn)
# 57) Show the top 5 transactions in terms of total cost.
pd.read_sql("SELECT order_id, SUM(quantity * unit_price) FROM transactions GROUP BY order_id ORDER BY SUM(quantity * unit_price) DESC LIMIT(5);", conn)
# 58) Show the top 5 customers in terms of total revenue (ie, which customers have we done the most business with in terms of money?)
pd.read_sql("SELECT customer, SUM(quantity * unit_price) FROM transactions GROUP BY customer ORDER BY SUM(quantity * unit_price) DESC LIMIT 5;", conn)
# 59) Show the top 5 employees in terms of revenue generated (ie, which employees made the most in sales?)
pd.read_sql("SELECT employee_id, SUM(quantity * unit_price) FROM transactions GROUP BY employee_id ORDER BY SUM(quantity * unit_price) DESC LIMIT 5;", conn)
# 60) Which customer worked with the largest number of employees?
# * _Hint:_ This is a tough one! Check out the `DISTINCT` keyword.
pd.read_sql("SELECT customer, COUNT(DISTINCT employee_id) FROM transactions GROUP BY customer ORDER BY COUNT(DISTINCT employee_id) DESC LIMIT 1;", conn)
# 61) Show all customers who've done more than $80,000 worth of business with us.
pd.read_sql("SELECT customer, SUM(quantity * unit_price) FROM transactions GROUP BY customer HAVING SUM(quantity * unit_price) > 80000;", conn)
# pd.read_sql("SELECT * FROM transactions;", conn)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment