Skip to content

Instantly share code, notes, and snippets.

@stefnestor
Created May 7, 2023 16:29
Show Gist options
  • Save stefnestor/7b96b56c00d60349ec159ab4f3622699 to your computer and use it in GitHub Desktop.
Save stefnestor/7b96b56c00d60349ec159ab4f3622699 to your computer and use it in GitHub Desktop.

With ...

> SELECT id, color, name FROM shirts LIMIT 5

id   color     name
1    blue      Jane
2    yellow    Susan
3    green     Craig
4    yellow    Susan
5    blue      Susan

... form a SQL statement which would find

  1. the number of shirts of each color
  2. how many people (names) have color: blue shirts
  3. how many people (names) have more than one color of shirts
(examples)

Answers may vary but sufficient examples are ...

  1. > SELECT color, count(id) FROM shirts GROUP BY color
    
  2. > SELECT DISTINCT(name) FROM shirts WHERE color='blue'
    
  3. > SELECT name FROM (SELECT name, COUNT(DISTINCT(color)) AS color_count FROM shirts GROUP BY name) WHERE color_count>1
    

... with an easy replication test via ...

$ pip install duckdb pandas
$ ipython
Python 3.11.2 (main, Feb 16 2023, 03:07:35) [Clang 14.0.0 (clang-1400.0.29.202)]
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.

# setup
In [1]: import duckdb
In [2]: import pandas as pd
In [3]: shirts = pd.DataFrame([{"id":1, "color":"blue", "name":"Jane"},{"id":2, "color":"yellow", "name":"Susan"},{"id":3, "color":"green", "name":"Craig"},{"id":4, "color":"yellow", "name":"Susan"},{"id":5, "color":"blue", "name": "Susan"}])

# loaded table
In [4]: print(duckdb.sql("SELECT * FROM shirts LIMIT 5").df())
   id   color   name
0   1    blue   Jane
1   2  yellow  Susan
2   3   green  Craig
3   4  yellow  Susan
4   5    blue  Susan

# answers
# 👉 (1)
In [5]: print(duckdb.sql("SELECT color, count(id) FROM shirts GROUP BY color").df())
    color  count(id)
0    blue          2
1  yellow          2
2   green          1

# 👉 (2)
In [12]: print(duckdb.sql("SELECT DISTINCT(name) FROM shirts WHERE color='blue'").df())
    name
0   Jane
1  Susan

# 👉 (3)
In [21]: print(duckdb.sql("SELECT name FROM (SELECT name, COUNT(DISTINCT(color)) AS color_count FROM shirts GROUP BY name) WHERE color_count>1").df())
    name
0  Susan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment