Skip to content

Instantly share code, notes, and snippets.

@cobusc
Last active July 9, 2020 10:32
Show Gist options
  • Save cobusc/4b582d9a6a48fd9273f2a321d34e8bc9 to your computer and use it in GitHub Desktop.
Save cobusc/4b582d9a6a48fd9273f2a321d34e8bc9 to your computer and use it in GitHub Desktop.
PostgreSQL comparison of IN, ANY, VALUES

Different options

1. For very big sets, create a temporary table with an indexed column and do a SELECT with a JOIN.

2. For large sets, use VALUES. It will create a table like structure which can be JOINed with.

catalogue=> explain select productline_id from productline_tsin_product as ptp, (values (1), (7), (9), (4), (123), (1234)) as lookup where ptp.productline_id = lookup.column1;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                          QUERY PLAN                                                          │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.56..39.84 rows=187 width=8)                                                                             │
│   ->  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6 width=4)                                                            │
│   ->  Index Only Scan using productline_tsin_product_pkey on productline_tsin_product ptp  (cost=0.56..6.32 rows=31 width=8) │
│         Index Cond: (productline_id = "*VALUES*".column1)                                                                    │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

This can be cumbersome to build up though. Psycopg2 has an execute_values() function which makes it easy.

sql = """
SELECT productline_id 
  FROM productline_tsin_product AS ptp, 
       (VALUES %s) AS lookup 
 WHERE ptp.productline_id = lookup.column1
"""

execute_value(cursor, sql, [1, 7, 9, 4, 123, 1234])

3. CTE (WITH-clause)

catalogue=> explain with lookup as (select unnest('{1, 7, 9, 4, 123, 1234}'::int[]) AS id) select productline_id from productline_tsin_product as ptp, lookup  where ptp.productline_id = lookup.id;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                          QUERY PLAN                                                          │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.56..39.87 rows=187 width=8)                                                                             │
│   ->  ProjectSet  (cost=0.00..0.05 rows=6 width=4)                                                                           │
│         ->  Result  (cost=0.00..0.01 rows=1 width=0)                                                                         │
│   ->  Index Only Scan using productline_tsin_product_pkey on productline_tsin_product ptp  (cost=0.56..6.32 rows=31 width=8) │
│         Index Cond: (productline_id = (unnest('{1,7,9,4,123,1234}'::integer[])))                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

I don't think there is any benefit to this option over the others.

sql = """
WITH lookup AS (
  SELECT UNNEST(%(ids)s) AS id
)
SELECT productline_id 
  FROM productline_tsin_product AS ptp, 
       lookup  
 WHERE ptp.productline_id = lookup.id;

cursor.execute(sql, {"ids": [1, 7, 9, 4, 123, 1234]})

4. IN ()

catalogue=> explain select productline_id from productline_tsin_product where productline_id in (1,7,9,4,123,1234);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                      QUERY PLAN                                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using productline_tsin_product_pkey on productline_tsin_product  (cost=0.56..37.91 rows=187 width=8) │
│   Index Cond: (productline_id = ANY ('{1,7,9,4,123,1234}'::bigint[]))                                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(2 rows)

There is nothing wrong with this query, but in applications people usually build up the IN clause using something like:

in_clause = ",".join(str(id) for id in [1, 7, 9, 4, 123, 1234])
sql = """
SELECT productline_id 
  FROM productline_tsin_product 
 WHERE productline_id IN ({})
""".format(in_clause)

I really dislike this. See "= ANY()" for an alternative approach.

5. = ANY()

catalogue=> explain select productline_id from productline_tsin_product where productline_id = any('{1, 7, 9, 4, 123, 1234}');
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                      QUERY PLAN                                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using productline_tsin_product_pkey on productline_tsin_product  (cost=0.56..37.91 rows=187 width=8) │
│   Index Cond: (productline_id = ANY ('{1,7,9,4,123,1234}'::bigint[]))                                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(2 rows)

This should be equivalent to "IN ()", but it has the benefit of taking an array, so in your application you can pass the array.

sql = """
SELECT productline_id 
  FROM productline_tsin_product 
 WHERE productline_id = ANY(%(ids)s);
"""

cursor.execute(sql, {"ids": [1, 7, 9, 4, 123, 1234]})
@shane-matuszek-takealot
Copy link

this is interesting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment