Skip to content

Instantly share code, notes, and snippets.

@judell
Created September 18, 2022 10:32
Show Gist options
  • Save judell/89e88610481a65820fd42672c6f52452 to your computer and use it in GitHub Desktop.
Save judell/89e88610481a65820fd42672c6f52452 to your computer and use it in GitHub Desktop.
bulk-download.md

I want to be able to leverage Steampipe's parallelism to fetch many items from Hacker News (or other plugins). I observed that although Hacker News implements both List and Get, only List is called in a JOIN context, so there's no opportunity to parallelize.

I made a plugin that (I think) works the same way as HN in this respect: https://github.com/judell/steampipe-plugin-hello. And I'm looking at different ways to make Steampipe employ the Get function.

This is the query result I am hoping for:

with ids as (
  select 1 as id union select 2 order by id
)
select 
  h.id 
from 
  hello_get_or_list h
join
  ids i
on 
  h.id = i.id
order by
  id
+----+
| id |
+----+
| 1  |
| 2  |
+----+

But it is not the Steampipe result I am looking for. Here is the query plan.

+--------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                       |
+--------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.06..10000000029950.09 rows=10000 width=8)                                   |
|   Join Filter: (h.id = (1))                                                                      |
|   ->  Foreign Scan on hello_get_or_list h  (cost=0.00..10000000000000.00 rows=1000000 width=100) |
|   ->  Materialize  (cost=0.06..0.10 rows=2 width=4)                                              |
|         ->  Unique  (cost=0.06..0.07 rows=2 width=4)                                             |
|               ->  Sort  (cost=0.06..0.07 rows=2 width=4)                                         |
|                     Sort Key: (1)                                                                |
|                     ->  Append  (cost=0.00..0.05 rows=2 width=4)                                 |
|                           ->  Result  (cost=0.00..0.01 rows=1 width=4)                           |
|                           ->  Result  (cost=0.00..0.01 rows=1 width=4)                           |
+--------------------------------------------------------------------------------------------------+

Checking the log I see that List was called, but I am hoping to use Get.

select pg_typeof( (1,2) )
+-----------+
| pg_typeof |
+-----------+
| record    |
+-----------+

explain select id from hello_get_or_list where id in (1,2)
+------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                               |
+------------------------------------------------------------------------------------------+
| Foreign Scan on hello_get_or_list  (cost=0.00..40000000000000.00 rows=1000000 width=400) |
|   Filter: (id = ANY ('{1,2}'::bigint[]))                                                 |
+------------------------------------------------------------------------------------------+

select id from hello_get_or_list where id in (1,2)

+----+
| id |
+----+
| 1  |
| 2  |
+----+

Another way to do it.

select pg_typeof ( array[1,2] )
+-----------+
| pg_typeof |
+-----------+
| integer[] |
+-----------+

explain select id from hello_get_or_list where id = any (array[1,2] )
+------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                               |
+------------------------------------------------------------------------------------------+
| Foreign Scan on hello_get_or_list  (cost=0.00..40000000000000.00 rows=1000000 width=400) |
|   Filter: (id = ANY ('{1,2}'::bigint[]))                                                 |
+------------------------------------------------------------------------------------------+

select id from hello_get_or_list where id = any (array[1,2] )

+----+
| id |
+----+
| 1  |
| 2  |
+----+

In both these cases, Steampipe calls Get. I'm unable to get Steampipe to call Get in a JOIN context, so I'm trying to make Postgres use the same query plan in a way that provides the data from a select. Seems like there MUST be a way but I cannot find it.

select pg_typeof( array ( select 1 union select 2 ) )
+-----------+
| pg_typeof |
+-----------+
| integer[] |
+-----------+

explain select id from hello_get_or_list where id = any ( array ( select 1 union select 2 )  )
+------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                               |
+------------------------------------------------------------------------------------------+
| Foreign Scan on hello_get_or_list  (cost=0.07..40000000000000.07 rows=1000000 width=400) |
|   Filter: (id = ANY ($0))                                                                |
|   InitPlan 1 (returns $0)                                                                |
|     ->  Unique  (cost=0.06..0.07 rows=2 width=8)                                         |
|           ->  Sort  (cost=0.06..0.07 rows=2 width=8)                                     |
|                 Sort Key: ('1'::bigint)                                                  |
|                 ->  Append  (cost=0.00..0.05 rows=2 width=8)                             |
|                       ->  Result  (cost=0.00..0.01 rows=1 width=8)                       |
|                       ->  Result  (cost=0.00..0.01 rows=1 width=8)                       |
+------------------------------------------------------------------------------------------+

select id from hello_get_or_list where id = any ( array (select 1 union select 2 ) )
+----+
| id |
+----+
+----+
select pg_typeof(
  (
    with ids as (
      select 1 as id union select 2
    )
    select array_agg(id) from ids
  )
)
+-----------+
| pg_typeof |
+-----------+
| integer[] |
+-----------+

explain select id from hello_get_or_list where id in (
  (
    with ids as (
      select 1 as id union select 2
    )
    select array_agg(id) from ids
  )
)
Error: operator does not exist: bigint = integer[] (SQLSTATE 42883)
select pg_typeof(
  (
    select array ( select 1 union select 2 ) 
  )
)
+-----------+
| pg_typeof |
+-----------+
| integer[] |
+-----------+

explain select id from hello_get_or_list where id = any (
  (
    select array ( select 1 union select 2 ) 
  )
)

Error: operator does not exist: bigint = integer[] (SQLSTATE 42883)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment