Skip to content

Instantly share code, notes, and snippets.

@hazelgurule
Last active January 19, 2024 15:05
Show Gist options
  • Save hazelgurule/4d11a41ee222b84e71b57e5f28f9d2c6 to your computer and use it in GitHub Desktop.
Save hazelgurule/4d11a41ee222b84e71b57e5f28f9d2c6 to your computer and use it in GitHub Desktop.
Minimal repro of duckdb/postgres_scanner issue 156

Postgres structure - 3 tables, each has 1 text column named "id"

psql (14.10 (Homebrew))
Type "help" for help.

[LOCAL] mydatabase=# select * from test1;
 id
----
(0 rows)

[LOCAL] mydatabase=# select * from test2;
 id
----
(0 rows)

[LOCAL] mydatabase=# select * from test3;
 id
----
(0 rows)

[LOCAL] mydatabase=# \d test1;
              Table "public.test1"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 id     | text |           | not null |
Indexes:
    "test1_pkey" PRIMARY KEY, btree (id)

[LOCAL] mydatabase=# \d test2;
              Table "public.test2"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 id     | text |           | not null |
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)

[LOCAL] mydatabase=# \d test3;
              Table "public.test3"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 id     | text |           | not null |
Indexes:
    "test3_pkey" PRIMARY KEY, btree (id)

Duckdb setup:

D force install postgres_scanner from 'http://nightly-extensions.duckdb.org';
D load postgres_scanner;
D ATTACH '<postgres connection string>' as mydatabase (TYPE POSTGRES);

Duckdb is able to query each of them individually:

D select * from mydatabase.test1;
┌─────────┐
│   id    │
│ varchar │
├─────────┤
│ 0 rows  │
└─────────┘
D select * from mydatabase.test2;
┌─────────┐
│   id    │
│ varchar │
├─────────┤
│ 0 rows  │
└─────────┘
D select * from mydatabase.test3;
┌─────────┐
│   id    │
│ varchar │
├─────────┤
│ 0 rows  │
└─────────┘

Duckdb is also able to join any 2 of them together:

D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id;
┌─────────┬─────────┐
│   id    │   id    │
│ varchar │ varchar │
├───────────────────┤
│      0 rows       │
└───────────────────┘
D select * from mydatabase.test1 a left join mydatabase.test3 c on a.id = c.id;
┌─────────┬─────────┐
│   id    │   id    │
│ varchar │ varchar │
├───────────────────┤
│      0 rows       │
└───────────────────┘
D select * from mydatabase.test2 b left join mydatabase.test3 c on b.id = c.id;
┌─────────┬─────────┐
│   id    │   id    │
│ varchar │ varchar │
├───────────────────┤
│      0 rows       │
└───────────────────┘

However, joining all 3 produces the COPY error:

D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id left join mydatabase.test3 c on a.id = c.id;
Error: IO Error: Unable to read binary COPY data from Postgres, invalid header

Setting threads=1 resolves it:

D set threads = 1;
D select * from mydatabase.test1 a left join mydatabase.test2 b on a.id = b.id left join mydatabase.test3 c on a.id = c.id;
┌─────────┬─────────┬─────────┐
│   id    │   id    │   id    │
│ varchar │ varchar │ varchar │
├─────────────────────────────┤
│           0 rows            │
└─────────────────────────────┘
D
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment