Created
January 20, 2019 22:57
-
-
Save OlegGorj/96e093e4f1276eb69e0b3af8f56e530a to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
PostgreSQL has Foreign Data Wrappers. A very powerful feature. Greenplum | |
has External Tables. Also a very powerful feature. In fact, it's pretty | |
much the same feature. As we catch up with PostgreSQL, we need to figure | |
out what we're going to do about that. In the long run, we clearly don't | |
want to maintain two features that do the same thing, but with different | |
syntax and different set of wrappers/plugins. | |
Goal | |
---- | |
* Fully support all community FDWs in Greenplum | |
* Refactor existing external protocols to be FDWs, to reduce GPDB code | |
footprint and to support future FDW features. | |
* Keep external table syntax, for backwards-compatibility | |
History of PostgreSQL Foreign Data Wrappers | |
------------------------------------------- | |
The FDW feature has been improved in PostgreSQL over several releases: | |
8.4: DDL Syntax, pg_dump/restore | |
9.0: - | |
9.1: Basic read support | |
9.2: Statistics collection (ANALYZE) | |
9.3: Updatable foreign tables | |
9.4: - | |
9.5: Join pushdown. IMPORT FOREIGN SCHEMA command. | |
9.6: Advanced pushdown support (sorts, aggregates, updates, etc.) | |
For the same functionality we have with readable external tables, we | |
need to catch up with 9.1. For writeable external tables, we need 9.3. | |
Handler API differences | |
----------------------- | |
The PostgreSQL API consists of a number of callback functions. There are | |
callbacks that get called at plan time, at ANALYZE, at EXPLAIN, at | |
execution, and so forth. | |
The main callbacks, for scanning a foreign table, are: | |
BeginForeignScan | |
IterateForeignScan | |
EndForeignScan | |
These are called at execution. IterateForeignScan returns one tuple at a | |
time, and the executor calls it repeatedly, until there are no more | |
tuples left. This differs somewhat from the external protocol API. The | |
corresponding external protocol API consists of a callback function, | |
"readfunc". The executor calls it repeatedly, until there is no more | |
data left. The difference is that while the FDW IterateForeignScan | |
callback returns one tuple at a time, in a TupleTableSlot, "readfunc" | |
returns a chunk of raw data, which is expected to be in the same format | |
as used by COPY. The executor feeds the raw data to the COPY parsing | |
code, to get the tuples. | |
The updateable FDW and Writeable External Table APIs differ in a similar | |
way. The FDW API is tuple-oriented, while the External Table API writes | |
chunks of COPY-formatted raw data. | |
See https://www.postgresql.org/docs/current/static/fdw-callbacks.html | |
for details of the FDW API. (I'm not aware of a similar document for the | |
external protocol API. Anyone?) | |
The Master Plan | |
--------------- | |
0. Catch-up/cherry-pick FDW syntax from PostgreSQL 8.4. | |
1. Catch-up/cherry-pick FDW support from PostgreSQL 9.3 | |
2. Add glue code to make it possible to use the FDW API, with the | |
External Table syntax. | |
3. Refactor existing external protocols to use the FDW API. | |
4. Remove external scan and update code. | |
Alternatively, instead of refitting all existing protocols to the FDW | |
API, we could write a special adapter FDW that converts a stream of | |
COPY-formatted data into tuples, and vice versa. The adapter FDW | |
wouldn't connect to any particular data source, but it would call yet | |
another plugin to get the data. The API for those plugins would be the | |
same as the current external protocol API. That's appealing, but I think | |
it's easier to just refactor all the existing external protocols. There | |
aren't that many of them, and dealing with yet another adapter layer | |
seems more complicated in the long run. There will be some duplicated | |
code, to parse all the COPY options, like whether it's in CSV format, | |
delimiters, etc. But it's not too bad. The file_fdw in PostgreSQL does | |
the same, and file_fdw.c is about 1000 lines of C code. The set of COPY | |
options has been unchanged for a very long time, so I'm not too worried | |
about new options being added, that would then need to be copied to all | |
the protocols. | |
MPP | |
--- | |
There is one crucial difference between PostgreSQL and Greenplum: | |
Greenplum is an MPP system. That's reflected in the external protocols, | |
but community FDWs know nothing about that. What exactly should happen, | |
when you do "SELECT * FROM foreign_table"? Is the foreign scan started | |
on every segment? Or is it performed fully in the master? Or both? If | |
the foreign scan is started on all segments, how does the FDW on each | |
segment know which slice of data to fetch? | |
That's handled slightly differently in different external protocols, | |
currently: | |
* With gpfdist, the segments pass along the segment-ID in the HTTP | |
requests they make to the | |
gpfdist server. The gpfdist server slices the data, based on the | |
segment-ID. | |
* With S3, each segment fetches only a subset of the files in the | |
bucket, based on its | |
segment-ID. | |
* gphdfs works similarly: each segment fetches a subset of the of the | |
data, based on its | |
segment-ID. | |
* With EXECUTE-type external tables, the command is executed on every | |
segment. A bunch of | |
environment variables are set, including GP_SEGMENT_ID and | |
GP_SEGMENT_COUNT, which you | |
can use in your script to split the data, if needed. | |
So, how will we handle this with Foreign Data Wrappers? There's probably | |
no one-size-fits-all solution. For existing external prtocols, converted | |
to FDWs, we should keep the behaviour the same. But for FDWs written for | |
PostgreSQL, that know nothing about MPP, we shouldn't just blindly run | |
the foreign scan on every segment. For example, with a Twitter FDW, | |
written for PostgreSQL, running the foreign scan on every segment would | |
fetch the same tweets on every segment, and you'd get duplicated rows. | |
I think the only safe option here is to insist that the FDW is executed | |
only on the master, unless the FDW somehow indicates that it knows about | |
MPP. Let's extend the FDW API with a new MPP-aware flag or callback. If | |
the FDW set the flag, then the foreign scan is executed on all segments. | |
Otherwise it's executed only on master. | |
We might also want an override option for that, so that the user can | |
force an FDW to be executed on every segment anyway. That might be | |
useful, e.g. with the file_fdw, if you have different files spread out | |
to the segments. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment