OlegGorj/Greenplum Foreign Data Wrappers, External Tables

## Greenplum Foreign Data Wrappers, External Tables
PostgreSQL has Foreign Data Wrappers. A very powerful feature. Greenplum
has External Tables. Also a very powerful feature. In fact, it's pretty
much the same feature. As we catch up with PostgreSQL, we need to figure
out what we're going to do about that. In the long run, we clearly don't
want to maintain two features that do the same thing, but with different
syntax and different set of wrappers/plugins.


Goal
----

* Fully support all community FDWs in Greenplum

* Refactor existing external protocols to be FDWs, to reduce GPDB code
footprint and to support future FDW features.

* Keep external table syntax, for backwards-compatibility


History of PostgreSQL Foreign Data Wrappers
-------------------------------------------

The FDW feature has been improved in PostgreSQL over several releases:

8.4: DDL Syntax, pg_dump/restore
9.0: -
9.1: Basic read support
9.2: Statistics collection (ANALYZE)
9.3: Updatable foreign tables
9.4: -
9.5: Join pushdown. IMPORT FOREIGN SCHEMA command.
9.6: Advanced pushdown support (sorts, aggregates, updates, etc.)

For the same functionality we have with readable external tables, we
need to catch up with 9.1. For writeable external tables, we need 9.3.


Handler API differences
-----------------------

The PostgreSQL API consists of a number of callback functions. There are
callbacks that get called at plan time, at ANALYZE, at EXPLAIN, at
execution, and so forth.

The main callbacks, for scanning a foreign table, are:

BeginForeignScan
IterateForeignScan
EndForeignScan

These are called at execution. IterateForeignScan returns one tuple at a
time, and the executor calls it repeatedly, until there are no more
tuples left. This differs somewhat from the external protocol API. The
corresponding external protocol API consists of a callback function,
"readfunc". The executor calls it repeatedly, until there is no more
data left. The difference is that while the FDW IterateForeignScan
callback returns one tuple at a time, in a TupleTableSlot, "readfunc"
returns a chunk of raw data, which is expected to be in the same format
as used by COPY. The executor feeds the raw data to the COPY parsing
code, to get the tuples.

The updateable FDW and Writeable External Table APIs differ in a similar
way. The FDW API is tuple-oriented, while the External Table API writes
chunks of COPY-formatted raw data.

See https://www.postgresql.org/docs/current/static/fdw-callbacks.html
for details of the FDW API. (I'm not aware of a similar document for the
external protocol API. Anyone?)


The Master Plan
---------------

0. Catch-up/cherry-pick FDW syntax from PostgreSQL 8.4.

1. Catch-up/cherry-pick FDW support from PostgreSQL 9.3
2. Add glue code to make it possible to use the FDW API, with the
External Table syntax.
3. Refactor existing external protocols to use the FDW API.
4. Remove external scan and update code.


Alternatively, instead of refitting all existing protocols to the FDW
API, we could write a special adapter FDW that converts a stream of
COPY-formatted data into tuples, and vice versa. The adapter FDW
wouldn't connect to any particular data source, but it would call yet
another plugin to get the data. The API for those plugins would be the
same as the current external protocol API. That's appealing, but I think
it's easier to just refactor all the existing external protocols. There
aren't that many of them, and dealing with yet another adapter layer
seems more complicated in the long run. There will be some duplicated
code, to parse all the COPY options, like whether it's in CSV format,
delimiters, etc. But it's not too bad. The file_fdw in PostgreSQL does
the same, and file_fdw.c is about 1000 lines of C code. The set of COPY
options has been unchanged for a very long time, so I'm not too worried
about new options being added, that would then need to be copied to all
the protocols.


MPP
---

There is one crucial difference between PostgreSQL and Greenplum:
Greenplum is an MPP system. That's reflected in the external protocols,
but community FDWs know nothing about that. What exactly should happen,
when you do "SELECT * FROM foreign_table"? Is the foreign scan started
on every segment? Or is it performed fully in the master? Or both? If
the foreign scan is started on all segments, how does the FDW on each
segment know which slice of data to fetch?

That's handled slightly differently in different external protocols,
currently:

* With gpfdist, the segments pass along the segment-ID in the HTTP
requests they make to the
   gpfdist server. The gpfdist server slices the data, based on the
segment-ID.

* With S3, each segment fetches only a subset of the files in the
bucket, based on its
   segment-ID.

* gphdfs works similarly: each segment fetches a subset of the of the
data, based on its
   segment-ID.

* With EXECUTE-type external tables, the command is executed on every
segment. A bunch of
   environment variables are set, including GP_SEGMENT_ID and
GP_SEGMENT_COUNT, which you
   can use in your script to split the data, if needed.

So, how will we handle this with Foreign Data Wrappers? There's probably
no one-size-fits-all solution. For existing external prtocols, converted
to FDWs, we should keep the behaviour the same. But for FDWs written for
PostgreSQL, that know nothing about MPP, we shouldn't just blindly run
the foreign scan on every segment. For example, with a Twitter FDW,
written for PostgreSQL, running the foreign scan on every segment would
fetch the same tweets on every segment, and you'd get duplicated rows.

I think the only safe option here is to insist that the FDW is executed
only on the master, unless the FDW somehow indicates that it knows about
MPP. Let's extend the FDW API with a new MPP-aware flag or callback. If
the FDW set the flag, then the foreign scan is executed on all segments.
Otherwise it's executed only on master.

We might also want an override option for that, so that the user can
force an FDW to be executed on every segment anyway. That might be
useful, e.g. with the file_fdw, if you have different files spread out
to the segments.
	PostgreSQL has Foreign Data Wrappers. A very powerful feature. Greenplum
	has External Tables. Also a very powerful feature. In fact, it's pretty
	much the same feature. As we catch up with PostgreSQL, we need to figure
	out what we're going to do about that. In the long run, we clearly don't
	want to maintain two features that do the same thing, but with different
	syntax and different set of wrappers/plugins.


	Goal
	----

	* Fully support all community FDWs in Greenplum

	* Refactor existing external protocols to be FDWs, to reduce GPDB code
	footprint and to support future FDW features.

	* Keep external table syntax, for backwards-compatibility


	History of PostgreSQL Foreign Data Wrappers
	-------------------------------------------

	The FDW feature has been improved in PostgreSQL over several releases:

	8.4: DDL Syntax, pg_dump/restore
	9.0: -
	9.1: Basic read support
	9.2: Statistics collection (ANALYZE)
	9.3: Updatable foreign tables
	9.4: -
	9.5: Join pushdown. IMPORT FOREIGN SCHEMA command.
	9.6: Advanced pushdown support (sorts, aggregates, updates, etc.)

	For the same functionality we have with readable external tables, we
	need to catch up with 9.1. For writeable external tables, we need 9.3.


	Handler API differences
	-----------------------

	The PostgreSQL API consists of a number of callback functions. There are
	callbacks that get called at plan time, at ANALYZE, at EXPLAIN, at
	execution, and so forth.

	The main callbacks, for scanning a foreign table, are:

	BeginForeignScan
	IterateForeignScan
	EndForeignScan

	These are called at execution. IterateForeignScan returns one tuple at a
	time, and the executor calls it repeatedly, until there are no more
	tuples left. This differs somewhat from the external protocol API. The
	corresponding external protocol API consists of a callback function,
	"readfunc". The executor calls it repeatedly, until there is no more
	data left. The difference is that while the FDW IterateForeignScan
	callback returns one tuple at a time, in a TupleTableSlot, "readfunc"
	returns a chunk of raw data, which is expected to be in the same format
	as used by COPY. The executor feeds the raw data to the COPY parsing
	code, to get the tuples.

	The updateable FDW and Writeable External Table APIs differ in a similar
	way. The FDW API is tuple-oriented, while the External Table API writes
	chunks of COPY-formatted raw data.

	See https://www.postgresql.org/docs/current/static/fdw-callbacks.html
	for details of the FDW API. (I'm not aware of a similar document for the
	external protocol API. Anyone?)


	The Master Plan
	---------------

	0. Catch-up/cherry-pick FDW syntax from PostgreSQL 8.4.

	1. Catch-up/cherry-pick FDW support from PostgreSQL 9.3
	2. Add glue code to make it possible to use the FDW API, with the
	External Table syntax.
	3. Refactor existing external protocols to use the FDW API.
	4. Remove external scan and update code.


	Alternatively, instead of refitting all existing protocols to the FDW
	API, we could write a special adapter FDW that converts a stream of
	COPY-formatted data into tuples, and vice versa. The adapter FDW
	wouldn't connect to any particular data source, but it would call yet
	another plugin to get the data. The API for those plugins would be the
	same as the current external protocol API. That's appealing, but I think
	it's easier to just refactor all the existing external protocols. There
	aren't that many of them, and dealing with yet another adapter layer
	seems more complicated in the long run. There will be some duplicated
	code, to parse all the COPY options, like whether it's in CSV format,
	delimiters, etc. But it's not too bad. The file_fdw in PostgreSQL does
	the same, and file_fdw.c is about 1000 lines of C code. The set of COPY
	options has been unchanged for a very long time, so I'm not too worried
	about new options being added, that would then need to be copied to all
	the protocols.


	MPP
	---

	There is one crucial difference between PostgreSQL and Greenplum:
	Greenplum is an MPP system. That's reflected in the external protocols,
	but community FDWs know nothing about that. What exactly should happen,
	when you do "SELECT * FROM foreign_table"? Is the foreign scan started
	on every segment? Or is it performed fully in the master? Or both? If
	the foreign scan is started on all segments, how does the FDW on each
	segment know which slice of data to fetch?

	That's handled slightly differently in different external protocols,
	currently:

	* With gpfdist, the segments pass along the segment-ID in the HTTP
	requests they make to the
	gpfdist server. The gpfdist server slices the data, based on the
	segment-ID.

	* With S3, each segment fetches only a subset of the files in the
	bucket, based on its
	segment-ID.

	* gphdfs works similarly: each segment fetches a subset of the of the
	data, based on its
	segment-ID.

	* With EXECUTE-type external tables, the command is executed on every
	segment. A bunch of
	environment variables are set, including GP_SEGMENT_ID and
	GP_SEGMENT_COUNT, which you
	can use in your script to split the data, if needed.

	So, how will we handle this with Foreign Data Wrappers? There's probably
	no one-size-fits-all solution. For existing external prtocols, converted
	to FDWs, we should keep the behaviour the same. But for FDWs written for
	PostgreSQL, that know nothing about MPP, we shouldn't just blindly run
	the foreign scan on every segment. For example, with a Twitter FDW,
	written for PostgreSQL, running the foreign scan on every segment would
	fetch the same tweets on every segment, and you'd get duplicated rows.

	I think the only safe option here is to insist that the FDW is executed
	only on the master, unless the FDW somehow indicates that it knows about
	MPP. Let's extend the FDW API with a new MPP-aware flag or callback. If
	the FDW set the flag, then the foreign scan is executed on all segments.
	Otherwise it's executed only on master.

	We might also want an override option for that, so that the user can
	force an FDW to be executed on every segment anyway. That might be
	useful, e.g. with the file_fdw, if you have different files spread out
	to the segments.