So, it looks like COPY support will have to be divided into two parts, /copyfrom
and /copyto
, which is fine.
They are both interesting, in that since the idea is to support scaling, the implementations absolutely must stream, rather than taking files or holding data in memory.
The parts needed are
- The node-postgres add-on for handling streaming COPY, https://github.com/brianc/node-pg-copy-streams
- The node middleware for reading multi-part form data https://github.com/expressjs/multer, so that a POST file upload can be read
- An implementation of a multer custom storage engine to bind the upload stream to the copy stream https://github.com/expressjs/multer/blob/master/StorageEngine.md
The parts do in fact all fit together and work, as can been seen in this small proof-of-concept:
In the proof-of-concept everything of interest happens in the custom storage engine. Database connect, stream handling, etc. In the real implementation in the SQL API, we need at least a pg client, and the COPY sql we want to run. Getting these into the custom storage handler seems quite hard.
A basic implementation within the SQL API is on this branch:
The /copyfrom
end point is set up by the copy_controller.js implementation.
The copy controller adds in two extra parts to the standard middleware chain used by the query_controller.js. A standard npm body parser, and the multer in single-file mode, with a custom storage handler, utils/multer-pg-copy.js.
The custom storage handler doesn't yet try to do a COPY, as that requires getting our hands on a pg client, which is tricky since the handler is only fed the "req" object, while the "authParams" needed to connect to the database live on "res.locals". Putting a stupid middleware ahead of the multer that copies the "authParams" onto the request object works to pass the database connection information through, although it seems hacky.
The blocking problem is this: the 'sql' parameter, that I want to access in the custom storage handler, because it will be required once we start doing a real COPY into the database is not available on the "req.body.sql" object at that stage. It is available after the handler is finished, in later stages, but not while the custom multer storage engine is running. By the time the terminal "handleCopy" function in "copy_controller.js" is called, the value is there, but it is not there for the multer storage engine, even though the body parser precedes the multer in the middleware chain.
Note that the proof-of-concept does not display this problem, even though the set-up seems identical: end point bound to POST, with a middleware chain that first runs body parser and then runs the multer.