@mikermcneil
Last active January 20, 2016 21:41

Body Parsers Through the Ages

An overview of how body parsers and file uploads work in general, and why Skipper is designed the way it is.

This is a work-in-progress, based on the discussion here: https://github.com/balderdashy/skipper/issues/54#issuecomment-72048398

How File Uploads Work

This is my first attempt at explaining the efficient handling of TB multipart file uploads using ASCII infographics, so please be patient :p

The legacy Express/Connect body parser

                                                                          (  use req.files to get the path       )
                                                                         (  to each already-uploaded file         )
                                                                         (  on disk, then copy it to a new folder )
                                                                          (   or stream it out to S3, etc.       )
                                                                               |
                                                                               |
                      _________________           __________________           •   _______________________
   file    =~=~=~=~>  |   legacy      | ==/ /==>  |  tmp folder    | ====/ /====>  | your routes/actions |
  uploads             |  body parser  |           | on server disk |               |_____________________|
                      |_______________|           |________________|               |_|_|_|_|_|_|_|_|_|_|_|
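
For reference, here is a rough sketch of what app-level code had to work with under the legacy parser. It assumes req.files is keyed by form field name, with each value being a file object (or an array of them) that exposes a `path` to the temp file on disk. The helper name `tmpPathsFromLegacyReq` is made up for illustration.

```javascript
// Hypothetical helper: collect the temp-file paths that the legacy body
// parser has already written to disk by the time your route runs.
function tmpPathsFromLegacyReq(req) {
  var paths = [];
  var files = req.files || {};
  Object.keys(files).forEach(function (fieldName) {
    // A field holds one file object, or an array when several files share it.
    [].concat(files[fieldName]).forEach(function (file) {
      paths.push(file.path);
    });
  });
  return paths;
}
```

By the time this runs, every byte of every file is already on the server's disk; your code only decides where to copy it next.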

No body parser

     ( manually parse req stream  )
      ( in your app-level code   )
     (   on a per-route basis   )
                  |
                  |
                  •   _______________________
   file    =~=~=~=~>  | your routes/actions |
  uploads             |_____________________|
                      |_|_|_|_|_|_|_|_|_|_|_|
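
To make "manually parse the req stream" a bit more concrete, here is a minimal sketch that consumes the raw request a chunk at a time. It only counts bytes; it does not parse multipart boundaries, which is the hard part you would be taking on yourself. The function name `countUploadBytes` is an assumption for illustration.

```javascript
// Sketch: consume the raw request stream by hand, chunk by chunk.
// `req` is any readable stream that emits 'data', 'end' and 'error'.
function countUploadBytes(req, onDone) {
  var total = 0;
  req.on('data', function (chunk) {
    // Each chunk is a small piece of the body; nothing is buffered here.
    total += chunk.length;
  });
  req.on('end', function () {
    onDone(null, total);
  });
  req.on('error', function (err) {
    onDone(err);
  });
}
```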

Skipper

                                  ( use req.file('foo') to access upstream )
                                 (  then call .upload() to pipe incoming    )
                                 ( file(s) directly to an upstream receiver )
                                  (  like Amazon S3, Mongo gridfs, or disk )
                                           |
                                           |
                      _________________    •     _______________________
   file    =~=~=~=~>  |   Skipper     | =~=~=~>  | your routes/actions |
  uploads             |  body parser  |          |_____________________|
                      |_______________|          |_|_|_|_|_|_|_|_|_|_|_|
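
A minimal sketch of the flow in the diagram, assuming a form field named 'foo' and an Express/Sails-style `res` with `status()` and `json()`. The handler name `uploadFoo` and the response shape are made up for illustration.

```javascript
// Sketch of a route/action using Skipper's upstream.
function uploadFoo(req, res) {
  // req.file('foo') returns the upstream for that field; .upload() pipes
  // the incoming file(s) to a receiver and calls back when finished.
  req.file('foo').upload(function (err, uploadedFiles) {
    if (err) {
      return res.status(500).json({ error: err.message });
    }
    return res.json({ uploaded: uploadedFiles.length });
  });
}
```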

Legend

Symbol       Meaning
=~=~=~=~>    incoming file stream, arriving a few kB at a time (don't wait for entire file)
===/ /==>    entire file duplicated on disk (wait for entire file before proceeding)

Is there any way I can know the names of the temporary files from req (in an API service) before I upload?

Instead of using the .upload() convenience method, you can tap directly into the upstream of incoming files:

var Transform = require('stream').Transform;
var SkipperDiskAdapter = require('skipper-disk');

function uploadScreenshots(req, res) {

  // Grab the upstream from skipper
  var upstream = req.file('screenshots');

  // Build an intermediate object-mode transform stream to reject or accept files.
  // Each object flowing through it is assumed to be one incoming file stream.
  // see http://nodejs.org/api/stream.html#stream_class_stream_transform
  var myIntermediateStream = new Transform({
    objectMode: true,
    transform: function (incomingFile, encoding, done) {
      // Inspect incomingFile here. Passing it to done() forwards it to the receiver.
      done(null, incomingFile);
    }
  });

  // Build the receiver
  var receiver = SkipperDiskAdapter().receive({/* opts */});

  // Finally, pipe the upstream to your transform stream, then pipe that to the receiver.
  upstream.pipe(myIntermediateStream).pipe(receiver);
}

In the next release of Skipper, I'm interested in exposing a built-in configurable function you can use to accept or reject any given incoming file from an upstream before it is passed to the receiver (this is effectively how the maxBytes option works). If you're reading this, please let me know if you also have this use case and have time to contribute. We need your help.
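
Until something like that exists, one way to approximate an accept/reject hook is to wrap a predicate in the intermediate transform stream. This is only a sketch under assumptions: it assumes each incoming file object carries a `filename` property, and that calling `resume()` on a rejected file stream is enough to discard its bytes. The names `isAcceptable` and `buildFileFilter` are hypothetical, not part of Skipper.

```javascript
var Transform = require('stream').Transform;

// Hypothetical predicate: accept only files whose name ends in .png.
function isAcceptable(file) {
  return /\.png$/i.test(String(file.filename || ''));
}

// Wrap a predicate in an object-mode transform that forwards accepted
// files and drains rejected ones so their bytes are discarded.
function buildFileFilter(predicate) {
  return new Transform({
    objectMode: true,
    transform: function (file, encoding, done) {
      if (predicate(file)) {
        return done(null, file);
      }
      // Assumption: draining the rejected stream lets the upload continue.
      if (typeof file.resume === 'function') {
        file.resume();
      }
      return done();
    }
  });
}
```

It would slot into the pipeline as `upstream.pipe(buildFileFilter(isAcceptable)).pipe(receiver);`.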

If there's no way, how do I access req.files? It doesn't seem to work.

The req.files you're used to was a property of the traditional Express/Connect body parser bundled with Express 3.x and earlier. It works by writing incoming files to a temporary directory on disk before running your app-level code. That means if a user agent is uploading three 100MB files, all 300MB must be streamed to the server's hard disk before your code can run. While it was possible to limit this somewhat in previous versions of Express, it was still a challenge and, more importantly, an easy-to-miss DoS vulnerability. So Express 4 removed file upload support from core. Meanwhile, @sgress454 and I were working on Skipper because I didn't want to just leave people using Sails (including myself) hanging.

You can use any body parser you like with Sails or Express (there are lots of good ones!). In Sails, for example, you can override the default body parser middleware (Skipper) by configuring sails.config.http.middleware. You shouldn't need to do that in 90% of cases, but if you run into something insurmountable, rest assured you always have the option.
