There's been a lot of arguing on Twitter about the proposed module syntax for ES6, especially between members of the Node.js community and members of TC39. The decision to include modules in the ES6 spec would have far-reaching implications for all JavaScript developers, hence the impassioned debate.
ES6 does not need syntax-level module definition, imports, and exports. The cost of adding this syntax, and the risk of getting it wrong in subtle ways, far outweigh the potential gain over the status quo. The community should have the chance to use the other new features in ES6 to make the existing userland module tools better before we specify language-level support.
An aside: I don't intend to disparage any of the great work being done by TC39, but I do not think the modules proposal should be made official. That is not to say it is bad and needs to be replaced by something else -- rather, I do not think ES6 needs any sort of module loading or exporting syntax at all. I suspect that's true of JavaScript at large, too; but I'd like to have that borne out by seeing how people use the other new tools ES6 gives us.
What has been proposed is cool, but I do not think it is right for JavaScript at this point in time.
To see what gains we might make versus what costs we might incur, we need to take a look at the current state of code sharing in JavaScript.
We'll start with JavaScript in the browser, as it's the largest intended audience for any proposed new syntax. In the JavaScript we have today (ES5):
- There's zero language-level concept of modules or dependencies. Everything is in userland.
- Code sharing is limited to stapling APIs to well known properties on the global object, or by using a userland module loader.
- In most cases, best practices say you should "concatenate and minify" your production code. And, to add to that, you should probably GZIP it too.
- Further best practices: don't host your static media from your webserver -- use a media domain (or two!) on a CDN (though S3 might do in a pinch). Remember to have your CDN serve up the appropriate `Content-Encoding` header if you GZIP'd your massive monofile like you should've!
- But wait, if you modify any of your code, you have to redownload all of the JS again! You should look to separate the monofile into 3-4 large files that change at varying rates. Make sure the filename includes the SHA1 sum of the file contents, so that you don't have to deal with bad caches!
There's a large spectrum from "it works and it's fast to develop" to "it works and it's fast for users" here, and it roughly maps to "how hard is it for the developer to understand / implement":
```
good ^            /
     |           /
     |          /
(ux) |         /
     |        /
 bad |/
     O------------>
     easy     hard
      implementation
        difficulty
```
One may (potentially) download and run JavaScript using any of the following techniques:
- as code inlined into HTML using script tags
- using a script tag that refers to URL from which to fetch the code
- by programmatically adding script tags to the DOM
- by using XHRs (and XDRs) to fetch the source code from a URL and running it with `eval` or `Function`
- by opening an iframe to a URL with code that provides a `postMessage` API
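The XHR-then-evaluate technique is worth a quick sketch, since it comes up again below. The network fetch is simulated by a string here (the interesting part is how fetched source gets run, not how it's fetched); the `runModule` helper is illustrative, not from any particular loader.

```javascript
// Sketch: evaluating fetched source. Wrapping the source in a Function gives
// it parameters we control (here, an `exports` object), rather than eval's
// direct access to the caller's scope.
var source = 'exports.greet = function (name) { return "hello, " + name; };';

function runModule(src) {
  var exports = {};
  // roughly equivalent to eval'ing the body inside `function (exports) {...}`
  new Function('exports', src)(exports);
  return exports;
}

var mod = runModule(source);
// mod.greet('world') -> 'hello, world'
```

Note that in a real browser, whether you can *read* that source at all depends on the CORS and CSP rules discussed below.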
Any one of these scenarios can at any moment invoke any other scenario. Depending on the location of other resources and the CORS headers they're served up with, JavaScript can either read and run source code from URLs, or only run the source code. That gets even foggier when you take into account the `Content-Security-Policy` headers the original HTML document might have been served up with -- which can disable any one of these execution scenarios wholesale, or even disable core JS functionality (`eval` and `new Function(src)`, for example!). Further muddying matters, there's no one-to-one mapping between a "module" and a file -- one file may include multiple modules (bundled, or as an HTML document with multiple script tags). There's not even a cohesive directory structure unifying all of the code that might be run on a page: some code might be on domain A, some on domain B, some inside the HTML document itself and thus not represented by a JS file, etc.
And yet, userland module loaders exist. They exist because they can provide certain guarantees: we can map one module to one file; we can unify the directory structure, even across multiple domains; we can give a module some notion of its position relative to other modules. They all basically work this way, whether they precompile or not: "If you play in our sandbox, we'll make sure you've got what you need. If you step out of our sandbox, you're on your own."
Node.JS has flourished, in no small part due to its module loading strategy and how well it ties into the package manager. Importantly, most of the module loader is just JavaScript. Even though it's baked into Node proper, it's very nearly as userland as any other module loader. And again, it provides guarantees only if you play nice -- if you load and run JavaScript yourself (i.e., grab files with `fs.readFileSync` and run them with `vm.runInNewContext`), you're stepping outside of the sandbox and it will not help you -- you're on your own.
Node's module loading is contentious for one primary reason: synchronous-style require statements. The assumption is that since startup is basically a sunk cost (and hitting the disk isn't really all that bad compared to hopping across the internet), it doesn't make sense to optimize that process for end users -- it opts to make the developer experience more comfortable.
Sync-style requires are great to write, and the single-export pattern -- while clunky -- leads to well-formed, small modules.
It feels remarkably well thought out, and is a joy to use as a JavaScript developer.
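Roughly, Node's loader works by wrapping each file's source in a function, so `require`, `module`, and `exports` look like globals but are really parameters. The sketch below simulates the "file" with a string; real Node reads from disk, resolves paths, and caches by resolved filename, none of which is shown here.

```javascript
// Sketch of (roughly) what Node's loader does with a file's source.
var fileSource = 'module.exports = function add(a, b) { return a + b; };';

function loadFromSource(src, requireFn) {
  var module = { exports: {} };
  // Node wraps sources in a function much like this one before running them.
  var wrapped = new Function('exports', 'require', 'module', src);
  wrapped(module.exports, requireFn, module);
  return module.exports;
}

var add = loadFromSource(fileSource, function () {});
// add(1, 2) -> 3
```

Because the wrapper is plain JavaScript, the whole strategy could in principle be reimplemented in userland -- which is the sense in which Node's loader is "very nearly as userland as any other."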
So, can we bring Node-style modules to the browser? We can and we can't. There are problems:
- As previously mentioned, from one piece of JavaScript, there's no reliable guarantee that you'll be able to read JavaScript served from a different domain. And it just so happens that serving JavaScript from a different domain than your web server's is considered a best practice.
- If that weren't a problem, you could theoretically use synchronous XHRs to fetch new code. But you'd have to block execution for every module load, and that adds an unacceptable amount of delay before page load (not to mention, potentially triggers "Stop execution of long running script" dialogs). In other words: this leads to a horrible, horrible user experience.
- If you can programmatically get at the source of the code you're trying to load, you could parse it for `require` statements, fetch all of the imported modules simultaneously (and fetch their requirements as they load), and only run the script once the entire tree of dependencies has loaded. This is still slow, and easily spills into "horrible UX" territory, but it allows you to avoid precompiling your code -- making life easier for developers.

  It turns out that you can get at the source code programmatically, if you use a well-known API to accept a function; you can coerce that function to a string to get at its source code (regardless of the domain it was served from) and look for `require` statements from there (Require.JS supports this). However, this means that modules must be entirely contained in that function, and that modules must rely on that well-known API existing (i.e., boilerplate and lock-in: bad for devs, good for no-one).
Further, you can't rely on the cached or uncached state of a single module as indicative of the cached status of its dependencies. And to determine the caching status of a module, you potentially have to hop over the internet (slow!) to the host and see what its cache headers say (assuming they've been set up correctly!).
Finally, while it's tempting to use this style of loading to rely on a central package repository (say, github or npm, once it gets CORS headers), that's a bit too centralized for the web. One site being down affecting hundreds of thousands of others is not in the spirit of the web! Not to mention the numerous security implications of running code on clients distributed from a central location -- at the very least, it'd be a huge target.
So, centralized package repositories are right out, unless you're willing to set up and maintain your own public mirror.
- If all else fails, you can build the bundle offline as a compilation step. Browserify does this. This is actually nearly ideal for users, since it means you're already concatenating all of the many modules you're using into one big one that can be fetched with a single HTTP request -- and you'll probably be minifying and gzipping it now that you have an asset pipeline in place.
Plus, since it's all offline, you can lean on centralized package repositories -- no more reinventing the wheel! -- since you can verify package checksums offline and rely on the fact that your module loader won't need to rely on the uptime of a centralized service.
This works great for single page web applications, since there's likely to only be one entry point and thus one possible resulting bundle.
But the web isn't made up entirely of single page web apps, and you've got to consider that some of your pages will use one JS entry point, and others another -- and you'll produce a wholly different large bundle file for each page that has a different entry point. Oh, and every time you change any file, any of those large files will have to be regenerated (and redownloaded by the client for each page with a different entry point that they hit). Blergh.
- If you've got all of that together, what happens when you finally generate a bundle that's over 2MB in size? You start to want some programmatic way to include modules at runtime (and ideally, a way to report progress on those files!) without statically including them in the bundle. And now you're back at the step before last.
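The `Function.prototype.toString` trick mentioned above is simpler than it sounds, and worth sketching. A loader accepts a factory function, stringifies it, and scans for `require('...')` calls to learn the dependency tree before running anything. (Require.JS's CommonJS sugar works along these lines; the regex below is deliberately naive and wouldn't survive comments or computed module ids.)

```javascript
// Sketch: discovering dependencies by stringifying a factory function.
function findDependencies(factory) {
  var src = factory.toString();        // works cross-domain, unlike reading the file
  var deps = [];
  var re = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  var match;
  while ((match = re.exec(src)) !== null) deps.push(match[1]);
  return deps;
}

var deps = findDependencies(function (require) {
  var a = require('./a');
  var b = require('./b');
});
// deps -> ['./a', './b']
```

The loader can now fetch `./a` and `./b` (and their own dependencies, discovered the same way) before ever invoking the factory -- which is exactly why the module must live entirely inside that function.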
So, in essence, there are a lot of userland solutions that get most of the way to a completely workable solution, but somehow miss the mark in the process.
It bears reiterating: the problems outlined above are problems endemic to JavaScript-as-part-of-the-DOM, not JavaScript-the-language.
I'll preface by saying that I'm not well versed in the current state of the proposal and am operating off of what I understand from reading examples and the wiki. ES6 modules add new syntax (`import`, `export`, and `module`), as well as a new API (`Loader`) that controls the syntax's behavior.
Adding new syntax means the code will break in browsers that don't support that syntax: the feature is limited to browsers that do -- usually the cutting-edge ones, and often behind a hidden feature flipper. So, the audience for any new feature that requires new syntax is drastically diminished; and it can only be polyfilled by precompilation, or if the JavaScript runtime can see (and modify) source code before running it.
ES6 adds new syntax -- and will always be defined by adding new syntax. This limits the audience immediately; that reduced audience will play the part of the "canary in the mine shaft" for validating new syntax, and will pay the price for any mis-features introduced by way of it. TC39 must respect the risk new syntax imposes on this early audience and not move forward recklessly.
However, with modules, I feel the current course -- or at least the current pace -- is reckless:
- Only authors targeting specific ES6-supporting environments can use the new syntax (disregarding precompiling for the moment).

  In practice, this means that only developers who specifically target a single environment can use the new features: limiting the scope to Chrome/Firefox extension developers and the Node.JS community, for as long as it takes IE to release a version that supports ES6 plus the time it takes that version to become the minimum requirement (a timescale that has usually been measured in years).

  From what I've seen of TC39 members' interactions with the Node.JS community, they seem to regard Node.JS as a vocal minority; when in reality, for the first few years, Node developers will be very nearly the only users of any proposed module syntax.

  Node.JS already has a perfectly workable module loading strategy, and 20k+ modules published against it. Adding in "another way to load modules" at the syntax level isn't particularly welcomed by the community -- it'll just fracture modules and code style further, with zero benefit over what they already have.
- As far as I understand the proposal, it doesn't address any of the above issues with regards to in-browser loading -- CORS, CSP, etc. rules still apply; only now we can't shim around them with things like `define(function(require){ })` anymore, because the new syntax will blow up old browsers. The spectrum of "good for developers / good for users" remains largely unchanged by the introduction of these modules, in other words.
- While it potentially unifies the way modules are imported and exported in the browser, in reality authors will still have to use the `Loader` API to describe how things work on their site. It's great that `Loader` provides this, but why not go API-only and make `Loader` affect the behavior of a per-code-unit `require` function? Does the `Loader` API really need to be tied to new syntax to be useful, especially given that `Loader` is still bound by the same cache-querying, CORS-respecting, CSP-respecting rules as all of the rest of JS in the browser?
- ES6 does give us `yield` for shallow coroutines -- is it unreasonable that `require` simply be implemented using `yield` internally in the browser (since we're getting really no performance benefit via the vanilla loader)?
- Folks embedding JavaScript in other contexts now have to take JavaScript-triggered imports into account, even if the context in which JS is being embedded does not support the concept of module loading.
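To make the `yield` idea above concrete, here's a minimal sketch (the names are mine, not the proposal's): a loader drives a generator, and each `yield` hands it a module id to resolve before resuming -- so module-using code reads synchronously while the loader stays in control. In a real loader the resume would happen asynchronously after a network fetch; here the "fetch" is a synchronous registry lookup to keep the sketch self-contained.

```javascript
// Sketch: a userland require built on generators.
function run(gen, fetchModule) {
  var it = gen();
  var result = it.next();
  while (!result.done) {
    // each yield hands the loader a module id; we "fetch" it and resume
    var mod = fetchModule(result.value);
    result = it.next(mod);
  }
  return result.value;
}

// stand-in for an async network fetch
var registry = { math: { add: function (a, b) { return a + b; } } };
function fetchModule(id) { return registry[id]; }

var total = run(function* () {
  var math = yield 'math';   // reads like: var math = require('math')
  return math.add(2, 3);
}, fetchModule);
// total -> 5
```

The appeal is that the suspension point is ordinary language machinery -- no new module syntax required, and the loader behavior stays fully overridable from userland.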
The problem with module loading in JavaScript is not that JavaScript doesn't provide adequate language-level tooling for modules, but that loading and running JavaScript from across the web in browsers is actually a fairly tricky problem, with a range of solutions that please different audiences.
Adding APIs and non-module-loading-specific syntax (like `yield`) that enable writing better userland module loaders is a bigger win than specifying the syntax of module loading itself. I urge TC39 to consider backing off on syntax-level additions for module loading, and instead take the following course:
- Allow the JavaScript community time to work with the other new features in ES6 to explore the module loading problem in userland. If a leading solution emerges, then codify it as an API.
- If it proves necessary, reintroduce the `Loader` API, and have it control the behavior of a separate, overridable top-level function -- `require` might be a good candidate name. Allow the community time to explore and attempt to build tools around that. Again, if a leading solution emerges, codify it.
- If neither of the above paths produces the desired results, return to the topic of adding module loading syntax in a later version.