@littledan
Last active March 3, 2021 04:17
JavaScript Module Bundles

Old draft below in case anyone is curious

JavaScript module bundles are a syntax for bundling multiple modules into a single JavaScript file.

A quick example:

// filename: app.jsb
module "./count.js" {
  let i = 0;

  export function count() {
    i++;
    return i;
  }
}

module "./uppercase.js" {
  export function uppercase(string) {
    return string.toUpperCase();
  }
}

This module bundle is referenced from an HTML file via a script tag:

<script type=bundle src="./app.jsb"></script>

The modules defined in the module bundle are logically located relative to the HTML file that included them.

Once an HTML file loads a module bundle through a script type=bundle tag, the bundled modules become available to other modules, as well as to inline script type=module tags.

<!-- filename: https://example.com/app/index.html -->

<script type=bundle src="./app.jsb"></script>

<script type=module>
  import { count } from "./count.js";
  import { uppercase } from "./uppercase.js";

  console.log(count()); // 1
  console.log(uppercase("daniel")); // "DANIEL"
</script>

In this example, the modules in the module bundle are registered relative to the JS file's base URL:

  • https://example.com/app/count.js
  • https://example.com/app/uppercase.js

Syntax

ModuleBundle is a new top-level nonterminal defined by the JavaScript specification. It consists of a series of module declarations. Note that there is no shared lexical scope between bundled modules in the same module bundle; they are simply side by side, like modules fetched from different URLs.

ModuleBundle : BundledModule
               BundledModule ModuleBundle

BundledModule : module StringLiteral { Module }

This grammar does not syntactically conflict with JS module blocks, which also use the module contextual keyword, but without a name, and not in the same context. Any confused attempts to use one in the context where the other is appropriate would quickly run into a syntax error, as JS module blocks are permitted in normal JS modules and scripts, whereas bundled modules are only permitted at the top level of a module bundle.

Semantics

Each module in a JS module bundle is "top level":

  • Syntactically, they can only be declared as a top-level statement of a module bundle, not within another construct.
  • They are registered in the Realm's module map (not restricted in visibility to some subset of modules).
  • Each bundled module contained in the module bundle has its own top-level lexical scope. There is no shared scope.

A JS bundled module inserts an entry in a host-defined map/cache with the module contents.

As with other modules, the host is required to return the same value for that module each time it is imported, or an error.

HTML integration

Bundled modules have URL-based specifiers

Bundled module names are absolute or relative URLs in the same origin as the HTML file that includes them.

// logically located relative to the HTML's base URL
module "./count.js" {
  let i = 0;

  export function count() {
    i++;
    return i;
  }
}

// logically located at the root of the HTML's origin
module "/print.js" {
  export function print(name) {
    console.log(name);
  }
}

From a specification perspective, the StringLiteral which follows the module keyword is required to be a URL, just like the module name in an import statement. If it is a relative URL, then it must begin with ./, ../, or /.

For example, module "foo" { } is a syntax error, for the same reason as import "foo" is. Instead, use a URL that begins with either a scheme, or ., or /, such as module "./foo.js" { } .
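The rule above can be sketched as a small check. This is only an approximation of the behaviour described here, not the exact algorithm from the HTML specification, and the function name isUrlLikeSpecifier is made up for illustration.

```javascript
// Sketch only: approximates the rule described above, not the exact HTML algorithm.
// `isUrlLikeSpecifier` is a hypothetical helper name.
function isUrlLikeSpecifier(specifier) {
  // Relative forms must be explicitly marked with "/", "./" or "../".
  if (
    specifier.startsWith("/") ||
    specifier.startsWith("./") ||
    specifier.startsWith("../")
  ) {
    return true;
  }
  // Otherwise the specifier must parse as an absolute URL (it has a scheme).
  try {
    new URL(specifier);
    return true;
  } catch {
    return false;
  }
}

console.log(isUrlLikeSpecifier("./foo.js")); // true
console.log(isUrlLikeSpecifier("/print.js")); // true
console.log(isUrlLikeSpecifier("https://example.com/foo.js")); // true
console.log(isUrlLikeSpecifier("foo")); // false (bare specifier)
```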

Relative specifiers and import.meta.url

Relative module specifiers within a bundled module are resolved as if the module were downloaded from the URL of its specifier. Similarly, import.meta.url is defined as if the module were downloaded from the given URL. For example, say the following module bundle is at https://example.com/script.jsb:

module "./bar/baz.js" {
  console.log(import.meta.url);
  import "./bing.js"
}

If this bundle is included with a <script type=bundle> tag, then the module defined in it can be imported with the specifier https://example.com/bar/baz.js . When it is imported, the full URL https://example.com/bar/baz.js would be logged, and the inner import would fetch from https://example.com/bar/bing.js.
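The two resolution steps in this example can be reproduced with the standard URL constructor. This sketch assumes the bundle's own URL is used as the base for the bundled module's specifier; in this example the including HTML file would give the same result if it lives in the same directory.

```javascript
// Resolve the bundled module's specifier against the bundle URL (assumed base),
// then resolve the inner import against the resulting module URL.
const bundleUrl = "https://example.com/script.jsb";
const moduleUrl = new URL("./bar/baz.js", bundleUrl).href;
const innerImportUrl = new URL("./bing.js", moduleUrl).href;

console.log(moduleUrl); // "https://example.com/bar/baz.js" (the value of import.meta.url)
console.log(innerImportUrl); // "https://example.com/bar/bing.js"
```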

Same-origin and scope restrictions

On the web, the specifiers for bundled modules may only be URLs with the same origin as the surrounding module bundle, and the path of each bundled module is required to have the base URL of the JS bundle as a prefix. For example, it is an error to declare a bundled module module "../foo.js" { }, as this definition would "break out" of the bundle's scope.

This directory restriction is equivalent to allowing the module bundle to operate with a Service Worker scope of ./. As with service workers, the directory restriction could be loosened by providing a [Service-Worker-Allowed](https://w3c.github.io/ServiceWorker/#service-worker-allowed) header in the response that provides the module bundle (or perhaps a separate header could be defined).
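A minimal sketch of the same-origin and scope check follows, assuming the scope is the directory that contains the bundle (the equivalent of a Service Worker scope of ./) and that no header has loosened it. The helper name isWithinBundleScope is hypothetical.

```javascript
// Sketch: is a bundled module's specifier inside the bundle's origin and directory?
// `isWithinBundleScope` is a hypothetical helper, not part of any specification.
function isWithinBundleScope(specifier, bundleUrl) {
  const resolved = new URL(specifier, bundleUrl);
  const scope = new URL("./", bundleUrl); // directory containing the bundle
  return (
    resolved.origin === scope.origin &&
    resolved.pathname.startsWith(scope.pathname)
  );
}

const appBundle = "https://example.com/app/app.jsb";
console.log(isWithinBundleScope("./count.js", appBundle)); // true
console.log(isWithinBundleScope("../foo.js", appBundle)); // false, breaks out of /app/
console.log(isWithinBundleScope("https://other.example/app/x.js", appBundle)); // false, cross-origin
```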

Caching layer

There are multiple places where module bundles could be mapped/cached which would meet the requirements of the JS spec of returning the same module when it's requested at that specifier.

  • The module map: A JS module bundle could have the semantics of inserting an entry directly in the module map.
  • Another memory cache: A JS module bundle could instead have the semantics of inserting an entry "somewhere towards the top" of the network stack. This would make them visible to fetches, and this cache could be defined to be accessible across multiple Realms (e.g., in Workers) for more flexible/convenient usage. For example, the memory cache could be a good fit here, or maybe some layer above/below that.

The use in Workers and Realms that the memory cache would enable makes me (Dan) suspect that it would be the more useful place. It's also nice that this choice would lead to a clean correspondence with Web Bundles (see below), which would also work across all fetches, and not be specific to the module map.

WebKit has expressed a preference of keeping networking logic separate from the logic for JS module bundling. It's not clear whether "another memory cache" meets this goal. More investigation is needed.

Privacy-preserving mitigations

Brave has expressed concerns about the possibility that bundling could be used to let servers remap URLs more easily, which cuts against privacy techniques for blocking tracking, etc. This proposal includes a few mitigations which are intended to make URL-based blocking still viable:

  • User agents are specifically permitted to make "verifying fetches", and reject the module graph if the URL does not contain the same contents when fetched independently.
  • The gradual upgrade strategy (see below) is based on just doing reads to these underlying URLs, and the cache digest strategy is also based on stability of these underlying URLs.
  • JS module bundles are restricted to just same-origin JS, so they are analogous in scope to what is currently done with popular bundlers like webpack and rollup, not adding more power.
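A "verifying fetch" could look roughly like the sketch below. The comparison rule (exact text equality) and the names verifyBundledModule and fetchText are assumptions made for illustration; the proposal does not say how contents would be compared.

```javascript
// Sketch of a verifying fetch. `fetchText` is an injected function (hypothetical)
// that returns the text served at a URL when it is fetched independently.
async function verifyBundledModule(url, bundledSource, fetchText) {
  const independentSource = await fetchText(url);
  // Assumption: "same contents" means exact text equality.
  return independentSource === bundledSource;
}

// Example with a stand-in fetcher instead of the network.
const served = { "https://example.com/app/count.js": "export const n = 1;" };
const fakeFetchText = async (url) => served[url];

verifyBundledModule("https://example.com/app/count.js", "export const n = 1;", fakeFetchText)
  .then((ok) => console.log(ok)); // true
```

In a browser, fetchText could be something like (url) => fetch(url).then((response) => response.text()).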

Full HTML attributes and transition pattern to adoption

This proposal adds a few HTML attributes to <script> tags:

  • type=bundle is a new script type which corresponds to the ModuleBundle non-terminal and interpretation. Generally, where not otherwise stated, <script type=bundle> is analogous to <script type=module>, e.g., in expected MIME types, fetch parameters, etc.
  • nobundle is an analogue of nomodule , which causes the <script> tag to become inert in new browsers supporting JS module bundles. This allows for declarative fallbacks.
  • main="./specifier.mjs" is an attribute supported on <script type=bundle> tags, indicating that, to execute this script tag, import the following module specifier, which is expected to be contained in it.

These three attributes can be used together to provide client-driven negotiation about whether module bundles, a single ES module, or classic scripts are used.

<!-- main.html -->
<script nomodule src="no-modules.js"></script>
<script type=module nobundle src="single-esm.mjs"></script>
<script type=bundle src="module-bundle.jsb" main="./main.mjs"></script>

// module-bundle.jsb

module "./foo.mjs" { export const x = 1; }
module "./main.mjs" {
  import {x} from "./foo.mjs";
  assert(x === 1);
}

// single-esm.mjs

const x = 1;
assert(x === 1);

// no-modules.js

const x = 1;
assert(x === 1);

The above example is a bit silly, as developers don't get much benefit from using native modules vs optimized bundler output, but in other cases, there is improved specification compliance, etc.
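To make the negotiation concrete, here is a small model of which of the three script tags would run in browsers with different levels of support. It is a simplification for illustration only: the function scriptsToRun and the support flags are hypothetical, and real script processing has many more rules.

```javascript
// Hypothetical model of which <script> tags a browser would execute.
function scriptsToRun(tags, support) {
  return tags
    .filter((tag) => {
      if (tag.nomodule && support.modules) return false; // inert in module-aware browsers
      if (tag.nobundle && support.bundles) return false; // inert in bundle-aware browsers
      if (tag.type === "module") return support.modules; // unknown types are ignored
      if (tag.type === "bundle") return support.bundles;
      return true; // classic script
    })
    .map((tag) => tag.src);
}

const tags = [
  { src: "no-modules.js", nomodule: true },
  { type: "module", src: "single-esm.mjs", nobundle: true },
  { type: "bundle", src: "module-bundle.jsb", main: "./main.mjs" },
];

console.log(scriptsToRun(tags, { modules: false, bundles: false })); // [ 'no-modules.js' ]
console.log(scriptsToRun(tags, { modules: true, bundles: false })); // [ 'single-esm.mjs' ]
console.log(scriptsToRun(tags, { modules: true, bundles: true })); // [ 'module-bundle.jsb' ]
```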

Possible extension: Cache digests to reduce redundant fetches

This document by Yoav Weiss sketches ideas for a mechanism to send cache digests to the server, to avoid redundant fetches, in the context of Web Bundles. It seems like such an idea would work just as well in the context of JS module bundles (with the limitation that it would only work for JS modules, and the risk that it may not be efficient enough to put into practice in the case of many small modules), since they are both based on URLs identifying the modules, that can be placed in the same kind of hashed data structure.

Possible extension: Response header to include a JS module bundle

Rather than linking all JS module bundles directly from HTML, an HTTP response to a request for a JS module may instead be a module bundle, with the interpretation driven by a new HTTP header which indicates which is the "main" module in the bundle to be re-exported. See the discussion of the main attribute above.

Possible extension: Dynamic import of bundles

JS module bundles may be added dynamically by using the DOM to create a <script type=bundle> tag and waiting for it to complete with onload. Then, the modules contained inside of it can be loaded with import() .
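A rough sketch of this DOM-based pattern is shown below. The script type "bundle" is the one proposed in this document and is not implemented by any browser, so today's browsers would neither fetch nor fire onload for such a tag. The helper name loadBundle is hypothetical.

```javascript
// Sketch: dynamically add a <script type=bundle> tag and wait for it to load.
// "bundle" is the proposed (unimplemented) script type; `loadBundle` is a made-up helper.
function loadBundle(src, doc = document) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.type = "bundle";
    script.src = src;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error(`Failed to load bundle: ${src}`));
    doc.head.appendChild(script);
  });
}

// Usage in a hypothetical supporting browser:
//   loadBundle("./app.jsb")
//     .then(() => import("./count.js"))
//     .then(({ count }) => console.log(count()));
```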

To improve ergonomics and provide a cross-environment primitive, an import.bundle() syntax could be added, to be given the URL to a bundle, which would resolve the bundle as a relative URL (based on where this form syntactically appears) and return a Promise of undefined which resolves once the bundle is successfully loaded.

Relationship to Web Bundles

JS module bundles provide a strict subset of Web Bundle behavior, and can be thought of as a distinct syntax for the same semantics. This limited functionality is identified as important because of the current popularity of tools like webpack and rollup, which significantly improve loading performance in practice compared to loading JS modules as individual subresources.

Similar security considerations

JS module bundle specifiers are restricted to the same origin and scope as the module bundle that brought them in. There is no "non-authenticated" or signed cross-origin path. Web Bundles could be restricted to these use cases as well, and a document by the Web Bundles proponents is being written now to articulate that case. So, with respect to the origin model, the proposals can be seen as equivalent.

Like Web Bundles, JS module bundles may remap resources (see Brave's concerns about this). This proposal explicitly permits UAs to perform fetches to "verify" the contents of the "underlying" resource; such verifying fetches could be done with Web Bundles as well.

Similar serving, compression and caching

  • Like Web Bundles, HTTP compression can happen across the entire file of module bundles.
  • The modules in a JS module bundle are not cached individually in HTTP caches. Instead, they are cached all together as one big resource. Web Bundles are proposed to work in the same way, to avoid interprocess communication overhead (where the HTTP cache is in the network process, which is expensive to talk to on a per-module basis when there are thousands of JS modules).

It's unclear what level JS module bundles will do caching at. If they do caching in the module map level, then they would indeed differ from Web Bundles, as Web Bundles operate at the fetch level. Caching at the fetch level is likely preferred for JS module bundles anyway, so that they can be accessed from other Realms and Workers.

The idea of adding module bundles to JavaScript is completely compatible with adding Web Bundles or some other more advanced bundling strategy. It would be a little redundant to have multiple syntaxes, but they don't get in each other's way, and they would "layer" cleanly if based on the same caching type.

Web Bundles support HTTP headers, unlike JS module bundles

JS module bundles do not have HTTP headers or other networking logic, so there is no need to replicate this logic in the renderer process for good module load performance without interprocess communication. This means that JS module bundles don't have a syntax for supplying out-of-band metadata, as HTTP headers provide, making this format less extensible. Web Bundles also support content negotiation.

The expressiveness could be taken as a pro or con. Web Bundles would probably support some HTTP headers and not others, to make it safe and practical to process them in the renderer process. Syntax could be added to JS module bundles if there's some particular kind of metadata that we want to support.

Web Bundles support multiple resource types, not just JS

JS module bundles only support bundling JavaScript. Non-JavaScript module types, or other kinds of subresources (including binary formats), would be a poor syntactic fit for embedding literally in a JavaScript file. Certain types might be made to work (e.g., JSON, HTML or CSS), but it may make more sense to move to general Web Bundles when expanding outside JS, due to the following issue.

Today, there is wide deployment of bundling SVG, CSS, and even binary formats (as base64 data URLs!) into JavaScript as the "transport medium" in order to get the benefits of bundling, both in terms of being fewer network fetches as well as being a single file that's easier to pass around. There are techniques for bundling CSS and SVG individually (e.g., with CSS, by adding classes to selectors; with SVG, spriting), but these are complex, not entirely semantics-preserving, and add their own overhead.

Looking forward, it's important to bundle WebAssembly together with JavaScript, so this loss is particularly unfortunate, and may lead to the continuation of practices of bundling Wasm into JS with base64 strings or other inefficient mechanisms.

Web Bundles could form a clean, efficient solution to bundling many different resource types, but JS module bundles would leave this problem to be solved elsewhere.

Web Bundles are easier to parse than JS module bundles

A major advantage of Web Bundles over JS module bundles is that they are easy to parse: they have an index at the beginning which describes the offset and length of each resource contained inside. By contrast, JS module bundles require a JS parser (or reduced form of one) to parse out where the individual bundled modules start and end, as they do not have this index.
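The benefit of an up-front index can be illustrated with a toy container. This is not the real Web Bundle wire format; it only shows that, given offsets and lengths, a reader can slice out a resource without understanding its contents. The function names are hypothetical.

```javascript
// Toy "indexed bundle": NOT the real Web Bundle format, only the idea of an
// offset/length index that lets a reader extract resources without parsing them.
function buildIndexedBundle(resources) {
  const index = {};
  let body = "";
  for (const [url, source] of Object.entries(resources)) {
    index[url] = { offset: body.length, length: source.length };
    body += source;
  }
  return { index, body };
}

function readResource(bundle, url) {
  const entry = bundle.index[url];
  if (!entry) return undefined;
  // A plain slice: no knowledge of JavaScript syntax is needed.
  return bundle.body.slice(entry.offset, entry.offset + entry.length);
}

const toyBundle = buildIndexedBundle({
  "./count.js": "let i = 0; export function count() { return ++i; }",
  "./uppercase.js": "export const uppercase = (s) => s.toUpperCase();",
});

console.log(readResource(toyBundle, "./uppercase.js"));
// export const uppercase = (s) => s.toUpperCase();
```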

For background, we expect both Web Bundles and JS module bundles to be cached at a level such that it is possible to access them within the renderer process, without the interprocess communication overhead of going to the network process. These IPC costs have been found in V8 to be the leading source of the performance overhead of non-bundled modules.

The complex syntax of JS module bundles, which pulls in the whole JavaScript parser, implies that it is important to avoid double-parsing. If the JS engine is responsible for parsing them, then it can produce bytecode on the first pass, reducing overhead. However, this may be undesirable from a layering perspective.
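To see why a real JavaScript parser (or a reduced form of one) is needed, consider a deliberately naive scanner that finds module boundaries by counting braces. It is a sketch for illustration, with a made-up name, and it fails as soon as a brace appears inside a string, comment, template literal, or regular expression.

```javascript
// Naive scanner for `module "<specifier>" { ... }` declarations (illustration only).
// It counts braces, so braces inside strings, comments, template literals or
// regular expressions confuse it.
function naiveSplitBundle(text) {
  const header = /module\s+"([^"]+)"\s*\{/g;
  const modules = [];
  let match;
  while ((match = header.exec(text)) !== null) {
    const start = header.lastIndex; // first character after the opening brace
    let depth = 1;
    let i = start;
    while (i < text.length && depth > 0) {
      if (text[i] === "{") depth++;
      else if (text[i] === "}") depth--;
      i++;
    }
    if (depth !== 0) throw new SyntaxError(`Unterminated module "${match[1]}"`);
    modules.push([match[1], text.slice(start, i - 1)]);
    header.lastIndex = i; // continue after the closing brace
  }
  return modules;
}

const good = naiveSplitBundle('module "./a.js" { export function f() { return 1; } }');
console.log(good[0][1]); // " export function f() { return 1; } "

// A "}" inside a string literal ends the module too early:
const broken = naiveSplitBundle('module "./b.js" { export const s = "}"; }');
console.log(broken[0][1]); // ' export const s = "' (truncated, wrong)
```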

There are two possible data flows to explain how JS module bundles interact with the memory cache/module map:

  1. (Preferred by the author of this document, which is encoded in the spec outline) the JS module bundle is parsed by the JS engine, and the enclosed modules and specifiers are returned to the "host" (renderer) as a list of key/value pairs. Then, that host is responsible for writing these into the appropriate cache level (either the module map or the in-memory cache).
  2. (Another possible layering) there is a sort of pre-parse done before reaching the JS engine, to identify the bundled modules and insert them into the cache, before handing the module off to the JS engine to parse (which would do a second pass). This avoids the "cost" of needing to bounce through the JS engine to identify the contained modules, but means multiple passes over the code. Web Bundles inherently avoid this cost by having an offset/length index at the beginning, so they are easy to process separately, unlike the grammar of this proposal.

This leads to the question: does the possibility of wanting layering #2 mean that Web Bundles are fundamentally better than JS module bundles, as they have a binary format which is friendly to layering #2, unlike JS module bundles? It is difficult for this author to see the motivation of layering #2, however.

Relationship to JS module blocks

JS module blocks are a separate concept for a module syntax which acts like a specifier: a module block can be imported with import() or new Worker(), but not with a static import statement, as it is not registered under a URL in the module map. Instead, the module block itself is a key in the module map. JS module blocks may be imported in other Realms. The similarity is only at a high level, where both are declared with the module keyword and have modules inline in JS files.

Specification outline

Module bundles would be specified in a combination of changes to ECMA-262 and HTML.

In ECMA-262: This specification would define a new ParseModuleBundle algorithm, which would return key-value pairs of specifiers and module bodies as strings.

In HTML: <script type=bundle> would call out to a new path which calls ParseModuleBundle. Each bundled module would be passed through a new algorithm, which would:

  1. Normalize the module specifier relative to the base URL of the module bundle it is contained in.
  2. If the module specifier is outside of the scope of the module bundle it is contained in, then trigger an error and fail.
  3. If the module specifier is in the module map already, then trigger an error and fail.
  4. Optionally, the UA may perform a fetch to the server at the module specifier's URL, to check if the contents are the same, and fail if they differ.
  5. Parse the bundled module as if it had a JavaScript MIME type and insert it into the memory cache or module map (depending on the semantics we decide on).
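The steps above can be sketched as follows, leaving out the optional verifying fetch (step 4). The input entries stands in for the output of the proposed ParseModuleBundle, a plain Map stands in for the module map or memory cache, and the source text is stored instead of being parsed. The function name and the all-or-nothing behaviour on error are assumptions of this sketch.

```javascript
// Sketch of steps 1-3 and 5. `entries` stands in for ParseModuleBundle output:
// an array of [specifier, sourceText] pairs. `moduleMap` is a plain Map.
function registerBundledModules(entries, bundleUrl, moduleMap) {
  const scope = new URL("./", bundleUrl); // directory containing the bundle
  const pending = new Map();
  for (const [specifier, source] of entries) {
    // Step 1: normalize the specifier relative to the bundle's base URL.
    const url = new URL(specifier, bundleUrl);
    // Step 2: reject specifiers outside the bundle's origin or directory.
    if (url.origin !== scope.origin || !url.pathname.startsWith(scope.pathname)) {
      throw new Error(`${url.href} is outside the bundle's scope`);
    }
    // Step 3: reject specifiers that are already registered.
    if (moduleMap.has(url.href) || pending.has(url.href)) {
      throw new Error(`${url.href} is already registered`);
    }
    pending.set(url.href, source);
  }
  // Step 5 (simplified): insert only after every entry has been validated.
  for (const [href, source] of pending) moduleMap.set(href, source);
  return moduleMap;
}

const demoMap = registerBundledModules(
  [["./count.js", "export const n = 1;"]],
  "https://example.com/app/app.jsb",
  new Map()
);
console.log([...demoMap.keys()]); // [ 'https://example.com/app/count.js' ]
```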
@nayeemrmn

@littledan Neither https://github.com/surma/proposal-js-module-blocks nor this proposal attempts to allow the following in a regular type=module script.

module "./foo.js" {
  export const x = 1;
}

module "./bar.js" {
  import { x } from "./foo.js";
  export const y = x + 1;
}

import { y } from "./bar.js";
assert(y == 2);

This is what I was hoping to see following a link mentioning named inline modules, but both proposals barely skirt by it.

Maybe type=bundle could instead be a special case of this where 1) top-level code is disallowed (maybe) and 2) the defined modules are exposed to the rest of the realm but by default are only visible in that script. That leaves a simple way of concatenating ES modules.

@littledan
Author

Yes, this disallows top-level code, to reduce complexity. I thought a lot about including top-level code, but it seems like it has a lot of disadvantages. You can replace the top-level code with a script tag to load the bundle as described in this file, plus a script tag to load the individual module in the bundle. I don't think this makes burden too high to use, since you already need a different kind of script tag anyway, but maybe I'm missing something.

(Note that module blocks does not relate to bundling.)

@nayeemrmn

nayeemrmn commented Nov 11, 2020

Okay it's clear that the problem in my head is more suited to https://github.com/surma/proposal-js-module-blocks. But my point is that if that proposal and this one both came to fruition, is there any room left for inlined, named and statically importable modules that can be used in regular ES modules? So that these:

// http://example.com/foo.js
export const fooUrl = import.meta.url;
// http://example.com/main.js
import { fooUrl } from "./foo.js";

console.log(fooUrl); // "http://example.com/foo.js"

can be transformed to something like this:

// http://example.com/main.js

module "./foo.js" {
  export const fooUrl = import.meta.url;
}

import { fooUrl } from "./foo.js";

console.log(fooUrl); // "http://example.com/foo.js"

If it was just one of the proposals I would be hopeful that this is still possible in the future, but both of them together leave a void in the middle that's very awkward to fill.

@littledan
Author

I think this is possible in the future, and more along the lines of this proposal than the other one. I wonder why you want this, though.

@nayeemrmn

nayeemrmn commented Nov 11, 2020

I wonder why you want this, though.

I'm thinking of deno bundle. More generally, a way to bundle some entry-point module with all of its dependencies without the host needing a special mode (type=bundle / deno run --bundle) to run the result.

I don't think this makes burden too high to use, since you already need a different kind of script tag anyway

This isn't host-agnostically true -- now that I've mentioned Deno. ES modules are supposed to be the modern go-to. I don't think adding yet another script type (or requiring it for this feature) is justified by this point.

@fabiosantoscode

Regarding top-level code, maybe this construct could be used?

module default "./main.js" {
  // source of main.js
}

The default keyword would inform that we're about to see the code for the main file. It could be possible to override from the script tag, but I think it's important to have an in-band way of specifying a module to run first.

@ljharb

ljharb commented Nov 24, 2020

What is a “main file”? Naming a file “main.js” isn’t a convention I’m familiar with.

@fabiosantoscode

By main file I meant entry point. The module that the browser is supposed to run first and may or may not import other modules.

However I'm more clear on the purpose of this proposal now so I don't believe that having an entry point is a good fit.

@ExE-Boss

The line

… by providing a `[Service-Worker-Allowed](https://w3c.github.io/ServiceWorker/#service-worker-allowed)` header …

Needs to be:

… by providing a <code>[Service-Worker-Allowed](https://w3c.github.io/ServiceWorker/#service-worker-allowed)</code> header …

or:

… by providing a [`Service-Worker-Allowed`](https://w3c.github.io/ServiceWorker/#service-worker-allowed) header …

This is because in Markdown, `s disable inner Markdown parsing, so it’s important to use syntax that doesn’t do that (e.g.: the HTML inline <code> tag, or nesting the `s inside the <a> tag).

@nayeemrmn

Super happy with the direction the proposal took. Thanks, @littledan!
