@bmeck
Last active February 6, 2018 17:20
Removal of path searching / defining a hook for migration.

Problem

There has been no progress in working towards a single cohesive story for path resolution between Servers and Web. Notable discussion points relevant to this are:

  1. Node has a path searching algorithm.
  2. Web has not been able to gather support for any of the following:
    1. Build tooling as part of UX expectations (lack of interest)
    2. Smarter static web servers (lack of interest). PoC example at https://github.com/bmeck/esm-http-server
    3. A resolve based hook. (interest shown with desire for ~6 months of userland experimentation)
      1. This will be assumed to exist given whatwg/html#2640, URL.createObjectURL, and Service Workers.
      2. Rudimentary PoC (without actual integration) at https://github.com/bmeck/browser-hooking without a ServiceWorker. SW example at https://github.com/bmeck/node-sw-compat-loader-test.
  3. Node now has hooks for import
    1. Node has existing requirements for per-package hooks.
    2. Per-package hooks can be used to give a migration process towards people using features that the web is deficient in.

This proposal would seek to remove searching for index files and file extensions.

This proposal would seek to remove searching for package.json#main when importing resolves to a directory.

This proposal seeks to define a loader hook that can be encouraged for per-package use and that adds back the behaviors this proposal removes.

This proposal does not seek to remove .mjs from being the canonical authoring format for ESM.

Per-package loader hooks

The underpinning assumption of this proposal is strong support for per-package loader hooks. This is a definition of the capabilities and a bikeshed for how to achieve them.

Scope of hooks

Hooks must be confined to a well-defined subsection of the URL space (fs) used by import.

This proposal will define the boundaries of subsections to be:

  • A directory containing a package.json defines a boundary; a hook's scope terminates when resolution crosses out of that directory.

Given the fs of:

/path-searching-hook
/foo
  /package.json
  /bar/example.mjs
/a
  /package.json
  /a.mjs
// /foo/bar/example.mjs
import '../' // does not cross boundary by resolving to `/foo`
import '../..' // does cross boundary by resolving outside of `/foo` to `/`
// /a/a.mjs
import '../foo' // does cross boundary by resolving out of `/a` to `/foo`

Consumer and Author negotiation

  • It must be possible as a consumer to affect the path resolved within another package's scope.
  • It must be possible as an author to affect the path resolved within the author's package scope.

In order to avoid recursive boundary crossing in one step, all paths will be resolved in two phases.

  1. External resolution that is resolved by consumers from a different package scope.
  2. Self resolution that is resolved by the package scope containing the resolved path.
// /a/a.mjs
import('/foo');

// 1. fires /a 's package scope loader hooks, seeing `/a/a.mjs` as source and `/foo` as specifier
// let's assume it resolves to /foo
// 2. fires /foo 's package scope loader hooks, seeing `/foo` as source and `./` as specifier
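
A minimal sketch of how a host might drive these two phases follows; packageScopeOf() and hooksFor() are hypothetical helpers used only for illustration, and the resolve() signature is simplified compared to anything that might actually ship.

// Hypothetical driver for the two-phase resolution described above.
// packageScopeOf() and hooksFor() are assumed helpers, not real Node APIs.
async function resolveImport(source, specifier) {
  // Phase 1: the importing package's scope resolves the specifier.
  const consumerScope = packageScopeOf(source);
  const external = await hooksFor(consumerScope).resolve(specifier, source);

  // Phase 2: the package scope containing the result resolves within itself,
  // seeing the externally resolved URL as the source and './' as the specifier.
  const targetScope = packageScopeOf(external);
  if (targetScope === consumerScope) return external;
  return hooksFor(targetScope).resolve('./', external);
}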

Declaration of hooks

Per package loader hooks can be declared in a package.json file as a specifier to find using the globally defined resolution algorithm. Global hooks may affect this resolution, but package hooks may not. This allows code coverage, instrumentation, etc. to access package hooks.

{
  "name": "foo",
  "hooks": "../path-searching-hooks"
}

This also allows the hooks to exist outside of package boundaries. This file, when loaded as a loader, will be in a separate Module Map space from userland and only has access to the globally defined resolution algorithm.

Types of hooks

  • Only a resolve hook. Use URL.createObjectURL or alternatives like Service Workers if you need to modify source.
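
As an illustration, a per-package resolve hook that re-adds the extension and index searching this proposal removes might look roughly like the sketch below. The constructor/resolve shape mirrors the composition example later in this document; the fileExists() helper and the candidate list are assumptions for illustration.

// Hypothetical per-package path-searching hook. fileExists() is defined
// inline for illustration; the hook shape mirrors the LogImports example
// under "Composition" below.
const { existsSync } = require('fs');
const { fileURLToPath } = require('url');

const fileExists = (url) => existsSync(fileURLToPath(url));

module.exports = class PathSearching {
  constructor(parent) {
    this.parent = parent;
  }
  async resolve(url) {
    const resolved = await this.parent.resolve(url);
    if (fileExists(resolved)) return resolved;
    // Re-add the searching behavior removed from core, within this package only.
    for (const candidate of [`${resolved}.mjs`, `${resolved}/index.mjs`]) {
      if (fileExists(candidate)) return candidate;
    }
    return resolved;
  }
};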

On the nature of static resolution

ESM is able to link statically and there should be a path to allow static / ahead of time usage of per package hooks.

By only having a single resolve hook, paths can be rewritten and observed to do in-source replacement.

This is problematic, however, since URL.createObjectURL lives in memory. Usage of such APIs on platforms without a writable fs, like Heroku, should have a path forward for these hooks.

I recommend a combination of V8's SnapshotCreator when possible, and a flag to allow rewriting URL.createObjectURL reservations to a location on disk.

Problem, multiple boundary crossing

/root
  /package.json
  /entry
    /package.json
  /dep
    /package.json

If entry were to import('../dep'), it would be handled in the typical entry-hooks-then-dep-hooks manner. This does not give root a chance to intercept the imports.

This is seen as a suitable limitation, since root is presumed to have ownership of entry's and dep's source code because they exist within its directory. Edit the entry and dep packages as needed in order to achieve hooking that goes through root's use cases.

Composition

Hooks should have a means by which to achieve composition. This is needed for cases of multiple transformations. A package might seek to call a super of sorts to get the result of a parent loader, and it may seek to do the exact opposite as a guard to ensure expected behavior.

Loaders therefore need to have a concept of a parent loader hooks to defer to, or to ignore.

Changing hook allocation to be done using new and providing the parent as a parameter is sufficient for this:

#! node --loader
module.exports = class LogImports {
  constructor(parent) {
    this.parent = parent;
  }
  async resolve(url) {
    debugger;
    const ret = await this.parent.resolve(url);
    console.log(url, 'became', ret);
    return ret;
  }
}

Example use cases for composition

  • Code Coverage
  • Instrumentation such as APM
  • Mocks/Spies in testing frameworks
  • Logging/Debugging
  • Compilation
  • Linting
  • Isolation (such as with code signing)

Isolation

Hooks that are composed are still isolated by per-package boundaries. Nested packages will not fire the parent loader hooks unless they cross into a package boundary with those hooks.

Passing arbitrary data between instances can be problematic for both isolation and threading. Therefore the only data passed between instances of loaders will be transferables or primitives.

The parent passed to the constructor of a loader will be a limited facade that only exposes whitelisted properties and calls the relevant method on the true parent instance. It will ensure errors are thrown if given an improper number of arguments and/or non-transferable data.
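
A rough sketch of such a facade, assuming resolve is the only whitelisted method and that, for brevity, only string primitives are accepted:

// Hypothetical facade wrapped around the true parent loader instance.
// Only whitelisted methods are exposed, and arguments are validated
// before the call is forwarded.
function makeParentFacade(trueParent) {
  return Object.freeze({
    async resolve(...args) {
      if (args.length !== 1) {
        throw new TypeError('resolve() expects exactly one argument');
      }
      const [url] = args;
      if (typeof url !== 'string') {
        // A fuller implementation would also allow transferables.
        throw new TypeError('resolve() only accepts primitives/transferables');
      }
      return trueParent.resolve(url);
    }
  });
}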

Per-package composition

This can be achieved by manually constructing the chain inside the per-package hook code.
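
For example (the loader class names and file paths here are purely illustrative):

// Hypothetical per-package hooks file composing two loaders manually;
// the parent provided by the host becomes the end of the chain.
const PathSearching = require('./path-searching');
const LogImports = require('./log-imports');

module.exports = class ComposedHooks {
  constructor(parent) {
    this.chain = new LogImports(new PathSearching(parent));
  }
  resolve(url) {
    return this.chain.resolve(url);
  }
};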

Global composition

This can be achieved by providing multiple --loader flags. This allows for better debugging when development loaders need to be added.

npm start
# => node hasErrors.mjs
# aborts
export NODE_OPTIONS='--loader DebugImports'
npm start
# will log imports if HasErrors defers to the parent loader

Ignoring parents

In certain scenarios a package may need to ignore the parent loader. In those situations the hooks will be unable to defer to the default global behavior of the process, which may provide debugging behavior such as logging/code coverage/linting/etc.

For now, escape hatches in this design space are punted to userland, but it is recommended that when using NODE_ENV=development or NODE_ENV=test all loaders defer to the parent loader.

Code signing invariant implications

Mutating the code loaded in a code signed bundle is problematic. Integrity checks of unexpectedly mutated imports should fail. This area needs more research.

Future research

Given the problems of ignoring parents and of code signing being unable to easily defer to parent loaders, more design needs to be done around development workflows. Inspector tooling is the recommended approach. This may mean adding special hooks to inject loader hooks during development via a flag such as --inspector-loader-hooks=LogImport that fires before per-package hooks but ensures the inspector is running.

@medikoo

medikoo commented Jan 31, 2018

It must be possible as an author to affect the path resolved within the author's package scope

What's the use case for that?

Wouldn't it be better to simply provide a way so that the author of the app (which I take to map to consumer) can provide its own resolver, e.g.:

async function resolveUrl(sourcePath, dependencySpecifier) {
  ...
  return url;
}

That's good enough to solve the use case of node.js resolution. Maybe then, as a next step, node.js resolution can be standardized and provided out of the box by the browser as a default if instructed by the app.
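
For instance, a node_modules-style lookup could be expressed in that shape roughly as follows; this is only a sketch, and the walk-up logic is far simpler than Node's actual algorithm.

// Hypothetical resolveUrl() doing a simplified node_modules-style lookup
// in the shape proposed above.
const path = require('path');
const { existsSync } = require('fs');

async function resolveUrl(sourcePath, dependencySpecifier) {
  // Relative specifiers resolve against the importing file.
  if (dependencySpecifier.startsWith('.')) {
    return 'file://' + path.resolve(path.dirname(sourcePath), dependencySpecifier);
  }
  // Bare specifiers walk up the directory tree looking in node_modules.
  let dir = path.dirname(sourcePath);
  while (true) {
    const candidate = path.join(dir, 'node_modules', dependencySpecifier);
    if (existsSync(candidate)) return 'file://' + candidate;
    const parent = path.dirname(dir);
    if (parent === dir) throw new Error(`Cannot resolve ${dependencySpecifier}`);
    dir = parent;
  }
}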

@bmeck

bmeck commented Jan 31, 2018

@medikoo , there are a variety of reasons, from being able to reroute your imports using environment settings like NODE_ENV to being able to mask your paths so that private modules cannot be imported from outside your package (react commonly has this be a problem for upgrades when people use internals). It also lets you be lazy about things like compilation, so that when someone requests a path inside of your package you can fire up the compiler (TypeScript, JSX, etc.) only when needed rather than ahead of time.

@medikoo

medikoo commented Jan 31, 2018

@bmeck maybe we should list all those use cases, and sort them by priority.

Some of them sound to me like a necessity and some just nice to have. Trying to address everything may clutter the work, make it too complex, and as a result we may end up with nothing.

Wouldn't it be better to provide some simple solutions for core needs (as a door to node-like resolution), and discuss further steps along the way?

@bmeck

bmeck commented Jan 31, 2018

@medikoo, they are designed in concert with a lot of things, not purely node itself, since the future has other use cases that we need to be sure any core needs don't interfere with. We can discuss what is strictly needed vs. nice to have, but that gets subjective quite quickly. I'll try to describe the use cases we need in less concrete detail below:

  1. Ability to implement custom path resolution

This requires the dependent module calling import() or similar to be able to rewrite the path to a dependency. This may affect paths within their own package and may also affect finding paths within other packages. A package may wish to ensure that it works reliably, so that import('pkg/foo') always gets routed using a specific algorithm such as the node_modules algorithm, even if the importer does not have that loader set up. It can use hooks that intercept when a specifier resolves inside itself to have ./foo search for ./foo.mjs, ./foo.js, etc. without assuming the consumer will provide that behavior for them.

  2. Ability to support multiple environments

Similar in nature to the needs of browser fields in package.json, a package should be able to ship multiple modes of support for deployment environment, testing, internationalization, etc. Unlike the "main" field, which does not scale to this, a single solution that supports deep linking should be supported so that things like import('lodash/chunk') work.

This requires that a package be able to intercept any incoming requests and route them to the appropriate distribution of a module. These redirects may use NODE_ENV, feature detection, or similar to route to the appropriate distribution. This does not mandate that the package reroute to inside of itself; for example, import('translations') might feature detect the language and reroute to import('english'). A rough sketch of such a routing hook follows after this list.

  3. Ability to support tooling as both lazy and ahead of time

It is important to support ahead-of-time tooling so that bundlers can accurately generate bundles. It is important to support lazy tooling for development workflows that are not using bundles. The design above is made so that a bundler can accurately confirm what should happen when importing occurs across both directions, by having it be a static and well-known lookup point for the loader rather than being determined by a mutable global at runtime or by the method of consumption.
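
Referring back to the second use case above, a routing hook inside a hypothetical translations package might look roughly like this; the environment check, file layout, and the assumption that resolve returns a URL string are all made up for illustration.

// Hypothetical per-package hook for a 'translations' package that routes
// incoming requests to a language-specific distribution; the LANG check
// and file names are illustrative only.
module.exports = class RouteTranslations {
  constructor(parent) {
    this.parent = parent;
  }
  async resolve(url) {
    const resolved = await this.parent.resolve(url);
    const lang = (process.env.LANG || 'en').slice(0, 2);
    // Reroute the package entry point to a per-language module.
    if (resolved.endsWith('/translations/index.mjs')) {
      return resolved.replace(/index\.mjs$/, `${lang}.mjs`);
    }
    return resolved;
  }
};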

@medikoo

medikoo commented Feb 2, 2018

A package may wish to ensure that it works reliably, so that import('pkg/foo') always gets routed using a specific algorithm such as the node_modules algorithm

Why can't it stay as a silent agreement/contract, as it currently is in the node ecosystem? I wouldn't treat this functionality as a must-have, as no module system in JS (and maybe in the world) has had such a thing implemented. I'm fairly certain the community can go without it.

Similar in nature to the needs of browser fields in package.json , a package should be able to ship multiple modes of support for deployment environment

That use case is indeed very valid. Still, I think it can be solved by standardization of some approach, and expecting that the consumer relies on that standard (same as with the node_modules resolution algorithm).

Opening the door so every package may have a different idea about its internal routing will make things way less transparent and maintainable (another contribution to JavaScript fatigue).
Developers like to inspect how their dependencies work, and with that, a developer seeing an import path will never be certain what file it really imports. I wouldn't like to live in such a world, would you? :)

Ability to support tooling as both lazy and ahead of time

I have problems understanding why that requires packages to be able to reroute their own paths. I don't think it's relevant to tooling. Actually, having such a possibility on the table will make tooling even more complex (a set of rules needs to be established for each package).

@bmeck

bmeck commented Feb 2, 2018

Why can't it stay as a silent agreement/contract, as it currently is in the node ecosystem? I wouldn't treat this functionality as a must-have, as no module system in JS (and maybe in the world) has had such a thing implemented. I'm fairly certain the community can go without it.

There is a divergence between the browser and node already. Some packages, such as those created by polymer, are actually hard-coding paths due to this divergence, and it is the reason that NPM has sought to introduce npm assets, diverging the resolution algorithms between node_modules/ and assets/. Read up on this in NPM's repo.

That use case is indeed very valid. Still, I think it can be solved by standardization of some approach, and expecting that the consumer relies on that standard (same as with the node_modules resolution algorithm).

If a specific standard arises that is not limited in the same way as browser main fields, we could certainly standardize.

Developers like to inspect how their dependencies work, and with that, a developer seeing an import path will never be certain what file it really imports. I wouldn't like to live in such a world, would you? :)

I'd be fine in such a world. Often things are compiled using webpack or babel and I have to undo that logic already. This would at least let me keep import('component/lib/foo') as the same specifier between source and dist. You could also run the resolution algorithm statically without spinning up a full application (something like node-resolve component/lib/foo might be useful for quick CLI checks?)

I have problems understanding why that requires packages to be able to reroute their own paths. I don't think it's relevant to tooling. Actually, having such a possibility on the table will make tooling even more complex (a set of rules needs to be established for each package).

This problem is not about rewriting specifiers; this problem is being able to generate the new source text for a module. Webpack etc. could generate text: or blob: URLs through rewriting to generate new ESM lazily, or it could generate new file: URLs if it is trying to cache the ESM to disk or doing it ahead of time. In all of these situations it is the same aspect as above, where you keep the import specifier the same across all these cases.

@medikoo

medikoo commented Feb 2, 2018

Some packages, such as those created by polymer, are actually hard-coding paths due to this divergence, and it is the reason that NPM has sought to introduce npm assets

To me it looked more like polymer expects (and asked) npm to pave the path, which they want to follow (whatever it is, assuming it works well for front-end). And it's npm that also came with the assumption that node-like resolution cannot work for front-end.
It's not clear to me why that was never discussed; maybe they didn't envision that custom path resolution in ESM could be possible in the future (?).

Also, I think the assets approach may feel natural for those who work with setups where JavaScript is purely about front-end and front-end tooling, and that's it.
However, this doesn't seem good when we have a full-stack JavaScript application, with back-end logic also written in JavaScript, and where a significant part of the codebase is shared between the front-end and back-end parts.

If a specific standard arises that is not limited in the same way as browser main fields, we could certainly standardize.

It could be about configuring some alternative endpoints for specific paths in a static manner, and at the standard level it can be pretty agnostic.

The way I was envisioning it:
e.g. let's say we have a foo.js module for which we want to define some alternative versions.
Let's assume that this module was imported and, after applying the regular node resolution logic, foo.js is about to be picked.

Then, in case of having the following entry in the directory's package.json:

"redirects": {
    "foo.js": { "fr": "foo.fr.js" }
}

The resolver will pick foo.fr.js instead of foo.js if route: 'fr' was passed with the options to the resolver. Additionally, multiple routes should be supported (e.g. one may want route: ['fr', 'browser']).
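
A rough sketch of how a resolver could apply such a redirects map once regular resolution has picked a file; readPackageJson() is an assumed helper and the option shape simply mirrors the example above.

// Hypothetical post-resolution step applying the "redirects" map;
// readPackageJson() is an assumed helper that loads the nearest package.json.
const path = require('path');

function applyRedirects(resolvedPath, { route = [] } = {}) {
  const dir = path.dirname(resolvedPath);
  const pkg = readPackageJson(dir) || {};
  const entry = (pkg.redirects || {})[path.basename(resolvedPath)];
  if (!entry) return resolvedPath;
  for (const key of route) {
    if (entry[key]) return path.join(dir, entry[key]);
  }
  return resolvedPath;
}

// e.g. applyRedirects('/pkg/foo.js', { route: ['fr', 'browser'] }) // -> '/pkg/foo.fr.js'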

At least that's the direction in which I would go.

This problem is not about rewriting specifiers, this problem is being able to generate the new source text for a module

Ok, so it's about transpilation. Wouldn't it be better to think of path resolution and transpilation as two separate concerns?

Also, I'm not sure if the approach to transpilation should be standardized in any way (as then "how things work" gets blurred; JavaScript fatigue started when every popular stack started to rely on some form of transpilation).

I think the goal should be that the language on its own is good enough that we do not have to transpile, and if necessary, there's WASM.

@bmeck

bmeck commented Feb 2, 2018

However, this doesn't seem good when we have a full-stack JavaScript application, with back-end logic also written in JavaScript, and where a significant part of the codebase is shared between the front-end and back-end parts

Are you saying assets doesn't seem good, or that per package hooks do not seem good?

The resolver will pick foo.fr.js instead of foo.js if route: 'fr' was passed with the options to the resolver. Additionally multiple routes should be supported (e.g. one may want: route: ['fr', 'browser'])

The JS spec would need to add a way to pass that data in. Also, this does not work for runtime-based redirection such as on process.env or navigator.lang.

Ok, so it's about transpilation. Wouldn't it be better to think of path resolution and transpilation as two separate concerns?

They are inherently linked since any source text generation output needs to be assigned a URL. Having these just be redirects seems simpler than trying to specify all the ways in which code generation may occur.

@medikoo

medikoo commented Feb 4, 2018

Are you saying assets doesn't seem good, or that per package hooks do not seem good?

That using two different buckets for front-end and back-end code does not seem good. Both are different programs, but it's one language, and it's best if path resolution of dependencies follows the same rules for both.

Also, this does not work for runtime-based redirection such as on process.env or navigator.lang.

Why? If those tokens need to be resolved out of the environment, then it can simply be done this way:

resolveUrl(sourcePath, dependencySpecifier, { route: [navigator.lang, 'browser'] });

They are inherently linked since any source text generation output needs to be assigned a URL.

Never in our ecosystem was path resolution linked with transpilation. There was never native support for transpilation hooks. Then why do you suddenly state that both are inherently linked? I don't understand (?)

@bmeck

bmeck commented Feb 4, 2018

Why? If those tokens need to be resolved out of the environment, then it can simply be done this way

This would require that import syntax in JS be able to provide that { route: [navigator.lang, 'browser']}.

Then why do you suddenly state that both are inherently linked? I don't understand (?)

It isn't sudden; all forms of compilation generate a source text that is assigned a URL if they want to be usable from ESM. The means by which these URLs have their body populated (disk, text URL, URL.createObjectURL, etc.) isn't important, but this method of being loadable by ESM is always used; any code generation or instrumentation system does this by putting things either in memory or on disk, generally.

@bmeck

bmeck commented Feb 4, 2018

I think it might be easier to not think of resolution as purely a path-searching thing. It can point to URLs that were generated during the resolution process. That feature cannot be removed unless all ways to reserve URLs that could be resolved are removed from the runtime and left only to the host environment. Even if it is left to the host environment, it would be very hard to prevent all ways to do things like dump a file on disk that points to new URLs using something like export * from ....

@medikoo

medikoo commented Feb 5, 2018

This would require that import syntax in JS be able to provide that { route: [navigator.lang, 'browser']}

In my understanding we were talking about a custom path resolver that plays a role in resolving the paths put into import (a feature that may allow e.g. node-like path resolution), and it doesn't in any way influence what values we put into import.

e.g. for import _ from './locale.js' a custom resolver may resolve ./locale.en.js (on the basis of the { route: [navigator.lang] } option passed to the path resolver function).

It isn't sudden, all forms of compilation generate a source text that is assigned a URL if they want to be usable from ESM

Ok, and here we're talking purely about URL resolution, not about what's in the source text (whether it implies a transpilation step or not, etc.). They're two independent things. How can the format of the source text addressed by a given URL have an impact on the value of that URL?

I think it might be easier to not think of resolution as purely a path searching thing.

For simplicity I believe that's what we should do. Isn't it just about mapping a string token to a complete URL? What about KISS and YAGNI?

That feature cannot be removed unless all ways to reserve URLs that could be resolved are removed from the runtime and left only to the host environment. Even if it is left to the host environment

Sorry, I have problems understanding that statement. Which feature exactly? Why can't it be removed? Can you provide some example, so I understand better?

@bmeck

bmeck commented Feb 5, 2018

e.g. for import _ from './locale.js' a custom resolver may resolve ./locale.en.js (on the basis of the { route: [navigator.lang] } option passed to the path resolver function).

Correct, which is not available at runtime since ESM resolves prior to any evaluation.

Ok, and here we're talking purely about URL resolution, not about what's in the source text (whether it implies a transpilation step or not, etc.). They're two independent things. How can the format of the source text addressed by a given URL have an impact on the value of that URL?

I don't understand the question.

For simplicity I believe that's what we should do. Isn't it just about mapping a string token to a complete URL? What about KISS and YAGNI?

It isn't purely path searching by its very nature.

Isn't it just about mapping a string token to a complete URL?

No, since it also has to do a variety of other things during resolution, like determining how to form the shape of the target Abstract Module Record, which has to be declared with something out of band like a MIME type or a file extension.

Sorry, I have problems understanding that statement. Which feature exactly? Why can't it be removed? Can you provide some example, so I understand better?

We can't remove URL.createObjectURL, which can generate blob: URLs, from the web standards. Other things that do work on the web, like data: URLs, are explicitly blocked in Node due to this idea of generating ESM records being really weird. In node there is also the goal of ensuring that a path lookup never changes over time due to people writing to the file system while path resolution is in progress. Node also has things like vm.Module landing, which can also generate new records at runtime.

We could try to remove some of these, but you can't remove all of them.
