@guest271314
Created March 30, 2024 16:57
Implementing file: protocol support for WHATWG fetch() in Node.js without fs or import

Node.js' fetch() implementation depends on Undici, and Undici's fetch() does not support fetching the file: protocol.

I've been using XMLHttpRequest() and fetch() to fetch file: URLs in the browser for quite some time, both with and without browser extensions, on Chromium-based browsers and Firefox.

On Chromium and Chrome launched with the --allow-file-access-from-files flag, and on Firefox, we can do

let xhr = new XMLHttpRequest();
xhr.onload = (e) => console.log(xhr.response);
xhr.open("GET", "file:///home/user/bin/nm_host.js");
xhr.send(null);

In a Chromium-based browser (Chrome, Brave, Opera, Edge) we can create an unpacked extension with this in the manifest.json

  "host_permissions": [
    "file:///*"
  ],

then we can fetch file: URLs from the extension, which means we can control fetching file: URLs from any arbitrary URL, because we have "externally_connectable" and other ways to communicate with the extension.
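For context, the "host_permissions" fragment above sits in an otherwise ordinary Manifest V3 manifest; a minimal sketch might look like this (the name and version are placeholders). Note that on Chromium the user must also enable "Allow access to file URLs" for the extension on the chrome://extensions details page.

```json
{
  "manifest_version": 3,
  "name": "file-fetch-demo",
  "version": "1.0",
  "host_permissions": [
    "file:///*"
  ]
}
```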

Both deno and bun support file: protocol for fetch().

I found this StackOverflow question, How to get a local file via fetch/axios?, interesting and specific regarding not using Node.js' fs module or import/import() to get the file:

For study reasons I need to use some of these network APIs like fetch or axios but to get a LOCAL file, so WITHOUT using fs module, or WITHOUT importing them.

So I dove into Undici's source.

In the source code at /lib/web/fetch/index.js we find

    case 'file:': {
      // For now, unfortunate as it is, file URLs are left as an exercise for the reader.
      // When in doubt, return a network error.
      return Promise.resolve(makeNetworkError('not implemented... yet...'))
    }

When we use the file: protocol without any modifications to Undici's fetch() we get this roadmap of errors:

TypeError: fetch failed
    at fetch (/node_modules/undici/index.js:109:13) {
  [cause]: TypeError: Invalid URL
      at new URL (node:internal/url:804:36)
      at parseURL (/node_modules/undici/lib/core/util.js:51:11)
      at Object.parseOrigin (/node_modules/undici/lib/core/util.js:117:9)
      at new Pool (/node_modules/undici/lib/dispatcher/pool.js:70:23)
      at Agent.defaultFactory (/node_modules/undici/lib/dispatcher/agent.js:22:7)
      at [dispatch] (/node_modules/undici/lib/dispatcher/agent.js:93:34)
      at Intercept (/node_modules/undici/lib/interceptor/redirect-interceptor.js:11:16)
      at [Intercepted Dispatch] (/node_modules/undici/lib/dispatcher/dispatcher-base.js:158:12)
      at Agent.dispatch (/node_modules/undici/lib/dispatcher/dispatcher-base.js:179:40)
      at /node_modules/undici/lib/web/fetch/index.js:2079:51 {
    code: 'ERR_INVALID_URL',
    input: 'null'
  }
}

Debugging and modifying each line from the respective files is tedious, yet educational. I modified parseURL(), parseOrigin() in undici/lib/core/util.js and [kInterceptedDispatch] in DispatcherBase.

That leaves from undici/lib/web/fetch/index.js

  [cause]: Error: connect ECONNREFUSED 127.0.0.1:80
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16) 
  {
    errno: -111,
    code: 'ECONNREFUSED',
    syscall: 'connect',
    address: '127.0.0.1',
    port: 80
  }

Looks like Undici's fetch() uses TCPConnectWrap, which I have not dived into yet. We have some breadcrumbs to follow there.

We'll set that branch aside for the moment and explore other approaches for the time being.

My own programming process usually involves concurrent branches of different approaches to solve an issue or achieve a requirement.

If we did not have the restriction against using fs or ECMAScript static import or dynamic import(), we could use either of those modules directly, short-circuiting Undici's fetch() with something like this in undici/lib/web/fetch/index.js at around #L129. When the URL is file:, the fs module usage will be unobservable to the end user:

  const p = createDeferredPromise()
  const url = new URL(input)
  if (url.protocol === 'file:') {
    return import('node:fs').then((fs) => {
      p.resolve(new Response(fs.readFileSync(url)))
      return p.promise
    })
  }
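For illustration, the same short-circuit can be sketched as a standalone wrapper outside Undici's internals. fileFetch here is a hypothetical name, not part of any API: for file: URLs it reads the file with node:fs and wraps it in a Response, and anything else is delegated to the built-in fetch().

```javascript
import { readFile } from "node:fs/promises";

// Hypothetical wrapper illustrating the short-circuit: file: URLs are
// served from disk, everything else falls through to the built-in fetch().
async function fileFetch(input) {
  const url = new URL(input);
  if (url.protocol === "file:") {
    // fs.readFile() accepts a file: URL object; new Response() accepts
    // the resulting Buffer directly as a body
    return new Response(await readFile(url));
  }
  return fetch(input);
}
```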

ECMAScript Modules' import() makes a network request. import/import() works for file: URLs, e.g.,

await import(import.meta.resolve("file:///home/user/bin/exports.js"));

I find it interesting we can successfully do the above, but Undici's fetch() for the same URL throws.

While I was modifying Undici's fetch() source code I was also thinking about other ways to achieve the requirement: not using Node.js' fs or ECMAScript Modules, and not using Undici's fetch() at all.

We know curl supports fetching file: URLs. curl is also portable. So we can fetch and build curl just for this purpose

git clone https://github.com/curl/curl.git
cd curl
autoreconf -fi
# Disable everything we are not using here
LDFLAGS="-static" ./configure --disable-alt-svc --disable-ares --disable-cookies --disable-basic-auth --disable-bearer-auth --disable-digest-auth --disable-kerberos-auth --disable-negotiate-auth --disable-aws --disable-dateparse --disable-dnsshuffle --disable-doh --disable-form-api --disable-get-easy-options --disable-hsts --disable-http-auth --disable-ipv6 --disable-libcurl-option --disable-manual --disable-ntlm --disable-ntlm-wb --disable-progress-meter --disable-proxy --disable-pthreads --disable-socketpair --disable-threaded-resolver --disable-tls-srp --disable-unix-sockets --disable-versioned-symbols --without-brotli --without-libpsl --without-nghttp2 --without-ngtcp2 --without-zstd --without-libidn2 --without-librtmp --without-ssl --without-zlib --enable-static --prefix=$HOME/bin
make -j $(nproc)
make install
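A quick sanity check that the build can read local files; this assumes a curl binary on PATH, so substitute the path to the freshly built one as needed:

```shell
# Any curl built with file: protocol support (the default) can read local files
printf 'hello from file' > /tmp/file_fetch_demo.txt
curl -NSs "file:///tmp/file_fetch_demo.txt"
```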

then use curl for our file: URLs

import { spawn } from "node:child_process";
import { Duplex } from "node:stream";

async function fetchFile(path) {
  // MIME type map adapted from
  // https://github.com/chcunningham/atomics-post-message/blob/main/server.js
  const mimeTypes = {
    "html": "text/html",
    "jpeg": "image/jpeg",
    "jpg": "image/jpeg",
    "png": "image/png",
    "js": "text/javascript",
    "wasm": "application/wasm",
    "css": "text/css",
  };
  const { stdout, stderr } = spawn("./bin/curl", ["-NSs", path]);
  // curl reports failures on stderr, e.g. a file not found:
  // "curl: (37) Couldn't open file /home/user/bin/nm_hosts.js"
  const err = await new Response(Duplex.toWeb(stderr).readable).text();
  if (err) {
    throw new Error(err.trim());
  }
  return new Response(Duplex.toWeb(stdout).readable, {
    headers: {
      "Content-Type": mimeTypes[path.split(".").pop()] || "text/plain",
    },
  });
}

export { fetchFile };

In our importing module

import { fetchFile } from "./fetchFile.js";

const file = "file:///home/user/bin/nm_host.js";

fetchFile(file)
.then((r) => {
  console.log(r.headers.get("content-type"));
  return r.text();
})
.then(console.log)
.catch((err) => console.error({ err }));

curl might be a little heavy just to fetch from the file: protocol in JavaScript files that use node and fetch(). The above is not a full implementation of WHATWG Fetch, either; it works within the restrictions of the question posted on StackOverflow.

@guest271314

Using the dd command and GNU Coreutils head, respectively, to get the file:

  const url = new URL(path);
  try {
    const { stdout, stderr } = spawn("dd", [`if=${url.pathname}`, `count=${1024 ** 2}`, "status=none"]);

  const url = new URL(path);
  try {
    const { stdout, stderr } = spawn("head", ["-c", `${1024 ** 2}`, url.pathname]);
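Dropping the head variant into a full helper gives a runnable sketch. fetchLocal is a hypothetical name; it assumes coreutils head on PATH, caps reads at 1 MiB, and, like the curl version, drains stderr before reading stdout, so it suits files small enough to fit in the pipe buffer:

```javascript
import { spawn } from "node:child_process";
import { Duplex } from "node:stream";

// Hypothetical helper: stream up to 1 MiB of a local file through
// coreutils `head` and wrap its stdout in a WHATWG Response.
async function fetchLocal(path) {
  const url = new URL(path);
  const { stdout, stderr } = spawn("head", ["-c", `${1024 ** 2}`, url.pathname]);
  // e.g. "head: cannot open '/no/such/file' for reading: No such file or directory"
  // Note: stderr is drained before stdout, so this sketch suits small files.
  const err = await new Response(Duplex.toWeb(stderr).readable).text();
  if (err) {
    throw new Error(err.trim());
  }
  return new Response(Duplex.toWeb(stdout).readable);
}
```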

@guest271314

Using QuickJS to read the file

  const url = new URL(path);
  try {
    const { stdout, stderr } = spawn("./qjs", ["--std", "readFile.js", url.pathname]);

readFile.js

// Run with `qjs --std` so the std object and scriptArgs are available as globals
function readFile([, path] = scriptArgs) {
  try {
    const size = 4096;
    const data = new Uint8Array(size);
    const err = { errno: 0 };
    const pipe = std.open(
      path,
      "r",
      err,
    );
    if (err.errno !== 0) {
      throw `${std.strerror(err.errno)}: ${path}`;
    }
    let n = 0;
    while ((n = pipe.read(data.buffer, 0, data.length))) {
      std.out.write(data.buffer, 0, n);
      std.out.flush();
      pipe.flush();
      std.gc();
    }
    pipe.close();
  } catch (e) {
    // Write errors to stderr so the Node.js side can distinguish
    // them from file contents
    std.err.puts(e);
    std.exit(1);
  }
}

readFile();
