@isaacs
Last active April 19, 2022 20:57
{"name":"hello-world-bespoke-archive-format","version":"1.0.0","main":"lib/index.js"}
console.log('hello, world!')
# hello-world-bespoke-archive-format
An example of a "hello world" program, but instead of being a tarball, it's
shown in the bespoke package format that npm *SHOULD* have used, instead of
tar.
One can be forgiven for not wanting to reinvent the wheel, but let this be a
lesson that, in fact, some wheels _ought_ to be reinvented, when the
alternative does not roll as easily.
{"package.json":[0,86],"lib/index.js":[86,29],"README.md":[115,379]}
00000000000000000000000000000069

isaacs commented Apr 19, 2022

This is what I should have done instead of using tar for npm packages.

  1. Concatenate all file contents, noting their lengths and offsets. package.json is always first, and JSON.stringify()-ed with no indentation and the "name" and "version" moved to the top of the object, and a \n appended. No other files are mutated in any way.
  2. Create a JSON object of each relative file path as a key, with [start, length] as the value, and append it with a trailing \n.
  3. Append the length of the JSON index as an ASCII decimal integer, zero-padded to 32 characters. Because no index is gonna get bigger than 10^32, or npm would have other problems anyway. A mere 10^20 puts it outside the range of 64 bit ints, so this is pretty future proof.
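
The three steps above can be sketched roughly as follows. This is only an illustration: `pack` and the shape of its `files` argument (a plain object of relative path to contents) are made up for the example and are not an npm API.

```javascript
// Sketch only: `pack` and its input shape are hypothetical.
function pack (files) {
  // files: { 'relative/path': Buffer | string }, must include package.json
  const { name, version, ...rest } = JSON.parse(String(files['package.json']))
  // Step 1: manifest first, name and version on top, no indentation, "\n" appended.
  const manifest = Buffer.from(JSON.stringify({ name, version, ...rest }) + '\n')

  const chunks = [manifest]
  const index = { 'package.json': [0, manifest.length] }
  let offset = manifest.length

  for (const [file, content] of Object.entries(files)) {
    if (file === 'package.json') continue
    const body = Buffer.from(content) // every other file is stored untouched
    index[file] = [offset, body.length]
    chunks.push(body)
    offset += body.length
  }

  // Step 2: the JSON index of [start, length] pairs, with a trailing "\n".
  const indexBuf = Buffer.from(JSON.stringify(index) + '\n')
  // Step 3: the index length as 32 zero-padded ASCII decimal digits.
  const trailer = Buffer.from(String(indexBuf.length).padStart(32, '0'))
  return Buffer.concat([...chunks, indexBuf, trailer])
}
```

Offsets and lengths are counted in bytes, not characters, which is why everything is converted to a `Buffer` before measuring.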

No index entries for directories, they're just created implicitly. (No empty directories.)

No inclusion of symlinks at all, so everything is either a file or an implicitly created directory.

To unpack, read the size of the index by looking at the last 32 bytes of the file, and casting the string as a number. If it doesn't match /^[0-9]{32}$/, file is bad.

Then seek back that many bytes + 32 from the end to find the start of the index, and read from there up to the 32-byte length field. Parse that as JSON. If it's not an object, then file is bad.
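
A minimal sketch of locating and parsing the index, assuming the whole archive is already in memory as a `Buffer`. The helper name `readIndex` and its return shape are hypothetical.

```javascript
// Sketch only: `readIndex` is a hypothetical helper name.
function readIndex (archive) {
  // The last 32 bytes must be exactly 32 ASCII digits.
  const trailer = archive.subarray(archive.length - 32).toString('latin1')
  if (!/^[0-9]{32}$/.test(trailer)) throw new Error('bad trailer')

  const indexLength = Number(trailer)
  const indexStart = archive.length - 32 - indexLength
  if (indexStart < 0) throw new Error('index larger than file')

  let index
  try {
    index = JSON.parse(archive.subarray(indexStart, archive.length - 32).toString('utf8'))
  } catch (er) {
    throw new Error('index is not valid JSON')
  }
  // JSON arrays and null are also typeof "object", so rule them out explicitly.
  if (index === null || typeof index !== 'object' || Array.isArray(index)) {
    throw new Error('index is not an object')
  }
  return { index, indexStart }
}
```

Returning `indexStart` alongside the parsed index is just a convenience so later checks can compare file offsets against it.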

If the file doesn't start with {"name":", it's bad. If it doesn't match {"name":"<valid name>","version":"<valid semver>"[,}], it's bad.
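
The prefix check could look something like this. The `NAME` and `SEMVER` patterns are simplified stand-ins chosen for the example (the real npm name and semver rules are stricter), and `hasValidPrefix` is a hypothetical name.

```javascript
// Simplified stand-ins (assumptions): real npm name and semver rules are stricter.
const NAME = '(?:@[a-z0-9~-][a-z0-9._~-]*/)?[a-z0-9~-][a-z0-9._~-]*'
const SEMVER = '[0-9]+\\.[0-9]+\\.[0-9]+(?:-[0-9A-Za-z.-]+)?(?:\\+[0-9A-Za-z.-]+)?'
const PREFIX = new RegExp(`^\\{"name":"${NAME}","version":"${SEMVER}"[,}]`)

// Sketch only: `hasValidPrefix` is a hypothetical helper name.
function hasValidPrefix (archive) {
  // Only the head of the file is needed; 512 bytes is an arbitrary cap.
  return PREFIX.test(archive.subarray(0, 512).toString('utf8'))
}
```

Because the manifest is always stringified the same way, a plain regex on the first bytes is enough here, with no JSON parsing needed.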

Check each entry to ensure that its start value is the start+length of some other entry (package.json, at 0, is the exception). If there are any gaps, or if the final start+length doesn't match the start of the index, or if the package.json doesn't have a start of 0, file is bad.
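
One way to sketch that layout check is to sort the entries by offset and walk them, which catches gaps and overlaps in a single pass. `checkLayout` is a hypothetical name, and `indexStart` is assumed to be the byte offset where the JSON index begins.

```javascript
// Sketch only: `checkLayout` is a hypothetical name. `index` is the parsed
// JSON index; `indexStart` is the byte offset where the index begins.
function checkLayout (index, indexStart) {
  if (!index['package.json'] || index['package.json'][0] !== 0) {
    throw new Error('package.json must start at 0')
  }
  // Sort by start, then by length, so empty files sort before their neighbor.
  const entries = Object.entries(index)
    .sort((a, b) => a[1][0] - b[1][0] || a[1][1] - b[1][1])
  let expected = 0
  for (const [file, [start, length]] of entries) {
    if (!Number.isInteger(start) || !Number.isInteger(length) || length < 0) {
      throw new Error(`bad entry for ${file}`)
    }
    // Each file must begin exactly where the previous one ended.
    if (start !== expected) throw new Error(`gap or overlap before ${file}`)
    expected = start + length
  }
  if (expected !== indexStart) throw new Error('files do not end at the index')
}
```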

Then just spit the files out to disk. No need for mode - if a file is listed in package.json bin, it gets 0o755, otherwise it's 0o644.
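
A rough sketch of the extraction step with the two-mode rule. `extract` is a hypothetical helper; it assumes the index has already been validated, and it deliberately skips path sanitizing, which a real unpacker would need.

```javascript
const fs = require('fs')
const path = require('path')

// Sketch only: `extract` is a hypothetical helper. It assumes `index` was
// already validated, and does no path sanitizing (a real unpacker would have
// to reject absolute paths and `..` segments).
function extract (archive, index, dest) {
  const [mStart, mLength] = index['package.json']
  const { bin } = JSON.parse(archive.subarray(mStart, mStart + mLength).toString('utf8'))
  // `bin` may be a single path string or a { command: path } object.
  const bins = new Set(
    (typeof bin === 'string' ? [bin] : Object.values(bin || {}))
      .map(p => path.posix.normalize(p))
  )
  for (const [file, [start, length]] of Object.entries(index)) {
    const target = path.join(dest, file)
    // Directories have no index entries; create them as needed.
    fs.mkdirSync(path.dirname(target), { recursive: true })
    const mode = bins.has(path.posix.normalize(file)) ? 0o755 : 0o644
    fs.writeFileSync(target, archive.subarray(start, start + length), { mode })
  }
}
```

The mode passed to `writeFileSync` is still subject to the process umask, so the bits that land on disk may be narrower than 0o755 or 0o644.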

The index arrays for each file can be extended to provide a signature for each file, or a synthetic file could be created at the end with the signatures, and another synthetic file for a signature of the set of signatures. It's very extensible in the ways we would have wanted, and not at all extensible in the ways that have caused so many headaches.
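
As one possible reading of the first option, each index array could grow a third element holding a per-file signature. Everything specific here is an assumption for illustration: the `signIndex` name, the choice of Ed25519, and base64 encoding are not part of the format as described.

```javascript
const crypto = require('crypto')

// Sketch only: `signIndex` is hypothetical, and Ed25519 plus base64 are
// choices made for this example, not something the format specifies.
function signIndex (archive, index, privateKey) {
  const signed = {}
  for (const [file, [start, length]] of Object.entries(index)) {
    // Ed25519 takes no separate digest algorithm, hence the null.
    const sig = crypto.sign(null, archive.subarray(start, start + length), privateKey)
    signed[file] = [start, length, sig.toString('base64')]
  }
  return signed
}
```

A reader would verify each file with `crypto.verify(null, bytes, publicKey, signature)` before writing it out. Since the file bytes themselves are untouched, unsigned readers can ignore the extra element.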

You can also do stuff like just read the first line of the file to get the package manifest, if that's all you care about. Or easily create a module system that can pull files out on demand, or not at all.
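
Reading just the manifest might look like this sketch, which stops at the first newline instead of loading the whole archive. `readManifest` is a hypothetical helper, and the 4096-byte chunk size is arbitrary.

```javascript
const fs = require('fs')

// Sketch only: `readManifest` is a hypothetical helper. It relies on the
// manifest being stringified without indentation, so the first "\n" in the
// file ends package.json.
function readManifest (file) {
  const fd = fs.openSync(file, 'r')
  try {
    const chunk = Buffer.alloc(4096)
    const parts = []
    let n
    // position = null means "continue from the current file position"
    while ((n = fs.readSync(fd, chunk, 0, chunk.length, null)) > 0) {
      const nl = chunk.subarray(0, n).indexOf(0x0a)
      if (nl !== -1) {
        parts.push(Buffer.from(chunk.subarray(0, nl)))
        return JSON.parse(Buffer.concat(parts).toString('utf8'))
      }
      parts.push(Buffer.from(chunk.subarray(0, n))) // copy, since chunk is reused
    }
    throw new Error('no newline found')
  } finally {
    fs.closeSync(fd)
  }
}
```

This works because `JSON.stringify()` escapes any newline inside a string value, so a raw `\n` byte can only be the one appended after the manifest.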

Tar was a bad choice.


isaacs commented Apr 19, 2022

AND AnOThER ThiNG!! If you nest these files within one another, then the parent can re-index the files of the children. So bundleDependencies could have something like:

{"name":"module","version":"1.2.3","bundleDependencies":["dep"],"dependencies":{"dep":"1"}}
require('dep').doSomething() // this is module's index.js
{"name":"dep","version":"1.5.4","main":"index.js"}
exports.doSomething = () => console.log('this is dep')
{"package.json":[0,59],"index.js":[59,100]}
00000000000000000000000000000044
{"package.json":[0,99],"index.js":[99,123],"node_modules/dep.npm":[222,192,{"node_modules/dep/package.json":[222,59],"node_modules/dep/index.js":[281,100]}]}
00000000000000000000000000000159

So any .npm file that shows up in the archive has its entries automatically added to the index. Want to unpack the bundled deps? Easy peasy! They're right there! Want to just do 1 level? Or just 2 levels? Also easy!
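
Unpacking to a chosen depth could be sketched like this, assuming (as in the example above) that a nested archive's entry carries a third element mapping the child's files to offsets within the parent file. `listEntries` is a hypothetical helper name.

```javascript
// Sketch only: `listEntries` is a hypothetical helper. It assumes a nested
// archive's index entry has a third element: the child's re-indexed files,
// with offsets relative to the parent file.
function listEntries (index, depth = Infinity) {
  const out = {}
  for (const [file, [start, length, nested]] of Object.entries(index)) {
    if (nested && depth > 0) {
      // Replace the .npm blob with its contents, one level further down.
      Object.assign(out, listEntries(nested, depth - 1))
    } else {
      // Plain file, or a nested archive left as an opaque blob.
      out[file] = [start, length]
    }
  }
  return out
}
```

With `depth` set to 0 the bundled dep stays a single `.npm` file, with 1 only direct bundled deps are expanded, and the default expands everything.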

Jesus, I might have to sit down and write this some day just to get it out of my system.
