npm 4, npm 5
こんばんは! Hello! Thank you very much for having me, and thank you very much to Yosuke Furukawa for inviting me to speak to you all! Tokyo is one of my favorite places in the world. I'm delighted to have this opportunity to connect with the Japanese Node community.
My name is Forrest Norvell, and I am the manager for the command-line interface team at npm, Inc. There are two more developers on the team – Katerina Marchán, who some of you met at Nodefest last year, and Rebecca Turner. When you run npm, and it is either slow, or doesn't install your optional dependencies correctly, or does something else surprising, it's our fault. We are very sorry. We do what we can. Unfortunately, as you probably already know, computers. Shikata ga nai.
The npm, Inc. CLI team
Tonight I'm going to tell you some things about how my team works, some decisions we've made, today's release of npm 4, and what we plan to do with npm 5. To do this I'll need to talk about the history of the npm shrinkwrap command and how package manager lock files have developed over time. I'll then explain what my team plans to do to make shrinkwrap more useful. Finally, I'll spend some time talking about how the npm and Node release schedules work together.
I'm going to assume that if you're here, you have experience with npm. Many people do. npm's COO, Laurie Voss, loves to talk about all of the big numbers surrounding npm: millions of users, over 350,000 packages on npm's own registry, billions of packages downloaded a month, tens of thousands of active publishers. A lot of people depend on npm for their projects, which is a responsibility that all of us take very seriously. We are the stewards of an important commons.
It is important for my team to move carefully because we are relatively small. People generally react in one of two ways when they discover that there are three developers on the CLI team:
- "npm is a company?"
- "Only three of you?"
Anyone who has seen the npm issue tracker probably leans towards the second response.
(At this point, I want to take a moment to thank Daijiro Wachi for all that he has contributed to npm, and especially for the help that he gives to the Japanese developer community with npm. All of npm values his contributions tremendously. Thank you, watildeさん!)
Because we are a small team, and our community is big, we must be practical when thinking about our ability to convince our users to move one way or another. Developers often don't upgrade from the version of npm bundled with the version of Node they're using. They often aren't very quick to upgrade to newer versions of Node, either. Sometimes it seems like users only upgrade npm independently of Node when they run into bugs or want new features.
This means that the CLI team cares a great deal about backwards compatibility. We continued to support npm on Node 0.8 long after most other popular packages had moved on (in part to support some large companies here in Japan), and are only now dropping support for Node 0.10 and 0.12. We've never made a change that would prevent a new version of npm from being able to use an old package.json file. At least, not that we've known about in advance. A lot of features have been added to npm over the years, but not at the cost of leaving behind existing packages.
npm and SemVer
This is significant because npm was also among the first tools to bet heavily on SemVer. I could – and probably will, someday – give an entire talk about SemVer, and how it works or doesn't work in practice. For now, I'll say that the important thing about SemVer is that it includes the idea of breaking changes in the version number itself.
This leads to an interesting question: what counts as a breaking change in a package manager? The simple answer is that when the same command produces two different results in two different versions of a command-line utility, that's a breaking change.
Let me make this more real with an example. npm was at major version 1 for several years. Many features were added during that time, but none of them changed how npm worked in a way that would surprise ordinary developers using it. Many people guess that the reason npm 2 got a new major version was the addition of scopes and support for private packages.
That's not the reason for the new major version, though! We made a small, but backwards-incompatible change to the way npm run-script handled its arguments, and so that meant it was time for us to increase the major version to 2. It was a relatively small step from version 1 to 2. Remember this.
npm 2 && npm 3
npm 3, on the other hand, was a very big step. It was the biggest change to the internals of npm since the very earliest days of the tool. The npm installer is one of the most complicated things I've ever worked on. It supports a whole lot of use cases, and its pieces are tough to pull apart. My colleague Rebecca did a great job making it easier to understand what the installer was doing and why. She also made it a lot simpler and fixed a lot of bugs.
In the process, she also got the installer to produce maximally flat installs. This was actually a side effect of cleaning up the code. Doing this made life much better for Windows developers, who frequently run into path length limitations. It also saves a bunch of disk space. Rebecca also made a number of small changes to how npm works with shrinkwrapped packages, but we'll talk more about that in a minute.
npm 3 was a complex project, and it had bugs. The community needed some time to catch up. At the same time, it wasn't released in time to be included in Node 4, the first version of Node to be designated a long-term support release. In the interest of supporting as much of the community as possible, the npm CLI team decided to continue to put out new releases of npm 2 and npm 3 side-by-side.
Our goal all along was to get everyone moved over to npm 3 as quickly as possible. However, some developers preferred the old style of nested installs (and also liked that npm 2 sometimes installs faster than npm 3), so we ended up supporting both. Two different major versions of the tool! It's Python 2 and Python 3 all over again! This isn't something our little team can support well long term, so we need a way past this.
While we don't get as many contributions from our user community as Node core, the community does contribute a lot. (Again, I'd like to especially recognize watildeさん's contributions here. He's one of the most prolific contributors to the project who is not employed by npm, Inc., and he has significantly improved the quality of npm's user experience over the last couple years.) We try to get community patches out as quickly as we can, but when those are breaking changes, we have to find a good opportunity to get them out.
We recently received a pull request from Anna Henningsen (addaleax on GitHub) that makes a subtle but important improvement to how the path is set up for package scripts. Bear with me for a moment as I explain, because this is a good example of the kinds of tradeoffs that our team has to make.
An important question to answer when running package scripts is: do we run those scripts using the first Node in your path, or do we use the Node that's being used to run npm? The first works better for people using language version managers like nvm, rbenv, or Python's virtualenv, but the second allows developers to override the default version of Node in your environment, which can be important for situations in which there are multiple versions of Node installed on a single host. We've changed this behavior a few times, each time unintentionally breaking things for some of our users. Anna made it configurable, and I decided to, one last time, flip the default so as to fix things for rbenv and virtualenv users.
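As a rough illustration of the two choices, here is a minimal JavaScript sketch of how the PATH for a package script could be assembled. It is not npm's actual implementation: the `scriptPath` function, the `prependNodeDir` option name, and the example directories are all assumptions made for this sketch.

```javascript
const { posix } = require('path');

// Build the PATH a package script would run with.
// `prependNodeDir` is a hypothetical option name used only in this sketch.
// POSIX-style paths are used so the example reads the same everywhere.
function scriptPath({ envPath, nodeExecPath, prependNodeDir }) {
  const entries = envPath.split(posix.delimiter).filter(Boolean);
  if (prependNodeDir) {
    // Put the directory of the Node that is running npm first, so it wins.
    entries.unshift(posix.dirname(nodeExecPath));
  }
  return entries.join(posix.delimiter);
}

// Hypothetical environment: a version manager's Node comes first in PATH,
// but npm itself was started by a different Node binary.
const env = {
  envPath: '/home/me/.nvm/versions/node/v6.9.1/bin:/usr/bin',
  nodeExecPath: '/usr/local/bin/node',
};

const inherited = scriptPath({ ...env, prependNodeDir: false });
const overridden = scriptPath({ ...env, prependNodeDir: true });

console.log(inherited);  // scripts find the version manager's Node first
console.log(overridden); // scripts find the Node that is running npm first
```

With the option off, a script that runs `node` gets whatever the version manager put first in the path. With it on, the script gets the same Node that is running npm, regardless of the surrounding environment.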
Since we'd been meaning to do a new major release for a while, we decided that now was a good time, so we looked at the issues we'd labeled "breaking" and figured out what made sense to include in a SemVer major release. While there are more than a few changes on the list, none of them are very risky, and the overall impact is much less than npm 3's was:
- Many users don't understand why packages with "prepublish" scripts defined in package.json will see those scripts run when they run npm install for the package. It turns out there's a pretty good reason for this – transpiler users need a way to prepare their transpiled code for use on initial install – but the way that was implemented was very, very confusing. We have a whole plan to fix this, which requires deprecations and tweaks spread out over many releases.
- watildeさん submitted a patch a while ago that causes npm outdated to return a non-zero exit code when there are packages to update, which is useful for scripts, and also follows CLI conventions.
- Anna's change, which I discussed above.
- Once upon a time, you could configure npm to run a package's tests upon install to verify that the installation was correct. This feature was called "npat". It was a good, or at least interesting, idea. However, many people use files or ignores to strip test files and fixtures out of their packages, even though the "test" script is still defined in package.json. This breaks the npm feature, which also is poorly documented and almost entirely unused. As a result, we're removing it.
- Starting with npm@2, we've been replacing the npm tag command with npm dist-tag for managing release tags. We're finally removing the long-deprecated npm tag in npm 4.
- Before npm 4, if you published a package with an npm-shrinkwrap.json that you'd selectively edited so that only a few nested dependencies had their versions pinned, npm would use package.json to fill in the missing dependencies. Hapi relied on this feature, but it complicated the shrinkwrap installation considerably.
- Finally, as a change that fixes something that had long been broken, Kat and our colleague Aria Stewart have put together a fix for npm's long-struggling command-line package search. The new search implementation streams both from the network and disk, and is both much faster to return results and uses orders of magnitude less memory.
The absence of big changes is by design. The versions of npm 3 we've been releasing recently are much more reliable (and faster!) than 3.0.0 was, and our hope is that this small set of (real) breaking changes and a new major version number will be enough of a fresh coat of paint to overcome the remaining resistance to upgrading to a new version of npm.
You'll note that I've mentioned npm shrinkwrap a few times. How many of you have used it?
We – both the CLI team, and the broader npm community – have put a lot of work over the last couple years into improving shrinkwrap. It is much better to use now than it was a year ago. However, it still has enough problems, and is far enough from being what it should be, that we've decided to devote an entire major release cycle to improving it.
Before I get into what we're going to do, though, let me talk a little bit about package managers and lock files, and a little about why shrinkwrap is the way that it is.
Why are lock files useful and often necessary? The promise of SemVer was that by making package version numbers mean something, users could confidently install updates to their dependencies as they were released, getting security fixes and new features without having to worry about API conflicts or other risky changes. However, anyone who's used a semver-based package manager like npm for a while knows that this has one very obvious cost, which is that builds are frequently not reproducible – if you pull down a package and install it at one time, and a coworker pulls down that same package and installs it at another time, the two of you may end up with very different sets of packages installed.
You should be able to update the set of packages according to the rules of SemVer whenever you want, but between those updates, everyone running install should get the same sets of packages. This is what lock files enable. They can simplify both package lifecycle management and application deployment. However, those are two pretty different use cases, so let's talk a little about how lock files are used in practice before we move on.
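To make the reproducibility problem concrete, here is a minimal JavaScript sketch. The `satisfiesCaret` helper is a deliberately simplified stand-in for real SemVer range matching (it only handles plain `^MAJOR.MINOR.PATCH` ranges with a major version of 1 or higher and no prerelease tags), and the published version lists are made up for illustration.

```javascript
// Split "1.2.3" into [1, 2, 3].
function parse(version) {
  return version.split('.').map(Number);
}

// Simplified caret matching: ^1.2.3 accepts any 1.x.y that is >= 1.2.3.
// Assumption: major >= 1 and no prerelease or build metadata.
function satisfiesCaret(version, range) {
  const [maj, min, pat] = parse(version);
  const [rMaj, rMin, rPat] = parse(range.replace(/^\^/, ''));
  if (maj !== rMaj) return false;
  if (min !== rMin) return min > rMin;
  return pat >= rPat;
}

// Pick the highest published version that satisfies the range.
function resolve(range, published) {
  const matching = published.filter((v) => satisfiesCaret(v, range));
  matching.sort((a, b) => {
    const pa = parse(a);
    const pb = parse(b);
    return pa[0] - pb[0] || pa[1] - pb[1] || pa[2] - pb[2];
  });
  return matching[matching.length - 1];
}

// Hypothetical registry contents on two different days.
const monday = ['1.2.3', '1.2.4'];
const friday = ['1.2.3', '1.2.4', '1.3.0', '2.0.0'];

const mine = resolve('^1.2.3', monday);
const yours = resolve('^1.2.3', friday);
console.log(mine, yours); // same range, different installed versions
```

The same manifest resolves to different versions depending on when the install happens. A lock file records the result of the resolution, so that everyone installs what was recorded rather than what happens to be newest.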
Anyone who's done extensive Rails development knows the problem that Bundler was meant to solve: you could try to dictate what versions of your gems were installed system wide (or via gemsets, if using a version manager like rvm or rbenv), or you could put everything in a vendor directory and check that into Git. Bundler replaced a whole set of tricky, unpleasant processes with two files: Gemfile and Gemfile.lock. I'm not sure what the original intentions were, but it very rapidly became common for teams to check Gemfile.lock into version control, so that everybody working on a Rails application was running the exact same stack.
It's worth pointing out two things about Ruby on Rails development that affected the design of Bundler:
- Any given Ruby application can only use one version of a package. It does not have Node's nested dependencies.
- Bundler's sweet spot is Rails apps, which is to say fully-fledged applications, rather than libraries.
A lot of the same people who were involved in the design and development of Bundler are also involved with Rust (Yehuda Katz's name comes up a lot in these conversations), so it's not a surprise that when the time came to develop a package manager for Rust, it would have a very similar lock file.
Cargo adds a couple new considerations:
- While Rust is a natively compiled, statically linked language, it supports having multiple versions of a dependency in the same build.
- Because Rust is natively compiled, and because it's typically distributed as source, it's necessary to distribute the lock file if you're to have any hope of reproducible builds.
These added wrinkles haven't really complicated Cargo that much, but it's safe to say that Cargo is still a work in progress that hasn't completely settled down.
Like npm, yarn uses package.json as the package manifest. However, yarn is meant to use a Cargo-style lock file (that shares nothing in common with npm shrinkwrap files – the two are completely incompatible).
As with Cargo and Bundler, a few aspects of yarn's design are worth keeping in mind:
- Locked dependencies can be nested or flat, as yarn will install Bower components, which are always flat, as well as Node-style nested dependencies.
- yarn's lock file is meant to be checked in to Git, but isn't published to the npm registry. This, to me, says that it's squarely targeted at application, rather than library or tool, development. We'll return to this in a bit.
We make different kinds of things
Looking at Bundler, Cargo, and yarn, a theme that emerges is that there are different kinds of things you can build, and they use lock files in different ways.
- There are libraries, which are intended to be dependencies of other things. I think in many ways, libraries are what SemVer was designed for – what you care about is the structural interface remaining stable, not what precise version you're using.
- There are command-line tools. Tools consume libraries, but are often themselves dependencies, especially in the current world where most build tools are development dependencies of applications. npm itself is a great example of one of these, being bundled with things as diverse as the Atom text editor and ember-cli.
- There are applications. Applications can be web apps or Electron-based GUIs, and can include both Node modules and a complicated front-end build process. Applications are primarily consumers of other dependencies (both libraries and tools), and being able to reproduce builds is essential to having the confidence to develop and deploy applications safely. Applications love lock files.
npm shrinkwrap and backwards compatibility
Set next to all of that, shrinkwrap, even in its current incarnation, looks very limited. In its original form, it was a very simple patch designed to solve some problems Joyent was having with deploying its own applications. It was intended to be used as a final preflight step before deploying a finished Node application, and had almost no integration with the rest of npm. Most of the changes we've made to shrinkwrap over the last couple years have been meant to make it feel more like a first-class part of the CLI, but it's not there yet.
It's very important to remember that shrinkwrap was never meant to manage your package over its entire lifetime. It was purely a deployment tool. Also, it's tied to a lot of npm's other historical baggage, like the fact that it was fairly late in npm's development that the registry stopped allowing you to publish over an existing version of a package with a different package tarball without changing its version. This is why shrinkwrap installs use the same cache logic as the rest of npm, where even a cached install requires a request to the registry to ensure that the package tarballs haven't been changed. (Some third-party registry software suites rely on this behavior to implement things like Maven-style package "snapshots".)
Also, because shrinkwrap was developed in pieces in response to very specific needs, it has some very odd aspects of its design. Perhaps the most notable is that a fully-resolved dependency in a shrinkwrap file is a fully-qualified HTTP or HTTPS URL, which has the side effect of tying the shrinkwrapped dependency to a specific registry, rather than some more abstract notion of version or the package's contents. A whole constellation of third-party tools have been developed to work around this. It would be nice if these tools weren't required.
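The coupling to a registry is easiest to see in the data. The sketch below uses a hand-written fragment in the shape of an npm-shrinkwrap.json file and shows the kind of rewriting that workaround tools end up doing. The package names, versions, and both registry origins are invented for illustration, and `rewriteRegistry` is a hypothetical helper, not part of npm.

```javascript
// A hand-written fragment in the shape of an npm-shrinkwrap.json file.
// Names, versions, and URLs are illustrative only.
const shrinkwrap = {
  name: 'my-app',
  version: '1.0.0',
  dependencies: {
    'example-lib': {
      version: '1.1.3',
      from: 'example-lib@^1.1.0',
      resolved: 'https://registry.example.com/example-lib/-/example-lib-1.1.3.tgz',
    },
  },
};

// Return a copy of the tree with every `resolved` URL that starts with
// `fromOrigin` pointed at `toOrigin` instead. The input is not modified.
function rewriteRegistry(node, fromOrigin, toOrigin) {
  const copy = Object.assign({}, node);
  if (typeof copy.resolved === 'string' && copy.resolved.startsWith(fromOrigin)) {
    copy.resolved = toOrigin + copy.resolved.slice(fromOrigin.length);
  }
  if (copy.dependencies) {
    const deps = {};
    for (const name of Object.keys(copy.dependencies)) {
      deps[name] = rewriteRegistry(copy.dependencies[name], fromOrigin, toOrigin);
    }
    copy.dependencies = deps;
  }
  return copy;
}

const mirrored = rewriteRegistry(
  shrinkwrap,
  'https://registry.example.com',
  'https://mirror.example.internal'
);
console.log(mirrored.dependencies['example-lib'].resolved);
```

Nothing about the package changed here, only where it is fetched from, yet the shrinkwrap file had to be rewritten to express that.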
Finally, it's worth mentioning that neither package.json nor shrinkwrap files have a versioned format. If we want to make changes to how those files work without breaking everybody's use cases, we must proceed very carefully.
npm 5 will make this better!
The shrinkwrap file as source of truth
Perhaps the most significant change we want to make to improve this situation is to make npm-shrinkwrap.json the source of truth for installing. If a shrinkwrap file exists, that's what npm looks at, rather than looking at all of the shrinkwrap file, package.json, and the current state of node_modules. This dramatically simplifies the process of installing shrinkwrapped applications.
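One way to picture what "source of truth" means: if the shrinkwrap tree is authoritative, an install plan can be derived from that tree alone. The following JavaScript sketch walks a shrinkwrap-shaped object and lists what would go where. It is a simplified illustration under that assumption, not a description of how npm 5 will be implemented; `installPlan` and the example tree are hypothetical.

```javascript
// Walk a shrinkwrap-shaped tree and list what to install where,
// without consulting package.json or an existing node_modules directory.
function installPlan(node, prefix = 'node_modules') {
  const steps = [];
  const deps = node.dependencies || {};
  // Sort names so the plan is the same on every run.
  for (const name of Object.keys(deps).sort()) {
    const dep = deps[name];
    const location = `${prefix}/${name}`;
    steps.push({ location, version: dep.version, resolved: dep.resolved });
    // Nested dependencies go in the dependency's own node_modules.
    steps.push(...installPlan(dep, `${location}/node_modules`));
  }
  return steps;
}

// Illustrative tree: two versions of "b" coexist at different depths.
const tree = {
  name: 'my-app',
  version: '1.0.0',
  dependencies: {
    b: { version: '2.0.0', resolved: 'https://registry.example.com/b/-/b-2.0.0.tgz' },
    a: {
      version: '1.0.0',
      resolved: 'https://registry.example.com/a/-/a-1.0.0.tgz',
      dependencies: {
        b: { version: '1.5.0', resolved: 'https://registry.example.com/b/-/b-1.5.0.tgz' },
      },
    },
  },
};

const plan = installPlan(tree);
for (const step of plan) {
  console.log(`${step.location} @ ${step.version}`);
}
```

Because the plan depends only on the file, two machines with the same shrinkwrap file would compute the same plan, whatever happens to be in their node_modules already.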
The removal of support for partial shrinkwrap files in npm 4 is an important prerequisite for this change, as are the many fixes landed in npm 3 intended to improve the handling of optionalDependencies in shrinkwrap files.
Another important thing is to make shrinkwrap generation idempotent. Explaining why this isn't already the case requires getting into some very specific details of the implementation that aren't even interesting if you're on the team. Fortunately, this is something that Rebecca already improved in npm 3, and we know what we need to do to finish it all the way.
If we want npm-shrinkwrap.json to be authoritative, then it will need more of the metadata currently included in the package.json files installed into node_modules, so that npm can reproduce the same dependency trees across installs. The biggest piece of this is the shasum of the package before it was installed, but we'll get to that more in a moment.
Once the file includes the relevant data, it should be possible to use additional tools to manipulate it. The shrinkwrap file becomes, in effect, an API to use for interacting with dependency trees. The flexibility of shrinkwrap files has, oddly, become an asset, as the large and thriving ecosystem of tools that manipulate shrinkwrap files will attest. We'd like to bless that ecosystem as much as possible, while removing the necessity for most developers to resort to external tools just to produce reproducible builds.
clean up edge cases
While shrinkwrap is in far better shape than it was a few years ago, there's still quite a bit of work to do around the edges. To behave the way that users of lock files in other package managers expect, npm update should update a shrinkwrap file, if one exists, by default. Handling of Git dependencies should be more consistent. Local dependencies really don't work well with the shrinkwrap model, and that should be clearer.
the content-addressable cache
Finally, the story around performance and network bandwidth could be improved considerably if we finally implemented the content-addressable cache I've been trying to add to npm since 2014. If we save package tarball shasums into the shrinkwrap file, and we know package tarballs with those shasums are in the cache (or Git repos with those commit hashes), there's no further need to talk to a registry or Git server.
Also, it allows the end user to define what constitutes a reproducible build – is it all of the dependencies meeting a pinned version? Is it the source URLs of all the packages? Is it the package shasum? We intend to make that information configurable, and allow users to, at least somewhat, define this for themselves.
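The idea behind a content-addressable cache can be sketched in a few lines of JavaScript. This is an in-memory illustration only, not npm's cache: the `ContentCache` class, the `fetchLocked` function, the `fetchTarball` callback, and the entry shape with `resolved` and `shasum` fields are all assumptions made for the example. SHA-1 is used because the text talks about shasums.

```javascript
const crypto = require('crypto');

// Hex SHA-1 digest of a tarball's bytes.
function shasum(buffer) {
  return crypto.createHash('sha1').update(buffer).digest('hex');
}

// A cache keyed by the hash of the content rather than by name or URL.
class ContentCache {
  constructor() {
    this.entries = new Map();
  }
  put(tarball) {
    const key = shasum(tarball);
    this.entries.set(key, tarball);
    return key;
  }
  get(key) {
    return this.entries.get(key);
  }
}

// `fetchTarball` is a hypothetical callback standing in for a registry request.
function fetchLocked(entry, cache, fetchTarball) {
  const cached = cache.get(entry.shasum);
  if (cached) {
    return { tarball: cached, fromNetwork: false };
  }
  const tarball = fetchTarball(entry.resolved);
  if (shasum(tarball) !== entry.shasum) {
    throw new Error(`shasum mismatch for ${entry.resolved}`);
  }
  cache.put(tarball);
  return { tarball, fromNetwork: true };
}

// Pretend tarball and lock entry, for illustration.
const bytes = Buffer.from('pretend tarball bytes');
const entry = {
  resolved: 'https://registry.example.com/a/-/a-1.0.0.tgz',
  shasum: shasum(bytes),
};

let networkCalls = 0;
const fakeFetch = () => {
  networkCalls += 1;
  return bytes;
};

const cache = new ContentCache();
const first = fetchLocked(entry, cache, fakeFetch);
const second = fetchLocked(entry, cache, fakeFetch);
console.log(first.fromNetwork, second.fromNetwork, networkCalls);
```

Once the hash recorded in the lock entry matches something already in the cache, the second lookup never reaches the network, and a tarball that was changed after the fact would fail the hash check instead of being installed silently.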
getting npm 5 out the door
Node LTS vs npm LTS
The Node project has a long term support plan. Some development teams need guarantees that the version of Node they deploy into production will continue to receive security and critical functional improvements for a long time. Node's LTS plan says that Node's dependencies can't undergo major changes over the course of an LTS channel's lifetime. For the purposes of this discussion, npm is a dependency like any other.
At any one time, there are two supported Node LTS releases in addition to the current release. For a while, this meant that there were versions of npm 1, npm 2, and npm 3 released at the same time with different versions of Node. This would also continue under Node 6 LTS, which will have npm 3, and Node 7, which may yet include npm 4.
Our experience with allowing new features into npm 2 after the release of npm 3 is that it slowed us down. The team isn't really big enough to handle the increased overhead of backporting, testing, and releasing multiple lines. Doing something similar for three different major versions of npm would be even harder. So, as of npm 4, the CLI team is only supporting npm 4, with the exception of critical security fixes.
We've begun discussing internally what our own idea of a long term support version of npm would look like (and what it might cost, because it would need to pay for itself), but it's very early days yet. If you have thoughts or feelings about this, we'd love to hear them.
will Node 8 include npm 5?
The Node development team follows a fixed release schedule that in turn follows the Google V8 release schedule. So far, this has meant two major versions a year, in the spring and the fall.
npm does things differently. We release a new version of the CLI regularly. We've been experimenting with changing the time between releases in an effort to minimize release overhead, and for the time being, we've settled on every two weeks. Doing this allows us to ship community contributions and our own changes when they're ready (but not before).
Our goal is to get npm 5 out sometime in the first half of next year. We hope to specify how we want shrinkwrap to work as a whole within the next month or so. As we finish proposals for the individual changes we want to make to shrinkwrap, we'll put them on our issue tracker for community review and approval. After that, we'll implement each piece, which in some cases may involve some large architectural changes to the code. When we've finished implementing all of those changes, we'll release all of those changes as npm 5.
I hope this makes it clearer why npm 5 may or may not be released in time to be included in Node 8, which will be the next Node LTS release. We're still trying to figure out a way to make these things line up better (as are people on the Node project). Our team is small and tries to be responsive, which gets in the way of following a strict release schedule. If you have ideas about how we could do this better, let us know.
In summary, I'd like to thank all of you for your patience. Also, thank you once more for having me here to speak. Please find me later if you have questions or comments! I'm here in Tokyo for another week, and am interested to hear your thoughts about my work.