Use cases and requirements for web packages
Below is a discussion of use cases that I'd like to solve with a package format for the web. There's also a list of requirements on the final solution.
Finally, there is a list of use cases that I don't think we should aim to solve.
Desired use cases
Use case A: Sharing a single package file rather than a pile of individual files
Sharing files through email attachments is a very common thing for people to do. However, today people generally choose to share files using formats other than the ones designed for the web.
Sharing slides is much easier using PowerPoint or Keynote than using HTML. The reason is that with PowerPoint you can send a single file, which the recipient can simply open in PowerPoint or OpenOffice.
However, if you share an HTML slide deck with someone, you will likely first have to zip up all the dependent stylesheets, images and JS, then attach the resulting zip, and then ask the recipient to unpack the zip. Only then can the slides be opened in a browser.
And to make matters worse, many browsers put security restrictions on HTML files opened from file:// in order to prevent a locally saved HTML file from reading arbitrary data from the user's hard drive. So the resulting unpacked files might not work properly.
This problem isn't HTML specific, but rather is a problem any time that you have a file format on the web which uses sub-resources. This means that it applies to most file formats used on the web today. For example CSS, SVG and HTML all commonly contain declarative links to other resources. The same is likely to be true for JS as well soon.
We could say that people should send, or save, URLs rather than files when dealing with HTML. However having to upload files to a server requires a fair amount of technical skill. And if you are working on a draft version of a project, you might not want to have to host all draft versions on different URLs and worry about password protecting them if they should not be accessible to the outside world.
Also, using URLs is less dependable. If I email a set of slides to a friend by uploading them to a third party server and then sending a URL, then what's to say that the server will still be around three years from now if my friend wants to pull up that old presentation?
There's also the problem that emails and attachments sync fine and are available offline in a good email client. However email clients can't dependably sync resources that are linked to using URLs in the email.
It would be great if I could take an HTML file and its dependent resources and package them up as a single file. I could then send this file to a friend, who could directly open the attachment in a browser.
The sharing could be done over email, bluetooth, USB cables, 3.5" floppies or any other of the myriad of ways that exist for sharing files.
Use case B: Upload a received package to a website without having to unpack it
This builds on use case A. If I receive a package which contains an HTML file and its dependencies, I should be able to upload that package unmodified to a web server and then link to it.
Asking people to unpackage it would just mean making it harder to publish content to the web.
A web-based email reader could even use <iframe sandbox> to allow viewing the attachment directly in the web-based reader.
Use case C: Package should be useable as a resource
This is similar to A above, but applied to resources within a page rather than the page itself.
Consider the developers of a JS library who, during development of v2 of their library, end up needing to import a JS module. It would be great if it was possible to create a package which bundles the library and the module into a single file. This file could then be sent to other developers, published to the website or otherwise treated just like the v1 standalone JS file.
Likewise if you author an SVG image and want it to include some bitmap images or some CSS, you are forced to go through the painful process of inlining these. It would be great if you could instead use a generic packaging mechanism and handle the SVG image the same way, whether it happens to be expressible in a single SVG file or not.
Any website today that allows uploading "avatar images" will work great with PNG and JPEG. It also works great to use a standalone SVG file. However if that SVG depends on an external stylesheet or embeds a bitmap image, then this will not work as well, since such websites don't support avatar images that are made up of multiple files. I.e. they just expect a single file, save it to a URL and then point an <img> to it.
But if you could package up the SVG file and its dependencies into a single package file, you could then upload that package as your avatar image. As long as browsers support rendering the package when an <img> is pointed to it, it will work fine no matter what dependencies the SVG file has.
For the case of SVG there are actually even stronger reasons. Some browsers refuse to allow SVG images that are opened as an image rather than a document (i.e. from <img> or CSS) to link to out-of-file resources. This restriction exists since otherwise an author could track anyone who uses the image. However no such restriction would need to exist for in-package resources.
Use case D: Saving static snapshot of web page
Currently saving an HTML page in a browser is quite a painful process. Browsers generate an HTML file as well as a directory containing all dependent resources. This forces the user to deal with all of these files separately.
Windows even has special features in its file handling UI which try to detect if the user moves the HTML file, and then try to also move the directory with the dependent resources to the same location. Of course, this only handles some situations. If the user attaches the HTML file to an email, the dependent files aren't automatically attached. I'm not sure if it tries to handle the situation when the user deletes the HTML file, but there doesn't seem to be any good solution here, since Windows either risks deleting files that the user did not expect to delete, or leaves behind garbage that the user doesn't care about.
But what browsers could do is strip all scripts and at least get a static snapshot of what the user was seeing. Enabling such a snapshot to be shared with other users would be appreciated by many users.
The following is a list of desired properties for the final technical solution to the use cases above.
Requirement E: Don't break URLs
URLs are one of the most important aspects of the architecture of the web. We should not break that.
It is important that browser features that rely on URLs continue to work. For example, right clicking an image, copying the URL of the image and then emailing the URL to a friend should continue to work.
This is of course the universality that URLs are meant to provide: a resource can be accessed through a given URL from anywhere, and not just within a given context.
This is an area where many attempts to create "packaged apps" have failed. The desire to solve use case A above has resulted in various solutions with URLs that only work locally on a single machine. Sending the URL of a page inside the packaged app to someone else has been impossible, even if the packaged app was originally downloaded from a webserver.
Requirement F: Enable intra-package URLs
It must be possible, within a package, to link to another resource within the same package.
This link needs to work even if the package is moved from one location to another. So for example the same package should work both when a package lives on a user's filesystem and when it is uploaded to a webserver. Likewise if the package is moved from one location on a webserver to another location on the webserver.
This is not to say that all URLs inside the package need to keep working if the package is moved. It just needs to be possible to write URLs to other resources in the package such that they keep working even if the package is moved.
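One way to picture the distinction is through ordinary relative references. The sketch below is only an illustration, assuming that a resource inside a package has some base URL that changes with the package's location; how that base URL would actually be defined is not settled here, and the three locations are made-up examples. It uses Python's standard `urllib.parse.urljoin` to resolve references the way a client would.

```python
from typing import List
from urllib.parse import urljoin

# Hypothetical locations of the same content; all three are illustrative only.
LOCAL_BASE = "file:///home/alice/deck/index.html"
SERVER_BASE = "https://example.com/talks/2015/deck/index.html"
MOVED_BASE = "https://example.com/archive/deck/index.html"


def resolve_all(base: str, references: List[str]) -> List[str]:
    """Resolve each reference against the URL of the document that contains it."""
    return [urljoin(base, ref) for ref in references]


# Relative references follow the document wherever it moves.
relative_refs = ["style.css", "img/logo.png"]
# An absolute reference keeps pointing at the old location after a move.
absolute_ref = "https://example.com/talks/2015/deck/style.css"

for base in (LOCAL_BASE, SERVER_BASE, MOVED_BASE):
    print(base)
    for resolved in resolve_all(base, relative_refs + [absolute_ref]):
        print("   ", resolved)
```

The relative references resolve next to the document in every location, while the absolute one does not move with it. That is the property asked for above: it must be possible to write references that survive a move, even though not every reference will.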
Requirement G: Support "dumb" servers
Packaging should work well even in scenarios where the web developer is not able to install arbitrary software on the webserver.
Developers are often only able to upload files, and not dynamically generate content or inject custom headers. They may not even be able to set arbitrary MIME types.
Requirement H: Graceful degradation on non-package-supporting clients
It needs to be possible to deploy content that uses the packaging solution in a way that works on old clients, in order to make adoption of the solution viable. So for example, by manually unpacking a package (or having a smart server that dynamically unpacks the package), a site can successfully respond to requests from both old and new clients.
This applies not just to clients that are browsers. Supporting non-browser clients like wget, which does not have package support, is also important.
Requirement I: Don't force servers to forever duplicate data
In the short term, to simultaneously support requirements G and H, servers will need to host both the packaged file and each individual resource in the package as a separate file.
This is fine, however it would be good if the long term goal is that servers should not need to do this. It would be good if servers could choose not to host the unpackaged files once marketshare for clients with packaging support gets large enough.
Requirement K: Support HTTP metadata for resources in the package
A packaging format should be able to store metadata including HTTP response information such as Content-Type.
Requirement L: Support packaging both compressed and uncompressed resources
It's important to enable storing, for example, a large JS library compressed in the package in order to minimize the size of the package. However it would be wasteful to require that a JPEG image, which is already compressed, be compressed again.
It should also be possible to store both a compressed JS library, and an uncompressed JPEG file, in the same package.
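As a rough illustration of requirements K and L together, the sketch below models a package as a list of entries, each carrying its own HTTP-style headers. Everything here is an assumption made for the example: the `PackagedResource` type, the `make_entry` helper, and the choice to record compression in a `Content-Encoding` header are not part of any proposed format.

```python
import gzip
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PackagedResource:
    """One entry in a hypothetical package; field names are illustrative."""
    path: str
    headers: Dict[str, str]   # HTTP-style metadata such as Content-Type
    stored_bytes: bytes       # bytes exactly as they sit inside the package

    def body(self) -> bytes:
        """Return the decoded body, undoing gzip if the entry was stored compressed."""
        if self.headers.get("Content-Encoding") == "gzip":
            return gzip.decompress(self.stored_bytes)
        return self.stored_bytes


def make_entry(path: str, content_type: str, data: bytes, compress: bool) -> PackagedResource:
    """Build an entry, compressing the payload only when asked to."""
    headers = {"Content-Type": content_type}
    stored = data
    if compress:
        stored = gzip.compress(data)
        headers["Content-Encoding"] = "gzip"  # assumption: compression is recorded per entry
    return PackagedResource(path, headers, stored)


js_source = b"function add(a, b) { return a + b; }\n" * 200
jpeg_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + bytes(64)  # placeholder bytes, not a real image

package: List[PackagedResource] = [
    make_entry("lib/big-library.js", "application/javascript", js_source, compress=True),
    make_entry("img/photo.jpg", "image/jpeg", jpeg_bytes, compress=False),
]

for entry in package:
    print(entry.path, entry.headers, len(entry.stored_bytes), len(entry.body()))
```

Recording the compression choice per entry is what lets a compressed script and an uncompressed image live side by side in the same package.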
Requirement M: Packages should be drop-in replacement for stand-alone files
This applies to use cases A and C above.
If the user has a package containing an HTML-based slide deck saved on their local filesystem, it should be possible to open that slide deck in a browser, the same way that they can open a plain HTML file. I.e. double-clicking on the package, dragging it to a browser window, or opening the package through the browser's "Open File..." menu item should all work, assuming the appropriate file associations have been registered.
Likewise if that package is uploaded to a webserver, it should be possible to navigate the user to that slide deck the same way you could if the deck had just been a standalone HTML file. Nothing more than a plain <a href="..."> should be needed.
This applies to resources as well. If an SVG file embeds a bitmap image, it should be possible to create a package that contains the SVG file and the bitmap image and then link an <img> directly to that package.
In particular in the latter case it is important that the URL used in the <img> is just the URL of the package itself. HTML authoring tools that let the user select an image to insert into a page will generate a simple URL to the file that the user selects. Likewise in the avatar image scenario described in use case C, the website will link to the package file itself, rather than generate a packaging-specific URL. If other markup is needed, or if special URL postfixes or prefixes are needed, then that would break a lot of existing URL handling in existing software.
It would also be very helpful if linking to an HTML file which has been packaged with its dependent resources could be done by linking to the package itself. This might be needed in order to keep URL handling consistent. Consider also that an HTML file can be a sub-resource of another HTML file through the use of
Requirement N: Enable packaging multiple HTML pages into a single package
It should be possible to package a web "app" which consists of multiple HTML pages, and their dependent resources, into a single package file. It should then be possible to navigate between the HTML files inside the package. I.e. a package shouldn't be limited to only one page and its resources, but should also allow multiple pages to be included in the package.
Requirement O: Backwards compatibility with the web as it exists today
"Don't break the web". Browsers won't implement something that breaks too much existing content. A few pages here and there is usually ok, but at some point a solution just won't work, no matter how theoretically beautiful it might be.
Requirement P: Acceptable performance when loading over http
In order for use case B to be met, it needs to be possible to load the packaged file from an HTTP server with acceptable performance.
Effectively this likely means that the package needs to be streamable, but other requirements might also be needed. We should let data drive the exact requirements here.
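To make "streamable" concrete, here is a minimal sketch of a reader that hands each resource to the caller as soon as its bytes have arrived, without waiting for the rest of the package. The record layout (a 2-byte name length and a 4-byte body length, followed by the name and the body) is purely hypothetical and chosen only to keep the example short.

```python
import struct
from typing import Iterable, Iterator, List, Tuple

HEADER = ">HI"  # hypothetical record header: name length (2 bytes), body length (4 bytes)
HEADER_SIZE = struct.calcsize(HEADER)


def encode(resources: List[Tuple[str, bytes]]) -> bytes:
    """Serialize resources back to back in the order given."""
    out = bytearray()
    for name, body in resources:
        name_bytes = name.encode("utf-8")
        out += struct.pack(HEADER, len(name_bytes), len(body))
        out += name_bytes + body
    return bytes(out)


def stream_resources(chunks: Iterable[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Yield each (name, body) as soon as all of its bytes have been received."""
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        while len(buffer) >= HEADER_SIZE:
            name_len, body_len = struct.unpack(HEADER, bytes(buffer[:HEADER_SIZE]))
            total = HEADER_SIZE + name_len + body_len
            if len(buffer) < total:
                break  # the current record is incomplete; wait for more data
            name = bytes(buffer[HEADER_SIZE:HEADER_SIZE + name_len]).decode("utf-8")
            body = bytes(buffer[HEADER_SIZE + name_len:total])
            del buffer[:total]
            yield name, body


data = encode([("index.html", b"<p>hi</p>"), ("style.css", b"p { margin: 0 }")])
network_chunks = [data[i:i + 5] for i in range(0, len(data), 5)]  # simulate small reads

for name, body in stream_resources(network_chunks):
    print(name, len(body))
```

A format where a resource can only be located after the whole file has been read would not allow this kind of incremental processing.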
Please also note the non-goal related to performance below.
Requirement Q: Acceptable performance when loading locally saved packages
In order for use case A to be met, it needs to be possible to load a package saved on the user's local filesystem with acceptable performance. The situation here is made somewhat more complex by the fact that users often have different performance expectations for locally saved data.
This might mean that we need to enable clients to jump directly to the position in a package where a given resource is located. I.e. we might need to support some form of index indicating where all packaged resources are located. We should let data drive the exact requirements here too.
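As a sketch of what such an index could enable, the example below writes a toy package with an index up front and then reads a single resource by seeking straight to it. The layout (a 4-byte index length, a JSON index, then the bodies) and the function names are assumptions for illustration only, not a proposed format.

```python
import io
import json
import struct
from typing import BinaryIO, Dict


def write_package(resources: Dict[str, bytes]) -> bytes:
    """Hypothetical layout: 4-byte index length, JSON index, then bodies back to back."""
    index = {}
    bodies = bytearray()
    for name, body in resources.items():
        # Offsets are relative to the start of the body section.
        index[name] = {"offset": len(bodies), "length": len(body)}
        bodies += body
    index_bytes = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(index_bytes)) + index_bytes + bytes(bodies)


def read_resource(f: BinaryIO, name: str) -> bytes:
    """Read only the index and the requested body, skipping everything else."""
    f.seek(0)
    (index_len,) = struct.unpack(">I", f.read(4))
    index = json.loads(f.read(index_len).decode("utf-8"))
    entry = index[name]
    f.seek(4 + index_len + entry["offset"])
    return f.read(entry["length"])


package_bytes = write_package({
    "index.html": b"<link rel=stylesheet href=style.css><p>slide 1</p>",
    "style.css": b"p { font-size: 48px }",
    "img/chart.png": bytes(1000),  # placeholder payload
})

package_file = io.BytesIO(package_bytes)  # stands in for a file on local disk
print(read_resource(package_file, "style.css"))
```

Whether an index belongs at the front, at the end, or somewhere else interacts with the streaming concern in requirement P, which is one more reason to let data drive the choice.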
Please also note the non-goal related to performance below.
There are some use cases that are often raised when discussing packages but which I am not interested in focusing on for the initial version of web packages.
To be clear, I'm not suggesting actively preventing these use cases from being addressed. I'm simply proposing not focusing on these use cases for the initial release. So if a feature is needed only to meet one of these use cases, I propose to leave out that feature for now.
If we happen to address these use cases anyway, then all the better.
Performance is often raised as a use case for packages. I.e. to increase the performance of an existing website by putting resources into a package. Unfortunately performance is a bit more complex and controversial.
Part of the reason for this is simply that packages can be as much of a performance hindrance as a performance improvement. Part of the reason is that HTTP/2 was recently finished and is currently being rolled out.
Packages make it impossible for the client to prioritize in what order resources should be downloaded. So a browser can't choose to download a script which is currently blocking rendering before it downloads an image which has been marked as low priority. The resources will be downloaded in the order that they were packaged no matter what.
Also packages often cause more resources than needed to be downloaded. Any resource in the package will get downloaded, whether the current page needs it or not. This is especially bad if resources that the page does need are located later in the package. In that case the browser will sit waiting for unneeded resources to be downloaded until it receives the desired resource.
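The cost of a fixed download order can be shown with simple arithmetic. The sketch below assumes a package that is downloaded strictly front to back, with made-up resource names and sizes, and counts how many bytes must arrive before a given resource is complete.

```python
from typing import List, Tuple


def bytes_before_available(package_order: List[Tuple[str, int]], needed: str) -> int:
    """Bytes that must arrive before `needed` is fully received in a sequential download."""
    received = 0
    for name, size in package_order:
        received += size
        if name == needed:
            return received
    raise KeyError(needed)


# Hypothetical package contents: (name, size in bytes), in packaged order.
images_first = [("poster.jpg", 400_000), ("gallery-1.jpg", 250_000), ("app.js", 30_000)]
script_first = [("app.js", 30_000), ("poster.jpg", 400_000), ("gallery-1.jpg", 250_000)]

print(bytes_before_available(images_first, "app.js"))
print(bytes_before_available(script_first, "app.js"))
```

With the images packaged first, the render-blocking script is only complete after 680,000 bytes have arrived; with the script first, it is complete after 30,000. The client cannot change this order after the fact, which is the prioritization problem described above.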
Additionally packages interact poorly with caching. When a resource in a package is updated, effectively all resources in the package get invalidated.
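A small model makes the caching effect visible. The sketch below uses content hashes as stand-ins for cache validators; that is an assumption made for illustration, since real HTTP caches key on URLs and use validators such as ETags. It compares what gets invalidated when one file changes, first with separately served files and then with a single package.

```python
import hashlib
from typing import Dict, Set


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def per_resource_keys(resources: Dict[str, bytes]) -> Dict[str, str]:
    """One validator per file, as when each resource is served separately."""
    return {name: digest(body) for name, body in resources.items()}


def package_key(resources: Dict[str, bytes]) -> str:
    """A single validator covering the whole package."""
    h = hashlib.sha256()
    for name in sorted(resources):
        h.update(name.encode("utf-8"))
        h.update(resources[name])
    return h.hexdigest()


def invalidated(old: Dict[str, str], new: Dict[str, str]) -> Set[str]:
    """Names whose validator changed between two versions."""
    return {name for name in new if old.get(name) != new[name]}


v1 = {"index.html": b"<p>hello</p>", "style.css": b"p { color: red }", "app.js": b"console.log(1)"}
v2 = {**v1, "app.js": b"console.log(2)"}  # only the script changes

print(invalidated(per_resource_keys(v1), per_resource_keys(v2)))
print(package_key(v1) == package_key(v2))
```

Served separately, only `app.js` needs to be fetched again. Served as one package, the single validator changes, so the client has to re-download the unchanged HTML and CSS as well.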
But there are certainly situations when packages can improve performance. However HTTP/2 aims to solve those same situations. So it seems better to wait until we have more experience with HTTP/2 before we focus too much on solving those problems with packages as well.
This is not to say that performance of using packaged content does not matter. Like any other feature added to the web, if the performance of the feature is too bad, authors will not use it. See requirements P and Q above.
However the goal of this release of packages should not be to enable authors to put resources into packages for the sole purpose of increasing performance.
Signing a group of resources that make up an application or a document is often done by packaging the resources up and signing the resulting package.
But signing is not an end goal in itself. Signing is done in order to provide some other property. For example to check that a given party is actually the author of a resource. Or to verify that a set of resources has been reviewed by a party and thus can be given additional capabilities or access to certain data.
Before talking about how to sign, it seems better to start talking about what end goals we want to achieve using the signing, who is expected to do the signing, and what properties we want to assign a signed resource. Before doing that it seems hard to verify that the way we'd sign the resource actually enables us to do what we're hoping to do.
Currently there is very little agreement on what such goals are though. That seems like a better place to start discussions before getting into technical details of signing formats.
Further, it's quite possible that we can achieve signing without using packages, depending on what the goals of signing are.