Adaptive Image Element

Author:

Mat Marquis

Status of this Document

This is an unofficial draft spec, not formally endorsed by the WHATWG. It is suitable only for reviewing the details of the proposed element.

Table of Contents

  1. Introduction
  2. Implementation Examples
    1. Sample Markup Pattern
    2. Functional Polyfill
  3. Example Use Cases
    1. Flexible Layouts
    2. High-Resolution Displays
  4. Requirements
  5. Prior Discussion

1. Introduction

Our goal is a markup-based means of delivering alternate image sources based on device capabilities, to prevent wasted bandwidth and optimize display for both screen and print.

The idea is to use the video tag’s markup pattern as the inspiration, as it’s specced to allow the use of media queries within attributes on its source elements and reliably displays the markup inside the tag in any browser that doesn’t recognize it. Through use of media attributes we would not only be able to reduce wasteful image requests for the sake of users with smaller displays, but we would also be able to tailor our images’ resolutions for users with high-res displays or for print.

Much of the surrounding discussion has taken place publicly, in the W3C’s Responsive Images Community Group.

2. Implementation Examples

Any combination of existing media queries can be used to determine the appropriate picture source, through a media attribute on source elements. This is identical to the specced behavior of media attributes on the video’s source elements, as outlined here. Any implementation of the picture tag should allow for the inclusion of fallback markup that is completely ignored by any UA that supports picture, and is only displayed in browsers that do not recognize the new tag. Note that older browsers can be polyfilled (see section 2.2) with behavior similar to a native implementation.

2.1. Sample Markup Pattern

<picture alt="The alt attribute’s content should accurately describe the image represented by all sources, though cropping and zooming of sources may differ.">
    <!-- Matches by default: -->
    <source src="mobile.jpg" />

    <!-- Overrides the previous source for windows at least 600px wide -->
    <source src="medium.jpg" media="(min-width: 600px)" />

    <!-- Overrides the previous source for windows at least 900px wide -->
    <source src="fullsize.jpg" media="(min-width: 900px)" />

    <!-- Fallback content, only displayed in the event the <picture> tag is unsupported by the browser: -->
    <img src="mobile.jpg" />
</picture>

2.2. Functional Polyfill

Scott Jehl has put together a JavaScript polyfill that could be used to bring similar behavior to older browsers should <picture> see widespread adoption. As the polyfill is fully dependent on JavaScript, it differs from the behavior of a native implementation in that fallback content is also displayed in the event that JavaScript is unavailable. This ensures a predictable fallback regardless of the presence of JavaScript in older browsers, though a native implementation would have no such dependency on scripting.

3. Example Use Cases

3.1. Flexible Layouts

Serving full-bleed images within a flexible layout, or a layout dictated by media queries, requires a source image at the largest necessary inherent size, scaled down through CSS. On smaller displays such as phones and tablets, where bandwidth can be at a premium, this means an exceptionally wasteful request.

While there are currently “responsive images” solutions that deliver smaller images by default and conditionally load a larger image above a certain window size, all of these involve a redundant request and rely fully on JavaScript. In many modern browsers, an image’s src is prefetched before any logic that swaps the image can run, so large displays end up requesting both images. This is further detailed in this post.

<picture alt="Image of a polar bear blinking during a snowstorm.">
    <!-- Matches by default: -->
    <source src="mobile.jpg" /> 
    <source src="medium.jpg" media="(min-width: 600px)" />
    <source src="fullsize.jpg" media="(min-width: 900px)" />
    <img src="mobile.jpg" />
</picture>

Assuming a 960px-wide window at the time the page is requested and the sample markup pattern from section 2.1 (repeated above), the UA should make a single request for “fullsize.jpg”. Any window/screen narrower than 600px is served “mobile.jpg”, which, as a completely alternate source, could be cropped as well as resized in order to preserve the focus of the image at smaller sizes.
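The selection rule described here can be sketched in a few lines. This is only an illustration, not part of the proposal: the `PictureSource` shape, the `minWidth` field (standing in for a parsed `min-width` media attribute) and the `selectSource` name are all assumptions, and the "later matching source overrides the previous one" behavior is taken from the comments in the sample markup.

```typescript
// Hypothetical shape for a <source> child. `minWidth` stands in for a parsed
// min-width media attribute; undefined means the source matches by default.
interface PictureSource {
  src: string;
  minWidth?: number;
}

// Later matching sources override earlier ones, as in the sample markup.
function selectSource(sources: PictureSource[], viewportWidth: number): string | undefined {
  let chosen: string | undefined = undefined;
  for (const source of sources) {
    if (source.minWidth === undefined || viewportWidth >= source.minWidth) {
      chosen = source.src;
    }
  }
  return chosen;
}

const polarBear: PictureSource[] = [
  { src: "mobile.jpg" },
  { src: "medium.jpg", minWidth: 600 },
  { src: "fullsize.jpg", minWidth: 900 },
];

console.log(selectSource(polarBear, 960)); // "fullsize.jpg"
console.log(selectSource(polarBear, 480)); // "mobile.jpg"
```

Only the single chosen source would then be requested; the other sources are never fetched.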

3.2. High-Resolution Displays

High-resolution screens such as Apple’s Retina display will require high-resolution images, leaving us with a situation similar to the above: either serving larger, high-resolution images to displays that can’t take advantage of them, or forcing high-density displays to first download a low-resolution image and then replace it with a high-resolution image. The latter, the approach currently used on Apple’s website, is far from ideal, and the former is so fraught with concerns that it has recently been addressed in such mainstream publications as the New York Times.

<picture alt="Hero image for new high-resolution device, containing a cringe-worthy portmanteau.">
    <!-- Matches by default: -->
    <source src="standard-res.jpg" />   
    <!-- [-webkit-] marks the optional vendor prefix: -->
    <source src="high-res.jpg" media="([-webkit-]min-device-pixel-ratio: 2)" />
    <img src="standard-res.jpg" />
</picture>

In this instance, the standard resolution image is served by default and as a fallback in cases where <picture> is unsupported. The high resolution image is served instead only in cases where the pixel-ratio media query matches.

4. Requirements

A conforming user agent must meet the following requirements:

  • The appropriate asset MUST be fetched by way of a single request. A change in window size causing the media attribute to match an alternate source SHOULD trigger a request for said source (to be retrieved from the browser cache, if possible).
  • As with the <video> and <audio> tags, this solution MUST NOT require any client-side scripting, server-side technologies, or headers to reliably deliver content tailored for the end user’s context.
  • Similar to the <video> tag, fallback markup MUST be rendered in any browser that does not recognize the <picture> element. The example in 2.1 uses the “mobile”-sized image as the fallback content, which is the recommended approach: barring the use of a polyfill, the smaller/low-res image should be provided as a fallback to prevent incurring a costly download in contexts that may see no benefit.
  • The specification MUST provide at least the same level of accessibility as <img>, with an alt attribute readily accessible to assistive technology.
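The first requirement (one request per asset, with a resize able to trigger a request that is served from cache where possible) can be modeled roughly as below. The `PictureLoader` class and its fields are hypothetical bookkeeping invented for this sketch; the draft does not describe how a UA would track this.

```typescript
// Hypothetical bookkeeping for one <picture>; none of these names come from the draft.
class PictureLoader {
  private cached: string[] = [];
  private current: string | undefined = undefined;
  networkRequests: string[] = [];

  // Call with whichever source currently matches (on load and after each resize).
  update(selected: string): void {
    if (selected === this.current) {
      return; // the same source still matches: nothing to fetch
    }
    if (this.cached.indexOf(selected) === -1) {
      this.networkRequests.push(selected); // a single network request per new asset
      this.cached.push(selected);
    }
    this.current = selected; // a cached source is swapped in without a new request
  }
}

const loader = new PictureLoader();
loader.update("mobile.jpg"); // initial load in a narrow window
loader.update("mobile.jpg"); // small resize, same source still matches
loader.update("medium.jpg"); // window grows past 600px
loader.update("mobile.jpg"); // window shrinks again: served from cache
console.log(loader.networkRequests); // ["mobile.jpg", "medium.jpg"]
```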

5. Prior Discussion

How we arrived at <picture> most recently: https://etherpad.mozilla.org/responsive-assets

Common questions and concerns: http://www.w3.org/community/respimg/common-questions-and-concerns/

Prior discussion on W3 mailing lists:

  • http://www.w3.org/Search/Mail/Public/search?type-index=public-html&index-type=t&keywords=picture+element
  • http://lists.w3.org/Archives/Public/public-html/2007Jun/1057.html
  • http://lists.w3.org/Archives/Public/public-html/2011May/0386.html
  • http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2012-February/ (as “Responsive Images”)

Now that we have the chance, we should consider putting the accessibility fallback in the child content instead, like <object> does.

You really should avoid analogies with <audio> and <video>, because the processing model you want is not the same. <video> picks the first <source> that "works" and never changes again after that. For <picture>, you want to switch between sources adaptively. Not using <source> at all would be preferable, how about this:

<picture>
<img src=small media=foo>
<img src=big media=bar>
</picture>

The polyfill would generate an id for each such img and this style:
<style>picture>img { display: none; }</style>
<style media=foo>img#genid1 { display: block; }</style>
<style media=bar>img#genid2 { display: block; }</style>
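A rough sketch of how a polyfill might generate those style elements follows. The `ImgCandidate` shape and the `generatePolyfillStyles` name are assumptions made for illustration, the `genidN` ids follow the naming used in the snippet above, and the media attribute is quoted here so that queries containing spaces survive.

```typescript
// Hypothetical description of one <img> child of <picture>.
interface ImgCandidate {
  src: string;
  media: string;
}

// Returns the markup for the <style> elements the polyfill would inject:
// hide every candidate by default, then reveal each one under its own media query.
function generatePolyfillStyles(candidates: ImgCandidate[]): string[] {
  const styles: string[] = ["<style>picture>img { display: none; }</style>"];
  candidates.forEach((candidate, index) => {
    const id = `genid${index + 1}`; // the polyfill would also assign this id to the img
    styles.push(`<style media="${candidate.media}">img#${id} { display: block; }</style>`);
  });
  return styles;
}

const generated = generatePolyfillStyles([
  { src: "small", media: "foo" },
  { src: "big", media: "bar" },
]);
console.log(generated.join("\n"));
```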

Also, has someone done studies of how common the <image> alias for <img> actually is?

Note that WebKit has a patch to add responsive image functionality to CSS background images:

selector {
background: image-set(url(foo-lowres.png) 1x,
url(foo-highres.png) 2x) center;
}

with a suggestion that we re-use image-set as an attribute on img, too.

See http://lists.w3.org/Archives/Public/www-style/2012Feb/1103.html

Thanks Bruce!

We tossed around a couple of ideas like this as far back as the Etherpad: letting something in CSS serve as a form of “controller” for a content image’s source. After some discussion amongst ourselves and with vendors, none really seemed to have legs.

It’s worth noting that any CSS solution I’ve seen for content images has fallen short where we’d still be unable to avoid the request for an img’s original source in any browser that supports prefetching. As useful as image-set could prove to be, I would guess it leaves the image tag itself in much the same position as a “broken” responsive images script does. It would likely mean doing away with—or bypassing—prefetching on select img tags, and that’s understandably not an option for many vendors.

Hey Foolip,

I can definitely see where you’re at with using <img> in place of <source>, but there are a couple of gotchas there:

  • Browsers that don’t support <picture>, in the above snippet, would end up being served several broken images.
  • <video>’s sources are specced for similar behavior using the media attribute. Thus far, I don’t believe that’s available in any major browser—but based on that intent, we should try to keep our pattern as similar as possible as we’re now solving nearly identical issues. As far as I’ve seen, there is no mention of re-requesting a video’s source when a new media attribute is matched, but it does stand to reason—you may know this better than I do, though. If that’s not the case it’s a larger issue, and we may want to follow <video>’s lead there just the same.
  • There’s been a lot of resistance around adding too much conditional logic to image prefetching, which is understandable. This would leave us in a situation where we’d need to say “fetch any image, so long as that image isn’t a child of a <picture> element.” We’ll likely see implementor resistance there.
  • Regarding a11y—and I may need to scrub my hands for a few hours after typing this—I think it may be best for now to keep things inline with img. Trust me: I know that this seems like the perfect opportunity to tackle some long-standing a11y issues, but anything improved upon in <picture> should absolutely apply elsewhere. I worry that it’s the start of a much larger conversation, that could spiral out of control quickly and lead to losing track of the original problem we were looking to solve—and I’m absolutely including myself, there.

Would you mind shooting me an email at mat@matmarquis.com? I’d love to continue these conversations with you outside of the thread.

First, I'm sceptical about this feature as a whole. I agree that there's a problem, but I'm not convinced this is a good solution to the problem.

Bandwidth is a temporary problem. In 5 or 10 years from now, you might not need to worry about having multiple images, but can just use one high resolution image. Features we add to the platform stick around forever, so we should consider that they also make sense in the long term.

When zooming, this proposal says to download a new image. This means that the page will feel unresponsive even after it has "loaded", because the browser suddenly starts to download lots of images.

Better solutions might be to use gzipped SVG for images that are reasonable for using SVG (e.g. logos), or to work on improving JPEG encoders so that a high quality high resolution image takes fewer bytes (consider if it would be possible to shave off say 50% of your highres images). These solutions make perfect sense in the long term and don't even need to wait for browser implementations.

That said, this spec has a number of problems. It talks about video's source element which has completely different processing model, as Philip said. Just drop any mention of video.

The alt attribute should go on the img element -- not the picture element -- to have useful fallback in legacy UAs.

Using the min-width MQ has the wrong behavior. When the user zooms in, you would expect a higher resolution image, but min-width would give you a lower resolution image. We shouldn't let people shoot themselves in the foot like this.

The device-pixel-ratio MQ similarly doesn't work well with zoom.

What you want is the min-resolution MQ. It might make sense to only provide support for this MQ, so that it's more likely that the feature will be used in a way that gives the intended UX.

The section on Requirements says it's about conforming UAs but only the first bullet point is about conforming UAs.

Here's a suggestion:

<picture>
<source src=mobile.jpg resolution=20dpi>
<source src=hires.jpg resolution=200dpi>
<img src=lowres.jpg width=100 height=100 alt="fallback text">
</picture>

Here, the img element takes part in the resource selection and has a fixed resolution of 96dpi. (Note that dpi in MQ is CSS pixels per inch, which is what we want for working as intended with zoom.)

The browser downloads the image that is "good enough" for the current zoom level, and never downloads a lower resolution image than the ones it already has. The img element is the actual replaced element in CSS, so width/height specified on the img gets used even if a different source is downloaded. (I'm not sure what the right behavior should be when width/height aren't specified; maybe that shouldn't be allowed and if you do anyway the img src gets downloaded just to get the intrinsic dimensions.)
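One way to read the "good enough, and never lower than what is already downloaded" rule is sketched below. It assumes the resolution needed is 96dpi multiplied by the zoom factor, which is an interpretation rather than something stated in the comment; the `Candidate` shape and `pickForZoom` name are likewise hypothetical.

```typescript
// Hypothetical candidate: one <source> (or the <img>, fixed at 96dpi).
interface Candidate {
  src: string;
  dpi: number;
}

// Assumption: the resolution needed at a given zoom factor is 96dpi * zoom.
function pickForZoom(candidates: Candidate[], zoom: number, loaded?: Candidate): Candidate | undefined {
  const needed = 96 * zoom;
  let best: Candidate | undefined = undefined; // smallest candidate that is good enough
  let sharpest: Candidate | undefined = undefined; // used when nothing is good enough
  for (const candidate of candidates) {
    if (sharpest === undefined || candidate.dpi > sharpest.dpi) {
      sharpest = candidate;
    }
    if (candidate.dpi >= needed && (best === undefined || candidate.dpi < best.dpi)) {
      best = candidate;
    }
  }
  const choice = best !== undefined ? best : sharpest;
  // Never swap to a lower resolution than the image already downloaded.
  if (loaded !== undefined && (choice === undefined || choice.dpi <= loaded.dpi)) {
    return loaded;
  }
  return choice;
}

const bearCandidates: Candidate[] = [
  { src: "mobile.jpg", dpi: 20 },
  { src: "lowres.jpg", dpi: 96 }, // the <img> itself
  { src: "hires.jpg", dpi: 200 },
];

const atNormalZoom = pickForZoom(bearCandidates, 1);
const zoomedIn = pickForZoom(bearCandidates, 2, atNormalZoom);
const zoomedBackOut = pickForZoom(bearCandidates, 1, zoomedIn);
```

Under these assumptions the sequence is lowres.jpg at 100%, hires.jpg at 200%, and hires.jpg is kept when zooming back out.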

Another possible solution:

<img src=image.jpg adaptive alt="fallback text">

image.jpg is a progressive JPEG image. When the adaptive="" attribute is present, the browser makes an HTTP Range request from byte 0 to infinity. If the server supports Range requests and the browser, while decoding the image (which it does in parallel with downloading), finds that it currently has a "good enough" resolution of the image for the current zoom level, it aborts the download. When the user later zooms in, the browser makes another Range request, from the byte where it left off to infinity, and again aborts the download when the resolution is "good enough" (or possibly downloads the whole image).
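The request sequence described here might be modeled as follows. The `Scan` table (byte offsets at which each progressive scan completes, and the resolution each is good enough for) uses made-up numbers, and `AdaptiveDownload` is a hypothetical name; only the open-ended `bytes=N-` Range syntax is standard HTTP.

```typescript
// Hypothetical description of a progressive JPEG: the byte offset at which each
// scan completes and the resolution (in dpi) it is good enough for, in file order.
interface Scan {
  endByte: number;
  dpi: number;
}

class AdaptiveDownload {
  private scans: Scan[];
  private received = 0;

  constructor(scans: Scan[]) {
    this.scans = scans;
  }

  // Returns the Range header value for the next request, or undefined when the
  // bytes already downloaded are good enough for `neededDpi`.
  nextRange(neededDpi: number): string | undefined {
    let target = 0;
    for (const scan of this.scans) {
      target = scan.endByte;
      if (scan.dpi >= neededDpi) {
        break; // first scan that is good enough: the transfer is aborted here
      }
    }
    if (target <= this.received) {
      return undefined;
    }
    const header = `bytes=${this.received}-`; // open-ended: from this byte to the end
    this.received = target; // models aborting once enough has been decoded
    return header;
  }
}

// Made-up scan boundaries for illustration only.
const imageScans: Scan[] = [
  { endByte: 4096, dpi: 48 },
  { endByte: 16384, dpi: 96 },
  { endByte: 65536, dpi: 200 },
];

const download = new AdaptiveDownload(imageScans);
console.log(download.nextRange(96)); // "bytes=0-"
console.log(download.nextRange(192)); // "bytes=16384-"
```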

This solution has nicer markup, but is racy in that image decoding might not happen until "too much" has been downloaded of the image (e.g. the whole image), which would delay the initial page load time. OTOH it doesn't need to download more bytes than the highres image in the worst case, whereas the picture proposal would download more than that (several images) in the worst case. Browsers already have code for handling Range requests for video which might be reusable.

The reason not to always do the Range request thing on img is that it might break sites that don't expect Range requests for imgs, or use progressive JPEGs but expect them to be fully downloaded e.g. for use on a canvas onload. Also, an attribute can be feature-checked so you could choose to use a 96dpi image for legacy browsers and a 200dpi image for supporting browsers, which could be spelled as:

 <img src=96dpi.jpg adaptive=200dpi.jpg alt="fallback text">

(Empty string for adaptive would use src instead.)

Another proposal that would help with initial load time is to add an attribute to img (e.g. defer="") that allows the load of the image to be delayed until the image is shown to the user (and the load wouldn't delay window.onload).

Wilto,

The idea is to use image-set as attributes of img in the markup rather than need CSS to require it. However, I'm not convinced that it's an elegant solution.

Also, r/e your reply to foolip: media queries on video work already in Opera desktop and Safari iOS (last time I checked)

When the user zooms in, you would expect a higher resolution image, but min-width would give you a lower resolution image.

Clarification: note that this applies to zoom in e.g. Opera for desktop (which triggers different media queries as the viewport becomes smaller for zoom levels > 100%), which is different from zoom in mobile browsers (where no media queries are triggered when you zoom in or out using pinch or double-tap zoom).
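A worked example of the desktop case, assuming (as described above for Opera on desktop) that zooming shrinks the viewport width as measured in CSS pixels in proportion to the zoom factor. The function names are hypothetical and the thresholds are the ones from the sample markup.

```typescript
// Thresholds from the sample markup: min-width 600px and 900px.
function sourceForWidth(cssWidth: number): string {
  if (cssWidth >= 900) {
    return "fullsize.jpg";
  }
  if (cssWidth >= 600) {
    return "medium.jpg";
  }
  return "mobile.jpg";
}

// Assumption: desktop zoom divides the viewport width measured in CSS pixels.
function cssViewportWidth(windowWidth: number, zoom: number): number {
  return windowWidth / zoom;
}

console.log(sourceForWidth(cssViewportWidth(960, 1))); // "fullsize.jpg"
console.log(sourceForWidth(cssViewportWidth(960, 2))); // "mobile.jpg" (960 / 2 = 480)
```

So at 200% zoom the user is looking more closely at the image but is served the smallest source, which is the backwards behavior being objected to.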

@zcorpan, you mentioned using <source src=mobile.jpg resolution=20dpi>. dppx would maybe be better?

I'm a bit worried that markup is quite complex/verbose for just providing a @2x image.

I'm also worried about using media queries for bandwidth/memory constraints. See intro to the etherpad discussion — with media queries UA has no ability to override which image it wants. There are use cases where author should have final say ("artistic"), and use cases where UA should have final say ("technical"), and these shouldn't be conflated (e.g. pixel size of my screen doesn't tell you if I'm on metered connection or if my browser is low on RAM, and I don't expect most authors to adapt to my zoom level in their MQs).

Therefore I suggest adding src2x="high-res.image" to both <img> and <source>, as choice of image DPI (file size and screen pixel density) is orthogonal to choice of image appropriate for the layout (screen size/orientation).
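To illustrate the split being proposed: the author only declares that a double-density file exists, and the UA makes the final call from factors the author cannot see. Everything below is a hypothetical sketch. `src2x` is the attribute suggested above, while the `UaState` fields and the policy in `chooseDensity` are invented here as one possible UA heuristic.

```typescript
// What the author declares: a regular image and, optionally, a 2x version.
interface DensityPair {
  src: string;
  src2x?: string;
}

// Hypothetical factors known to the UA but not to the author.
interface UaState {
  devicePixelRatio: number;
  meteredConnection: boolean;
  lowMemory: boolean;
}

// One possible UA policy; a real UA would be free to weigh these differently.
function chooseDensity(image: DensityPair, ua: UaState): string {
  if (image.src2x === undefined) {
    return image.src;
  }
  if (ua.devicePixelRatio >= 2 && !ua.meteredConnection && !ua.lowMemory) {
    return image.src2x;
  }
  return image.src;
}

const hero: DensityPair = { src: "standard-res.jpg", src2x: "high-res.jpg" };
const onWifi: UaState = { devicePixelRatio: 2, meteredConnection: false, lowMemory: false };
const onMetered: UaState = { devicePixelRatio: 2, meteredConnection: true, lowMemory: false };

console.log(chooseDensity(hero, onWifi)); // "high-res.jpg"
console.log(chooseDensity(hero, onMetered)); // "standard-res.jpg"
```

The layout choice (which crop or size fits the page) would still be made by the author through media queries; only the density choice is delegated.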

Selection of arbitrary DPI may be an overkill, as we're likely to settle on "double density" screens which continue "tradition" of CSS pixel size being a bit fudged and variable DPI just to fit device DPI better.


Edit: WebKit added url() 2x support to CSS, so I think HTML src2x counterpart makes even more sense in that context.

While the above are clever solutions to the resolution problem for sure, I wonder if they aren’t a bit too specific. This would introduce a redundant means of determining information about a user’s display, and by side-stepping media queries we’ve now introduced a “fork” of sorts. While the markup for <picture> may be a little verbose (albeit in a very familiar way), one of the major considerations was “future friendliness”: as media queries are expanded over time, we’d find ourselves with an increasingly useful element. We could not only use it now to determine image size and density; should the furtive whispers of one day seeing bandwidth as a MQ come true, we could selectively serve high-resolution images to high-density displays only when on a 4G-or-better connection. We could mix-and-match media queries, whereas this entirely new system would leave us bound to DPI(/DPX) unless it were expanded in parallel. As we already have a flexible, reliable system in place to solve the problem of resolution (and more) on the client side, it stands to reason that we should make use of it.

It also bears repeating that we arrived at this markup pattern not in a brainstorming vacuum, but after months of discussion involving standards bodies and browser vendors. We have been told several times that modifying <img> and bypassing prefetching may not be an option for several vendors. By not bypassing prefetching, of course, we would have gone to all this effort to introduce an element as broken as responsive images scripts are today. Most of this discussion is available throughout the Community Group and the original Etherpad, and has been mentioned enough to warrant a “frequently asked questions” page. I urge you to read through the history and ask questions, and believe me when I say we’re not spitballing, here. This is the result of nearly a year of research and conversation. I absolutely welcome new feedback and ideas—and yours is interesting for certain—but I wouldn’t want to derail discussions for the sake of a subject we’ve already established as a non-starter.

With regards to bandwidth not being an issue in ten years—well, think of the difference in typical internet connections ten years ago versus today. It’s night and day, for certain: from a screeching 28.8kbps to blazing fast internet in the air around us at all times. And yet: here we are concerned about pictures—because our bandwidth need has always expanded to use the bandwidth we have available.

Though, you could make a case that we have plenty of bandwidth to go around, even today, even factoring in the advent of “Retina” images. But that’s here, on 4G connections, using expensive and feature-laden devices. This is to say absolutely nothing of the millions of people worldwide accessing the internet on mobile connections alone, paying for each kilobyte consumed, with devices only slightly better than feature phones. Who are we to say “we would like sharper looking images right now, so we’re having them. Your needs might sort themselves out eventually”? I am not willing to saddle users in underdeveloped countries with additional costs (not bandwidth, but actual economic cost) so that more privileged users can have a slightly better experience. I am not content to put “best viewed eventually” in the footer of the websites I build, and I am loath to so much as think “best viewed in the first world.”

What we have is an issue of tremendous cost to users, and that is why we do this for a living: to solve problems for our users.

Is <picture> ideal? Absolutely not, no. And it is verbose, yes, but it’s also the result of countless discussions and consensus. There will always be some aspect of any proposal that can be improved upon, and I urge everyone to look at our proposal as a “first step.” Otherwise, it’s easy for us to riff on each others’ ideas in circles forever—we’re developers, and searching for better solutions is what we do. But at a certain point we’re doing so at the cost of failing users today.

I’m happy to answer any questions or point anyone on this thread in the direction of more information, but I’ll spare you all the tirades from here on. I may archive these comments on the CG and remove them here, unless anyone protests.

MQs for layout and cropping are perfect — those are things UAs can't make themselves.

But MQs should not be used for mere performance optimization. It's an abuse of the tool. MQs should be reserved for choices that UA cannot make better, automatically.

UAs are great at calculating trade-offs between easily quantifiable things like network speed, bandwidth cost, memory availability, cache presence and screen density and zoom level. That should be done by UA, and authors should not be burdened with such non-trivial, but boringly uniform and mechanic selections.

Writing MQ that takes into account all the factors I've listed above is hard. I've tried, and it's a total mess — at least 4 long lines of boilerplate per image. I'm pretty sure authors will not bother (or will make mistakes in the long complicated query), and users will end up with suboptimal choices. Even if MQ language was extended to have first-class or, it would still be pretty tangled boilerplate every author has to copy&paste for every image.

MQs are not future-proof, because when new factors appear, you can't expect all websites on the web to be suddenly updated to take that into account. MQs can't use new factors before UAs expose them, so the ball is always in the UAs' court.

Giving optimisation power to the UA could make new factors work immediately without waiting for websites to acknowledge it.

Only UA-controlled optimisation is not enough, so that's why I propose two orthogonal mechanisms.

One huge problem with MQs is that UA is not allowed to override them, even when UA could do better. For example if you say "use low-res image on 3G", and UA has cached a high-res image while it was on WiFi, UA will still be forced to needlessly downgrade the image. In such case MQ-based-optimisation would cause worse experience, and wasted bandwidth.

To be clear: I'm not saying <picture> should not use media queries. MQs are useful. I'm saying MQs should not be involved in decisions made purely for performance reasons.

Performance decisions need another declarative mechanism which allows UA to make the final decision, because we can't expose all factors to authors and expect authors to always make good use of all factors available. Author is wrong person to ask whether I prefer high-res or low-res version of the same image (author doesn't know my mobile contract, doesn't know how many tabs I have open, doesn't know whether I care more about detailed picture or quick loading — but I could configure my UA to know that).

I am dead against doing anything along the lines of "2x". The whole 2x thing is a solution for Apple's very very specific problem of a 2x retina display. It is not appropriate elsewhere, and it is far too specific. Use media queries as a trigger for reacting to device capabilities because media queries are more flexible, and are expanding as required to detect other features such as bandwidth - which are every bit as important as actual pixel dimensions, if not more so.

Could you replace the "Scott Jehl" link with one to the actual polyfill? That would be a lot more useful.

Was there prior discussion about making the <img> child element also the one matching by default? In your examples, default <source> and the fallback both have the same src, so reducing that duplication would be nice.

"we could selectively serve high-resolution images to high-density displays only when on a 4G-or-better connection"

Isn't the browser in a better position than the author to determine whether the user would want the high-resolution image immediately? It seems to me that it's better to let the author say "these are the images I have, they have these properties" and let the browser choose the version it wants, than to force authors to write rules choosing between viewport size, device pixel size, zoom level, bandwidth, etc, and expect them to get it right. From the examples I've seen, the actual behavior would be completely backwards in some situations (e.g. zoom in on desktop) or even result in no image at all (e.g. a list of min-width rules and the user has a smaller width than what the author cared to list).

We have been in a situation before where browsers had to lie about their properties to get sites to give them content with better user experience (e.g. mobiles moved from "handheld" media to "screen" media because the user experience was horrible on sites if you said you were a "handheld").
