all about ETags

ETags: a pretty sweet feature of HTTP 1.1

HTTP caching review

HTTP gives servers two independent choices for controlling client-side caching of page components:

  • whether freshness is based on a date or on a token whose meaning is app-specific
  • whether or not the client needs to confirm with the server that the cached version is up-to-date

This breaks down as follows:

  • Cache locally and don't check before using.
    • This avoids a network request completely.
    • Expires header - asks the browser to use the local copy until some date
    • Cache-Control:max-age header - asks the browser to use the local copy until a number of seconds after download
  • Cache locally, but check before using.
    • This requires a network request to contact the server, but avoids bandwidth costs of re-downloading.
    • Last-Modified header - time-based validation - asks the browser to confirm the server hasn't updated the component since the Last-Modified date
    • ETag header - token-based validation - asks the browser to confirm the component on the server has the same ETag as the cached copy.
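The sketch below shows the response headers each of the four strategies might emit. The function name `caching_headers` and its parameters are hypothetical. The `Cache-Control: no-cache` sent alongside the two validators is an added assumption: it makes the browser revalidate on every use instead of guessing a freshness lifetime heuristically.

```python
from email.utils import formatdate


def caching_headers(strategy: str, *, now: float, max_age: int = 0,
                    last_modified: float = 0.0, etag: str = "") -> dict[str, str]:
    """Return response headers for one of the four strategies (hypothetical helper)."""
    if strategy == "expires":
        # Cache without checking until an absolute date; HTTP dates are always GMT.
        return {"Expires": formatdate(now + max_age, usegmt=True)}
    if strategy == "max-age":
        # Cache without checking for max_age seconds after download.
        return {"Cache-Control": f"max-age={max_age}"}
    if strategy == "last-modified":
        # Cache, but revalidate by date: the browser later sends If-Modified-Since.
        return {"Cache-Control": "no-cache",
                "Last-Modified": formatdate(last_modified, usegmt=True)}
    if strategy == "etag":
        # Cache, but revalidate by token: the browser later sends If-None-Match.
        return {"Cache-Control": "no-cache", "ETag": etag}
    raise ValueError(f"unknown strategy: {strategy}")


print(caching_headers("max-age", now=0, max_age=3600))  # {'Cache-Control': 'max-age=3600'}
```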

Caching static assets is easy, because static URLs can often change transparently

The golden rule of front-end performance is to minimize HTTP requests, so we use far-future Expires or Cache-Control:max-age headers aggressively. If we are willing to break URLs, we can set the cache far in the future; this is exactly what we did with the connect-cachify tool we saw last week:

  1. when a static asset is downloaded, it's given a far-future caching header (Expires or Cache-Control:max-age)
  2. if the asset changes, its URL is changed, breaking the cache & causing the browser to re-download the new file
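Here is a minimal sketch of step 2. It assumes the asset's URL embeds a short hash of its content. The `/js/app.<hash>.js` naming scheme and the helper names are hypothetical and are not connect-cachify's actual URL format. The point is only that changed bytes produce a changed URL, which is what makes a one-year max-age safe.

```python
import hashlib

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def fingerprinted_url(path: str, content: bytes) -> str:
    """Insert a short content hash before the extension: /js/app.js -> /js/app.<hash>.js."""
    digest = hashlib.md5(content).hexdigest()[:10]
    directory, slash, name = path.rpartition("/")
    stem, dot, ext = name.rpartition(".")
    # Files without an extension just get the hash appended.
    new_name = f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"
    return f"{directory}{slash}{new_name}"


def static_asset_headers() -> dict[str, str]:
    # Safe to cache for a year: changed content always gets a new URL.
    return {"Cache-Control": f"max-age={ONE_YEAR}"}


old_url = fingerprinted_url("/js/app.js", b"console.log(1);")
new_url = fingerprinted_url("/js/app.js", b"console.log(2);")
print(old_url, new_url)  # the edit "breaks" the URL, forcing a fresh download
```

Any page that references the asset must be re-rendered with the new URL. That is why this trick works for assets but not for the pages themselves.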

Caching dynamic content is harder, because the rate of change is unpredictable and URLs must be stable

For dynamic content with persistent URLs, like templated web pages, we can't use the same approach:

  • we can't set far-future caching headers, unless we know exactly when the page will be updated in the future
  • we can't break the URL without hurting page discoverability, breaking bookmarks, and generally breaking the web

Although the browser has to check with the server that its cached copy is fresh, we can at least save the bandwidth and time required to re-download the file.

Because dynamic content might change at any time, we can't use the Cache-Control:max-age or Expires headers; we have to rely on validation, with ETags (or Last-Modified), to make dynamic content cacheable at all.

ETags allow dynamic content to be cached using an app-specific "opaque token"

An ETag, or entity tag, is an opaque token that identifies a version of the component served by a particular URL. The token can be anything enclosed in quotes; often it's an md5 hash of the content, or the content's VCS version number. If you're dealing with internationalized templates, the ETag should be different for each localized version. In general, ETag implementations should respect variations in content usually specified with Vary headers:

  • Vary:Accept-Language is used to signal to browsers that different representations exist, and should be cached separately, depending on the value of the Accept-Language request header.
  • Vary:Cookie is used to signal that the same page, though it might be seen by both anonymous and logged-in users, should be cached separately for each; otherwise, logged-in users would see the anonymous version until they force-refreshed their browsers.
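Sometimes the ETag comes from a template or VCS version rather than from the rendered bytes. In that case the inputs named in Vary have to be folded in explicitly; otherwise the English and French versions would share a tag. Here is a minimal sketch, assuming hypothetical `template_version`, `language`, and `logged_in` inputs:

```python
import hashlib


def variant_etag(template_version: str, language: str, logged_in: bool) -> str:
    """Hypothetical ETag for a templated page that varies by language and login state."""
    # Fold every input named in the Vary header into the token, so each
    # representation the browser caches separately also gets its own tag.
    raw = f"{template_version}|lang={language}|auth={int(logged_in)}"
    return '"' + hashlib.md5(raw.encode("utf-8")).hexdigest() + '"'


def page_headers(template_version: str, language: str, logged_in: bool) -> dict[str, str]:
    return {
        "ETag": variant_etag(template_version, language, logged_in),
        "Vary": "Accept-Language, Cookie",
    }
```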

ETags and If-None-Match in action

I moved the ETags/If-None-Match "dialogue" to a separate gist
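In lieu of the linked dialogue, here is a minimal sketch of its server side. The first GET returns 200 with an ETag; a later GET that carries that token in If-None-Match gets an empty 304. The sketch makes two assumptions. First, ETags are MD5s of the body, which never contain commas, so splitting the header on commas is safe. Second, header names already use the capitalization shown here. `handle_get` is a hypothetical name. Matching uses the weak comparison that RFC 7232 specifies for If-None-Match.

```python
import hashlib


def etag_for(body: bytes) -> str:
    return '"' + hashlib.md5(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    # Weak comparison ignores the W/ prefix (RFC 7232, section 2.3.2).
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(request_headers: dict[str, str], body: bytes) -> tuple[int, dict[str, str], bytes]:
    """Hypothetical GET handler: returns (status, headers, body), honoring If-None-Match."""
    etag = etag_for(body)
    headers = {"ETag": etag}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may carry "*" or a comma-separated list of tags.
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or any(_opaque(t) == _opaque(etag) for t in candidates):
            return 304, headers, b""  # the client's copy is current: send no body
    return 200, headers, body


# The dialogue: the first request downloads, the second revalidates.
status, headers, _ = handle_get({}, b"<html>hello</html>")
print(status)  # 200
status, _, body = handle_get({"If-None-Match": headers["ETag"]}, b"<html>hello</html>")
print(status, body)  # 304 b''
```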

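You can use either ETag or Last-Modified headers, or both, or neither; the HTTP 1.1 RFC (RFC 2616) actually recommends sending both, in which case the server would only return a 304 if both the If-None-Match token and the If-Modified-Since date were fresh. (The later RFC 7232 still recommends sending both headers, but has the server ignore If-Modified-Since whenever If-None-Match is present.)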
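Under RFC 7232's precedence rules, the 304 decision looks roughly like this sketch. If-None-Match, when present, decides on its own; If-Modified-Since is consulted only as a fallback. Under the older RFC 2616 reading, a 304 would instead require both checks to pass. `not_modified` is a hypothetical helper, and it assumes `last_modified` is a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(request_headers: dict[str, str], etag: str, last_modified: datetime) -> bool:
    """Hypothetical check: should a GET get a 304? Follows RFC 7232 precedence."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match is present, so If-Modified-Since is ignored entirely.
        tags = [t.strip() for t in if_none_match.split(",")]
        return "*" in tags or any(_opaque(t) == _opaque(etag) for t in tags)
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            # HTTP dates have one-second resolution, so drop sub-second precision.
            return last_modified.replace(microsecond=0) <= parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            # Unparseable (or zone-less) date: ignore the header.
            return False
    return False


lm = datetime(2013, 2, 20, 12, 0, tzinfo=timezone.utc)
request = {"If-None-Match": '"v1"', "If-Modified-Since": "Wed, 20 Feb 2013 12:00:00 GMT"}
print(not_modified(request, '"v2"', lm))  # False: the ETag mismatch wins over the fresh date
```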

ETags require some configuration to be helpful; otherwise, they can cause caching problems.

The original YSlow rules, and the book High Performance Web Sites, suggest disabling ETags unless you take the time to configure them properly. This is because Apache and IIS both have terrible default values for ETags. They build them from server-specific data, such as filesystem inode numbers or server-specific timestamps, so the ETag set on a component is different on each node in a server farm. A browser whose conditional request lands on a different node then gets a full response instead of a 304. Since ETags provide comparatively little performance benefit in general (conditional GETs still require an HTTP request), it's often an improvement just to disable them.

ETags have other, really cool applications

ETags are more than just a caching header; they identify a version of a representation served at a URL. This leads to some cool applications we'll just mention in passing:

  • optimistic concurrency: if 2 authors try to update a shared document, or 2 nodes try to update the same RESTful endpoint, they can avoid clobbering others' edits by passing the last-seen ETag as an If-Match header in a conditional PUT or conditional DELETE. If the version on the server differs from the version the client has edited, then the client's edits shouldn't be allowed.
  • sub-second updates: if a firehose API endpoint or auction webpage changes multiple times per second and gets lots of traffic, ETags save clients and server a ton of bandwidth and allow clients to sync with the server continuously. Without ETags, the server would have two options. It could use no caching and let clients re-download unchanged content. Or it could use HTTP date-based caching, whose dates have one-second resolution, forcing clients to a latency of at least 1 full second between updates.
  • 304s in XHR responses: returning 304s from CRUD-style model endpoints can make short-polling real-time apps more efficient. The app needs to be built to handle empty 304 responses, but doing so can avoid the network and CPU cost (and potential UI lag) of re-downloading, re-parsing the JSON, and re-processing unchanged server input.
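  • strong ETags can be used with the If-Range header to make partial Range requests for specific byte ranges. Weak ETags aren't allowed there, since a weak match doesn't guarantee byte-for-byte identical content. This is weird, wild stuff; if you need 206 Partial Content responses or the like, have fun digging into the RFC :-)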
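The optimistic-concurrency bullet can be sketched as a conditional PUT on a hypothetical in-memory `Document`. A client that sends a stale ETag in If-Match gets 412 Precondition Failed instead of overwriting someone else's edit. If-Match uses strong comparison, so a weak `W/` tag never matches. Requests without If-Match get 428 Precondition Required (RFC 6585); that refusal is a policy choice of this sketch, not something HTTP mandates.

```python
import hashlib


class Document:
    """Hypothetical in-memory resource that supports conditional PUT via If-Match."""

    def __init__(self, body: bytes) -> None:
        self.body = body

    @property
    def etag(self) -> str:
        # Strong ETag derived from the current content.
        return '"' + hashlib.sha256(self.body).hexdigest()[:16] + '"'

    def get(self) -> tuple[int, dict[str, str], bytes]:
        return 200, {"ETag": self.etag}, self.body

    def put(self, request_headers: dict[str, str], new_body: bytes) -> tuple[int, dict[str, str], bytes]:
        if_match = request_headers.get("If-Match")
        if if_match is None:
            # Policy choice: refuse unconditional writes (428 Precondition Required).
            return 428, {}, b""
        tags = [t.strip() for t in if_match.split(",")]
        # Strong comparison: exact string match, so W/"..." tags never match.
        if "*" not in tags and self.etag not in tags:
            return 412, {"ETag": self.etag}, b""  # someone else changed it first
        self.body = new_body
        return 204, {"ETag": self.etag}, b""


doc = Document(b"draft v1")
_, headers, _ = doc.get()
seen = headers["ETag"]  # both authors load the same version
print(doc.put({"If-Match": seen}, b"alice's edit")[0])  # 204
print(doc.put({"If-Match": seen}, b"bob's edit")[0])    # 412
```

In this example, the first PUT wins. The second client learns that its view is out of date, so it can re-fetch, merge, and retry instead of silently clobbering the first edit.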
@naholyr

commented Feb 20, 2013

Because dynamic content might change at any time, we can't use the Last-Modified header

Of course we can ;)

The only difficulty is calculating the proper value, which is, very simply, the most recent last-modified date of all the dynamic elements in the page. Typically, in a blog, it will be:

  • the modification date of the article
  • the modification date of all assets with "breakable" URLs in the page (if an asset is updated, URL changes, and therefore contents change)
  • the modification date of widgets?

All this can often be easily calculated, and always easily cached if necessary. The important thing is that an ETag is calculated from the content, which means your server has to generate that content first, and that can be quite costly. Last-Modified, by contrast, can often be calculated much more easily than the actual content (assuming you cache the assets' last-modified dates, a single query will generally do the trick), so you can return early, before generating any content, and save server resources.
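Here is a sketch of that early return. It assumes each dynamic part of the page exposes a timezone-aware UTC modification time; `component_mtimes` and the `render` callable are hypothetical. The page's Last-Modified is the newest of those times. On a matching If-Modified-Since, the handler returns 304 without ever calling `render`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Callable


def respond(component_mtimes: list[datetime], request_headers: dict[str, str],
            render: Callable[[], bytes]) -> tuple[int, dict[str, str], bytes]:
    # The page is as new as its newest part; HTTP dates have one-second resolution.
    last_modified = max(component_mtimes).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            if last_modified <= parsedate_to_datetime(if_modified_since):
                return 304, headers, b""  # cheap path: render() is never called
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    return 200, headers, render()


mtimes = [
    datetime(2013, 2, 18, 9, 30, tzinfo=timezone.utc),  # the article
    datetime(2013, 2, 20, 12, 0, tzinfo=timezone.utc),  # a widget
    datetime(2013, 1, 5, tzinfo=timezone.utc),          # a breakable-URL asset
]
status, headers, _ = respond(mtimes, {}, lambda: b"<html>...</html>")
print(status, headers["Last-Modified"])  # 200 Wed, 20 Feb 2013 12:00:00 GMT
```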

The advice should then be:

  • If you're able to calculate the Last-Modified header quickly and efficiently, you should primarily use it
  • In every case, generating a hash of the content is always simple and fast, so you should use ETag too
@6a68

Owner Author

commented Feb 21, 2013

@naholyr good catch! I should've said, "we can't use the Cache-Control:max-age or Expires headers."

@rmongia

commented Jun 18, 2014

In every case, generating a hash of the content is always simple and fast, so you should use ETag too

What if you don't have the revision history for the build or a database where you can get the version number from? It is not always straightforward to generate the ETag value.

@cmawhorter

commented Sep 17, 2014

@rmongia The strategy you use to generate the ETag should be based on the content. However, I find that the last-modified date is a Good Enough solution.

Additionally, you don't need to hash; just return a date string if you want. It doesn't matter what the etag is as long as it matches. (I'd probably stick to [\w\d\.] myself though)
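For example, a date-string ETag might look like this minimal sketch (`etag_from_mtime` is hypothetical). There are two caveats. Changes made within the same second share a tag. And every node in a farm has to agree on the timestamp, otherwise this inherits the per-node mismatch described in the article.

```python
from datetime import datetime, timezone


def etag_from_mtime(last_modified: datetime) -> str:
    """Hypothetical: use a modification time as the ETag instead of a content hash."""
    # Expects a timezone-aware datetime; normalize to UTC so every node formats it the same way.
    utc = last_modified.astimezone(timezone.utc)
    # Only letters and digits inside the quotes, e.g. "20140917T083000Z".
    return '"' + utc.strftime("%Y%m%dT%H%M%SZ") + '"'


print(etag_from_mtime(datetime(2014, 9, 17, 8, 30, tzinfo=timezone.utc)))  # "20140917T083000Z"
```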

Landed here randomly...

@Armalon

commented Aug 16, 2016

@naholyr good catch! I should've said, "we can't use the Cache-Control:max-age or Expires headers."

Yep, I stopped reading the article and started to think: why is that?
Why don't you just fix this sentence in the article?
It's very well written, though; I really enjoyed reading it, thank you!

@nitronick600

commented Oct 4, 2017

Is there any guidance on when to actually generate an ETag? In my experience, if a server supports entity tagging, the ETag is always provided regardless of Cache-Control. Thoughts?

@sp00m

commented Apr 9, 2018

You can use either ETag or Last-Modified headers, or both, or neither; the HTTP 1.1 RFC actually recommends using both.

This is not what RFC 7232 recommends (sections 3.3 and 3.4):

A recipient MUST ignore If-Modified-Since/If-Unmodified-Since if the request contains an If-None-Match/If-Match header field; the condition in If-None-Match/If-Match is considered to be a more accurate replacement for the condition in If-Modified-Since/If-Unmodified-Since, and the two are only combined for the sake of interoperating with older intermediaries that might not implement If-None-Match/If-Match.

Who's right here?

@kirtiso

commented Jun 22, 2018

Hello,
I want to know whether the ETag header is a vulnerability in a website. If it is, how can it be exploited?
