To understand HTTP's Vary behaviour we first must understand how the CDN works in the traditional sense of caching and looking up resources.
Note: at the bottom of this page is a sequence diagram which visually illustrates how Vary works (if you prefer a more visual aid).
The CDN caches requests based on a given pair of Host and Path values. So when the CDN accepts a request, as an example, for the resource https://www.buzzfeed.com/videos
the CDN will take the Host (e.g. www.buzzfeed.com
) and the Path (e.g. /videos
) and generate a hash of those values which will become the 'cache key'. Later when the CDN receives the same request from a different client it'll generate the same hash and lookup the resource in its cache using the hash as the key.
Depending on the client, the origin server that generates response content (which will be cached in the CDN) may well want to serve different content for different clients (e.g. serve a German language version of the page for German users vs an English language version for users from the UK or US).
The problem with serving different content for the same resource (e.g. imagine www.example.com/foo
is able to return German or English depending on the client) is that whoever makes the request first will have that version of the content cached by the CDN.
For example if www.example.com/foo
is able to return German or English based on my web browser's built-in value for the HTTP request header Accept-Language
(which for me would be set to en-uk
), then imagine someone from Germany making a request for that uncached resource first (whose browser users the value de-de
).
What will happen in that instance is that the German language version of the response content will be cached using hash(Host + Path) = cache_key
and so when I make my request for the same resource, I'll end up getting the cached German version of the page (which I won't be able to read).
This is where the HTTP Vary response header comes in to help solve the problem. The origin server generating the content should be setup to send a Vary header with the value Accept-Language
. The CDN will look at the Vary header's value and will subsequently cache the content using the same hash(Host + Path) = cache_key
approach, but it'll add another 'key' (known as the 'vary key') to the object which will indicate the value that the browser provided.
So if we go back to the example of a German user making a request for the same resource as myself (and they make the request first), then the vary key assigned to the cached German version of the page will be de-de
, and so although when I make my request the cache lookup key will be the same, I'll get a cache MISS (and the CDN will make a fresh request to the origin) because the value my browser is providing for Accept-Language
is en-uk
and that doesn't match the cached object's vary key value of de-de
.
Now when the origin sends back the English version of the page, the CDN will cache the content again using hash(Host + Path) = cache_key
as the cache lookup key, but will set a different 'vary key' value of en-en
. So we now have two objects in the cache for the same resource: one German and one English.
Note: read this article to understand how to make the most of the Vary response header.