@wilsonpage
Last active February 20, 2024 05:21
An implementation of stale-while-revalidate for Cloudflare Workers
export const CACHE_STALE_AT_HEADER = 'x-edge-cache-stale-at';
export const CACHE_STATUS_HEADER = 'x-edge-cache-status';
export const CACHE_CONTROL_HEADER = 'Cache-Control';
export const CLIENT_CACHE_CONTROL_HEADER = 'x-client-cache-control';
export const ORIGIN_CACHE_CONTROL_HEADER = 'x-edge-origin-cache-control';
enum CacheStatus {
  HIT = 'HIT',
  MISS = 'MISS',
  REVALIDATING = 'REVALIDATING',
}
const swr = async ({
  request,
  event,
}: {
  request: Request;
  event: FetchEvent;
}) => {
  const cache = caches.default;
  const cacheKey = toCacheKey(request);
  const cachedRes = await cache.match(cacheKey);
  if (cachedRes) {
    let cacheStatus = cachedRes.headers.get(CACHE_STATUS_HEADER);
    if (shouldRevalidate(cachedRes)) {
      cacheStatus = CacheStatus.REVALIDATING;
      // update cached entry to show it's 'updating'
      // and thus shouldn't be re-fetched again
      await cache.put(
        cacheKey,
        addHeaders(cachedRes, {
          [CACHE_STATUS_HEADER]: CacheStatus.REVALIDATING,
        })
      );
      event.waitUntil(
        fetchAndCache({
          cacheKey,
          request,
          event,
        })
      );
    }
    return addHeaders(cachedRes, {
      [CACHE_STATUS_HEADER]: cacheStatus,
      [CACHE_CONTROL_HEADER]: cachedRes.headers.get(
        CLIENT_CACHE_CONTROL_HEADER
      ),
    });
  }
  return fetchAndCache({
    cacheKey,
    request,
    event,
  });
};
const fetchAndCache = async ({
  cacheKey,
  request,
  event,
}: {
  request: Request;
  event: FetchEvent;
  cacheKey: Request;
}) => {
  const cache = caches.default;
  // we add a cache busting query param here to ensure that
  // we hit the origin and no other upstream cf caches
  const originRes = await fetch(addCacheBustParam(request));
  const cacheControl = resolveCacheControlHeaders(request, originRes);
  const headers = {
    [ORIGIN_CACHE_CONTROL_HEADER]: originRes.headers.get('cache-control'),
    [CACHE_STALE_AT_HEADER]: cacheControl?.edge?.staleAt?.toString(),
    'x-origin-cf-cache-status': originRes.headers.get('cf-cache-status'),
  };
  if (cacheControl?.edge) {
    // store the cache response w/o blocking response
    event.waitUntil(
      cache.put(
        cacheKey,
        addHeaders(originRes, {
          ...headers,
          [CACHE_STATUS_HEADER]: CacheStatus.HIT,
          [CACHE_CONTROL_HEADER]: cacheControl.edge.value,
          // Store the client cache-control header separately as the main
          // cache-control header is being used as an api for cf worker cache api.
          // When the request is pulled from the cache we switch this client
          // cache-control value in place.
          [CLIENT_CACHE_CONTROL_HEADER]: cacheControl?.client,
          // remove headers we don't want to be cached
          'set-cookie': null,
          'cf-cache-status': null,
          vary: null,
        })
      )
    );
  }
  return addHeaders(originRes, {
    ...headers,
    [CACHE_STATUS_HEADER]: CacheStatus.MISS,
    [CACHE_CONTROL_HEADER]: cacheControl?.client,
    // 'x-cache-api-cache-control': cacheControl?.edge?.value,
    // 'x-origin-res-header': JSON.stringify(toObject(originRes.headers)),
  });
};
const resolveCacheControlHeaders = (req: Request, res: Response) => {
  // don't cache error or POST/PUT/DELETE
  const shouldCache = res.ok && req.method === 'GET';
  if (!shouldCache) {
    return {
      client: 'public, max-age=0, must-revalidate',
    };
  }
  const cacheControl = res.headers.get(CACHE_CONTROL_HEADER);
  // never cache anything that doesn't have a cache-control header
  if (!cacheControl) return;
  const parsedCacheControl = parseCacheControl(cacheControl);
  return {
    edge: resolveEdgeCacheControl(parsedCacheControl),
    client: resolveClientCacheControl(parsedCacheControl),
  };
};
const resolveEdgeCacheControl = ({
  sMaxage,
  staleWhileRevalidate,
}: ParsedCacheControl) => {
  // never edge-cache anything that doesn't have an s-maxage
  if (!sMaxage) return;
  const staleAt = Date.now() + sMaxage * 1000;
  // `stale-while-revalidate` present without a window (parsed as 0):
  // cache forever, meaning the stale content can be served indefinitely
  // while fresh content is re-fetched
  if (staleWhileRevalidate === 0) {
    return {
      value: 'immutable',
      staleAt,
    };
  }
  // `stale-while-revalidate` absent: only cache for the s-maxage
  if (!staleWhileRevalidate) {
    return {
      value: `max-age=${sMaxage}`,
      staleAt,
    };
  }
  // when both are defined we extend the cache time by the swr window
  // so that we can respond with the 'stale' content whilst fetching the fresh
  return {
    value: `max-age=${sMaxage + staleWhileRevalidate}`,
    staleAt,
  };
};
const resolveClientCacheControl = ({ maxAge }: ParsedCacheControl) => {
  if (!maxAge) return 'public, max-age=0, must-revalidate';
  return `max-age=${maxAge}`;
};
interface ParsedCacheControl {
  maxAge?: number;
  sMaxage?: number;
  staleWhileRevalidate?: number;
}
const parseCacheControl = (value = ''): ParsedCacheControl => {
  const parts = value.replace(/ +/g, '').split(',');
  return parts.reduce((result, part) => {
    const [key, value] = part.split('=');
    result[toCamelCase(key)] = Number(value) || 0;
    return result;
  }, {} as Record<string, number | undefined>);
};
const addHeaders = (
  response: Response,
  headers: { [key: string]: string | undefined | null }
) => {
  const response2 = new Response(response.clone().body, {
    status: response.status,
    headers: response.headers,
  });
  for (const key in headers) {
    const value = headers[key];
    // undefined leaves the header untouched; null deletes it;
    // any string replaces the existing value
    if (value !== undefined) {
      if (value === null) response2.headers.delete(key);
      else {
        response2.headers.delete(key);
        response2.headers.append(key, value);
      }
    }
  }
  return response2;
};
const toCamelCase = (string: string) =>
  string.replace(/-./g, (x) => x[1].toUpperCase());
/**
 * Create a normalized cache-key from the inbound request.
 *
 * Cloudflare is fussy. If we pass the original request it
 * won't find cache matches perhaps due to subtle differences
 * in headers, or the presence of some blacklisted headers
 * (eg. Authorization or Cookie).
 *
 * This method strips down the cache key to only contain:
 * - url
 * - method
 *
 * We currently don't cache POST/PUT/DELETE requests, but if we
 * wanted to in the future the cache key could contain req.body,
 * but this is probably not ever a good idea.
 */
const toCacheKey = (req: Request) =>
  new Request(req.url, {
    method: req.method,
  });
const shouldRevalidate = (res: Response) => {
  // if the cache is already revalidating then we shouldn't trigger another
  const cacheStatus = res.headers.get(CACHE_STATUS_HEADER);
  if (cacheStatus === CacheStatus.REVALIDATING) return false;
  const staleAtHeader = res.headers.get(CACHE_STALE_AT_HEADER);
  // if we can't resolve an x-edge-cache-stale-at header => revalidate
  if (!staleAtHeader) return true;
  const staleAt = Number(staleAtHeader);
  const isStale = Date.now() > staleAt;
  // if the cached response is stale => revalidate
  return isStale;
};
const addCacheBustParam = (request: Request) => {
  const url = new URL(request.url);
  url.searchParams.append('t', Date.now().toString());
  return new Request(url.toString(), request);
};
export default swr;
// Tests for ./swr (a separate file in the gist)
import { Request, Response } from 'node-fetch';
import mockDate from 'mockdate';
import swr, { CACHE_CONTROL_HEADER } from './swr';
const STATIC_DATE = new Date('2000-01-01');
const fetchMock = jest.fn();
const cachesMock = {
  match: jest.fn(),
  put: jest.fn(),
};
// @ts-ignore
global.caches = { default: cachesMock };
global.fetch = fetchMock;
beforeEach(() => {
  mockDate.set(STATIC_DATE);
  cachesMock.match.mockReset();
  cachesMock.put.mockReset();
  global.fetch = fetchMock.mockReset();
});
describe('swr', () => {
  describe('s-maxage=60, stale-while-revalidate', () => {
    describe('first request', () => {
      let cachesPutCall: [Request, Response];
      let response;
      let request;
      beforeEach(async () => {
        request = new Request('https://example.com');
        fetchMock.mockResolvedValue(
          new Response('', {
            headers: {
              'cache-control': 's-maxage=60, stale-while-revalidate',
            },
          })
        );
        response = await swr({
          request,
          event: mockFetchEvent(),
        });
        cachesPutCall = cachesMock.put.mock.calls[0];
      });
      it('calls fetch as expected', () => {
        const [request] = fetchMock.mock.calls[0];
        expect(fetchMock).toHaveBeenCalledTimes(1);
        expect(request.url).toBe(
          `https://example.com/?t=${STATIC_DATE.getTime()}`
        );
      });
      it('has the expected cache-control header', () => {
        expect(response.headers.get(CACHE_CONTROL_HEADER)).toBe(
          'public, max-age=0, must-revalidate'
        );
      });
      it('caches the response', () => {
        const [request, response] = cachesMock.put.mock.calls[0];
        expect(cachesMock.put).toHaveBeenCalledTimes(1);
        expect(request.url).toBe('https://example.com/');
        // caches forever until revalidate
        expect(response.headers.get('cache-control')).toBe('immutable');
      });
      describe('… then second request (immediate)', () => {
        let response: Response;
        beforeEach(async () => {
          request = new Request('https://example.com');
          cachesMock.match.mockResolvedValueOnce(cachesPutCall[1]);
          response = ((await swr({
            request,
            event: mockFetchEvent(),
          })) as unknown) as Response;
        });
        it('returns the expected cached response', () => {
          expect(response.headers.get('x-edge-cache-status')).toBe('HIT');
        });
        it('has the expected cache-control header', () => {
          expect(response.headers.get('cache-control')).toBe(
            'public, max-age=0, must-revalidate'
          );
        });
      });
      describe('… then second request (+7 days)', () => {
        let response: Response;
        beforeEach(async () => {
          request = new Request('https://example.com');
          // mock clock forward 7 days
          mockDate.set(
            new Date(STATIC_DATE.getTime() + 1000 * 60 * 60 * 24 * 7)
          );
          cachesMock.match.mockResolvedValueOnce(cachesPutCall[1]);
          response = ((await swr({
            request,
            event: mockFetchEvent(),
          })) as unknown) as Response;
        });
        it('returns the expected cached response', () => {
          expect(response.headers.get('x-edge-cache-status')).toBe(
            'REVALIDATING'
          );
        });
      });
    });
  });
  describe('max-age=10, s-maxage=60, stale-while-revalidate=60', () => {
    let cachesPutCall: [Request, Response];
    let request;
    let response;
    beforeEach(async () => {
      request = new Request('https://example.com');
      fetchMock.mockResolvedValueOnce(
        new Response('', {
          headers: {
            'cache-control':
              'max-age=10, s-maxage=60, stale-while-revalidate=60',
          },
        })
      );
      response = await swr({
        request,
        event: mockFetchEvent(),
      });
      cachesPutCall = cachesMock.put.mock.calls[0];
    });
    it('calls fetch as expected', () => {
      const [request] = fetchMock.mock.calls[0];
      expect(fetchMock).toHaveBeenCalledTimes(1);
      expect(request.url).toBe(
        `https://example.com/?t=${STATIC_DATE.getTime()}`
      );
    });
    it('has the expected response', () => {
      expect(response.headers.get('cache-control')).toBe('max-age=10');
    });
    it('caches the response', () => {
      const [request, response] = cachesPutCall;
      expect(cachesMock.put).toHaveBeenCalledTimes(1);
      expect(request.url).toBe('https://example.com/');
      // stores the cached response for the additional swr window
      expect(response.headers.get('cache-control')).toBe('max-age=120');
    });
    describe('… then second request', () => {
      let res: Response;
      beforeEach(async () => {
        request = new Request('https://example.com');
        cachesMock.match.mockResolvedValueOnce(cachesPutCall[1]);
        res = ((await swr({
          request,
          event: mockFetchEvent(),
        })) as unknown) as Response;
      });
      it('returns the expected cached response', () => {
        expect(res.headers.get('x-edge-cache-status')).toBe('HIT');
      });
      it('has the expected cache-control header', () => {
        expect(res.headers.get('Cache-Control')).toBe('max-age=10');
      });
      describe('… then + 61s', () => {
        let response: Response;
        let revalidateFetchDeferred;
        beforeEach(async () => {
          // move clock forward 61 seconds
          mockDate.set(new Date(STATIC_DATE.getTime() + 61 * 1000));
          request = new Request('https://example.com');
          // reset mock state
          fetchMock.mockReset();
          cachesMock.put.mockReset();
          cachesMock.match.mockResolvedValueOnce(cachesPutCall[1]);
          revalidateFetchDeferred = deferred();
          // mock the revalidated request
          fetchMock.mockImplementationOnce(() => {
            revalidateFetchDeferred.resolve(
              new Response('', {
                headers: {
                  'cache-control': 's-maxage=60, stale-while-revalidate=60',
                },
              })
            );
            return revalidateFetchDeferred.promise;
          });
          response = ((await swr({
            request,
            event: mockFetchEvent(),
          })) as unknown) as Response;
        });
        it('returns the STALE response', () => {
          expect(response.headers.get('x-edge-cache-status')).toBe(
            'REVALIDATING'
          );
        });
        it('updates the cache entry state to REVALIDATING', () => {
          const [, response] = cachesMock.put.mock.calls[0];
          expect(response.headers.get('x-edge-cache-status')).toBe(
            'REVALIDATING'
          );
        });
        it('fetches to revalidate', () => {
          expect(fetchMock).toHaveBeenCalled();
        });
        it('updates the cache with the fresh response', async () => {
          await revalidateFetchDeferred.promise;
          const [, response] = cachesMock.put.mock.calls[1];
          expect(response.headers.get('x-edge-cache-status')).toBe('HIT');
        });
      });
    });
  });
  describe('max-age=60', () => {
    let request;
    beforeEach(async () => {
      request = new Request('https://example.com');
      fetchMock.mockResolvedValueOnce(
        new Response('', {
          headers: {
            'cache-control': 'max-age=60',
          },
        })
      );
      await swr({
        request,
        event: mockFetchEvent(),
      });
    });
    it('calls fetch as expected', () => {
      const [request] = fetchMock.mock.calls[0];
      expect(fetchMock).toHaveBeenCalledTimes(1);
      expect(request.url).toBe(
        `https://example.com/?t=${STATIC_DATE.getTime()}`
      );
    });
    it('does NOT cache the response', () => {
      expect(cachesMock.put).not.toHaveBeenCalled();
    });
  });
  describe('s-maxage=60', () => {
    let response;
    let request;
    beforeEach(async () => {
      request = new Request('https://example.com');
      fetchMock.mockResolvedValueOnce(
        new Response('', {
          headers: {
            'cache-control': 's-maxage=60',
          },
        })
      );
      response = await swr({
        request,
        event: mockFetchEvent(),
      });
    });
    it('caches the response', () => {
      expect(cachesMock.put).toHaveBeenCalled();
    });
    it('has the expected response', () => {
      expect(response.headers.get(CACHE_CONTROL_HEADER)).toBe(
        'public, max-age=0, must-revalidate'
      );
    });
  });
  describe('no-store, no-cache, max-age=0', () => {
    let response;
    let request;
    beforeEach(async () => {
      request = new Request('https://example.com');
      fetchMock.mockResolvedValueOnce(
        new Response('', {
          headers: {
            'cache-control': 'no-store, no-cache, max-age=0',
          },
        })
      );
      response = await swr({
        request,
        event: mockFetchEvent(),
      });
    });
    it('does NOT cache the response', () => {
      expect(cachesMock.put).not.toHaveBeenCalled();
    });
    it('has the expected response', () => {
      expect(response.headers.get(CACHE_CONTROL_HEADER)).toBe(
        'public, max-age=0, must-revalidate'
      );
    });
  });
  describe('404', () => {
    let response;
    let request;
    beforeEach(async () => {
      request = new Request('https://example.com');
      fetchMock.mockResolvedValueOnce(
        new Response('error', {
          status: 404,
          headers: {
            'cache-control': 's-maxage=100',
          },
        })
      );
      response = await swr({
        request,
        event: mockFetchEvent(),
      });
    });
    it('does NOT cache the response', () => {
      expect(cachesMock.put).not.toHaveBeenCalled();
    });
  });
  describe('POST', () => {
    let response;
    let request;
    beforeEach(async () => {
      request = new Request('https://example.com', {
        method: 'POST',
      });
      fetchMock.mockResolvedValueOnce(
        new Response('error', {
          headers: {
            'cache-control': 's-maxage=100',
          },
        })
      );
      response = await swr({
        request,
        event: mockFetchEvent(),
      });
    });
    it('does NOT cache the response', () => {
      expect(cachesMock.put).not.toHaveBeenCalled();
    });
  });
  describe('max-age=60', () => {
    let request;
    beforeEach(async () => {
      request = new Request('https://example.com');
      fetchMock.mockResolvedValueOnce(
        new Response('', {
          headers: {
            'cache-control': 'max-age=60',
          },
        })
      );
      await swr({
        request,
        event: mockFetchEvent(),
      });
    });
    it('calls fetch as expected', () => {
      const [request] = fetchMock.mock.calls[0];
      expect(fetchMock).toHaveBeenCalledTimes(1);
      expect(request.url).toBe(`https://example.com/?t=${Date.now()}`);
    });
    it('does NOT cache the response', () => {
      expect(cachesMock.put).not.toHaveBeenCalled();
    });
  });
});
describe('s-maxage=1800, stale-while-revalidate=86400', () => {
  let request;
  beforeEach(async () => {
    request = new Request('https://example.com');
    fetchMock.mockResolvedValueOnce(
      new Response('', {
        headers: {
          'cache-control': 's-maxage=1800, stale-while-revalidate=86400',
        },
      })
    );
    await swr({
      request,
      event: mockFetchEvent(),
    });
  });
  // it('calls fetch as expected', () => {
  //   const [request] = fetchMock.mock.calls[0];
  //   expect(fetchMock).toHaveBeenCalledTimes(1);
  //   expect(request.url).toBe(`https://example.com/?t=${Date.now()}`);
  // });
  it('does cache the response', () => {
    expect(cachesMock.put).toHaveBeenCalled();
  });
});
const mockFetchEvent = () =>
  (({
    waitUntil: jest.fn(),
  } as unknown) as FetchEvent);
const deferred = () => {
  let resolve;
  let reject;
  const promise = new Promise((_resolve, _reject) => {
    resolve = _resolve;
    reject = _reject;
  });
  return {
    promise,
    resolve,
    reject,
  };
};
@webseo2006

Is there a vanilla JavaScript version I can test directly in worker? Thanks

@wilsonpage
Author

wilsonpage commented Mar 20, 2021 via email

@jamesvidler

Has anyone had success using this in cloudflare?

@wilsonpage
Author

wilsonpage commented Jul 20, 2021 via email

@Schachte

Schachte commented Oct 5, 2022

@wilsonpage Looking at https://gist.github.com/wilsonpage/a4568d776ee6de188999afe6e2d2ee69#file-swr-ts-L32

Wouldn't you run into race conditions due to the eventually consistent nature of KV put operations?

IE:
If client A invokes a request at timestamp 1 and we update KV, then client B can come in several seconds later, do the same thing and see non-deterministic behavior given that each client can hit separate data centers and cross-replication may take ~60 seconds.

@wilsonpage
Author

> @wilsonpage Looking at https://gist.github.com/wilsonpage/a4568d776ee6de188999afe6e2d2ee69#file-swr-ts-L32
>
> Wouldn't you run into race conditions due to the eventually consistent nature of KV put operations?
>
> IE: If client A invokes a request at timestamp 1 and we update KV, then client B can come in several seconds later, do the same thing and see non-deterministic behavior given that each client can hit separate data centers and cross-replication may take ~60 seconds.

@Schachte yeah sounds like it could be possible, but what's the worst case scenario? The CF Worker just fetches from origin twice? If so, once the last request resolves the eventual cache entry will be the same, so it doesn't sound like the end of the world. I don't see any obvious solution to this as we don't have a sync datastore.

@Schachte

Schachte commented Oct 6, 2022

@wilsonpage No, the worst case scenario would be that two async requests happening within a similar timestep clobber each other, with stale data overwriting new data, leading to extremely non-deterministic behavior (depending on the application of course). And I agree, there is not an obvious workers-only solution to this. Consensus ruins everything 😢

Drew up this pic below. Imagine an SWR flow in one data center (left) and an SWR flow in another data center (right). Requests happen in parallel, and eventual consistency leads to chaos. Who wins the race? New data or newer data? You'd never know because it depends on variables like the geographical origin of the request, etc.

image
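
The clobbering scenario described above can be sketched as a toy last-writer-wins model. This is only an illustration under stated assumptions: the `Revalidation` shape, the `left`/`right` labels and all timestamps are hypothetical, and nothing here calls a Cloudflare API. It just shows that the surviving cache entry is decided by when a write lands, not by when the origin was read.

```typescript
// Toy model, not Cloudflare code: every name here is hypothetical.
interface Revalidation {
  colo: string; // label for the data center that ran the revalidation
  fetchedAt: number; // ms timestamp at which the origin was read
  writtenAt: number; // ms timestamp at which the cache write landed
  body: string; // what the origin returned at `fetchedAt`
}

// Last-writer-wins: the entry that survives is the one whose write landed
// last, regardless of when its origin fetch happened.
const finalCacheEntry = (revalidations: Revalidation[]): Revalidation => {
  if (revalidations.length === 0) {
    throw new Error('expected at least one revalidation');
  }
  const byWriteTime = [...revalidations].sort(
    (a, b) => a.writtenAt - b.writtenAt
  );
  return byWriteTime[byWriteTime.length - 1];
};

// True when some other revalidation read the origin later than the surviving
// entry did, i.e. newer data was overwritten by older data.
const wasClobbered = (revalidations: Revalidation[]): boolean => {
  const winner = finalCacheEntry(revalidations);
  return revalidations.some((r) => r.fetchedAt > winner.fetchedAt);
};

// The left flow reads the origin first, but its write lands last.
const race: Revalidation[] = [
  { colo: 'left', fetchedAt: 0, writtenAt: 900, body: 'new' },
  { colo: 'right', fetchedAt: 200, writtenAt: 500, body: 'newer' },
];

console.log(finalCacheEntry(race).body, wasClobbered(race)); // logs: new true
```

If the two writes land in the same order as the origin reads, `wasClobbered` returns `false`, which is the benign "fetches from origin twice" case.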

@wilsonpage
Author

@Schachte I see what you mean but (at least for my use-case), the difference between new data and newer data will be negligible. After all we're talking about the difference in origin data over a timeframe of (at most) 1 sec. IIUC possible outcomes:

  • A. Req 1 gets old data and triggers a cache refresh.
  • B. Req 2 gets same old data and triggers a cache refresh.
  • C. Req 2 gets the fresh data triggered by Req 1's origin fetch.

The only potential race I see here is that both Req 1 and Req 2 end up triggering a revalidation so then there's a race to see which one ends up populating the KV cache. For this to happen both requests would have to happen within at most a couple of seconds. Req 2 is most likely to end up being the one to set the cache last, but obvious network uncertainty could mean it ends up resolving before Req 1. The question is: is the difference of max 2 seconds in origin data a problem for your use-case?

If the answer is yes, then I'm not sure you should be using a SWR approach in the first place. It's best suited for data that doesn't change very often and/or displaying 'stale' data is acceptable UX for your app.

@Schachte

Schachte commented Oct 7, 2022

@wilsonpage I think the difference would be coupled to whatever your TTL is. When the TTL is set on the object in KV (default 60 seconds), the replication is pull-based, so I don't think the delay is at most 1 second, I think it's at least 60.

When you write to KV, you'll have almost instantaneous updates within the colo the write originated from. However, none of the other colos will attempt to refresh that data until their TTL expires; only then will they reach out to the originating colo to pull updates into their respective KV.

btw this isn't anything against your design, it's actually really cool. Just a shortcoming for some potential use-cases. Also, you're absolutely correct in that this would be a horrible use-case for people with high write throughput.
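
The pull-based behaviour described in this comment can be illustrated with a toy model. It is a sketch of the comment's description only, not of how KV is actually implemented: `ToyReplicatedStore`, its methods, and the explicit `now` arguments are all hypothetical, chosen so the example runs without timers or any Cloudflare API.

```typescript
// Toy model of TTL-bounded, pull-based reads. All names are hypothetical.
interface LocalCopy {
  value: string;
  cachedAt: number; // ms timestamp at which this colo pulled the value
}

class ToyReplicatedStore {
  private readonly ttlMs: number;
  private readonly source = new Map<string, string>(); // authoritative copy
  private readonly colos = new Map<string, Map<string, LocalCopy>>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  private localCache(colo: string): Map<string, LocalCopy> {
    let cache = this.colos.get(colo);
    if (!cache) {
      cache = new Map<string, LocalCopy>();
      this.colos.set(colo, cache);
    }
    return cache;
  }

  // A write is visible immediately in the colo it originated from.
  write(colo: string, key: string, value: string, now: number): void {
    this.source.set(key, value);
    this.localCache(colo).set(key, { value, cachedAt: now });
  }

  // Other colos keep serving their local copy until its TTL has elapsed,
  // and only then pull the current value from the source.
  read(colo: string, key: string, now: number): string | undefined {
    const cache = this.localCache(colo);
    const local = cache.get(key);
    if (local && now - local.cachedAt < this.ttlMs) return local.value;
    const fresh = this.source.get(key);
    if (fresh !== undefined) cache.set(key, { value: fresh, cachedAt: now });
    return fresh;
  }
}

const store = new ToyReplicatedStore(60_000);
store.write('A', 'entry', 'v1', 0);
store.read('B', 'entry', 1_000); // colo B pulls 'v1'
store.write('A', 'entry', 'v2', 2_000);
console.log(store.read('A', 'entry', 2_001)); // logs: v2
console.log(store.read('B', 'entry', 30_000)); // logs: v1
console.log(store.read('B', 'entry', 61_000)); // logs: v2
```

In this model the staleness seen from colo B is bounded by the TTL rather than by network latency, which is the point the comment makes.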

@josephglanville

If you want consistency you can use Durable Objects to serialize access to the origin. However, for performance reasons you would probably need to shard access over a number of Durable Objects for this to work, e.g. by utilizing consistent hashing or similar.

For my needs I'm thinking about using a similar approach as above or simply advocating for a different CDN given stale-while-revalidate really should be supported by CF natively without the need to hackily implement it in Workers.
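
A minimal sketch of the sharding idea, assuming a fixed number of shards and plain modulo hashing rather than a true consistent-hash ring (so changing `shardCount` remaps most keys). The `fnv1a` and `shardNameFor` helpers and the `swr-shard-` prefix are hypothetical names for illustration. In a Worker, a name like this could then be handed to a Durable Object namespace's `idFromName` so that all requests for one cache key reach the same object.

```typescript
// FNV-1a (32-bit) over UTF-16 code units. Fine for spreading keys across
// shards; not suitable for anything security-sensitive.
const fnv1a = (input: string): number => {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
};

// Map a cache-key URL to one of `shardCount` shard names. The same URL always
// maps to the same name, so revalidations for that URL can be serialized.
const shardNameFor = (cacheKeyUrl: string, shardCount: number): string => {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError('shardCount must be a positive integer');
  }
  return `swr-shard-${fnv1a(cacheKeyUrl) % shardCount}`;
};

console.log(shardNameFor('https://example.com/', 8));
```

A real consistent-hash ring would keep most keys on their existing shard when the shard count changes; modulo hashing is used here only because it is the shortest correct illustration of key-to-shard routing.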

@Schachte

@josephglanville - DO centralizes the data to a single colo, so definitely could have performance concerns, but would solve the consistency problem. I'm working on a design to solve this issue, I'll post a repo here and open-source it if it looks good.

I think maybe a mix of using KV + a blob store like GCS/S3 in conjunction with Cloudflare edge cache could help mimic SWR fairly nicely. They can race each other during the fetch. If the blob store loses, it'll asynchronously fetch and populate the edge cache, etc.

@josephglanville

@wilsonpage I'm interested in using some of your implementation in our codebase, would it be possible for you to confirm a license the code could be used under? i.e (MIT, BSD, Apache, etc)

@wilsonpage
Author

> @wilsonpage I'm interested in using some of your implementation in our codebase, would it be possible for you to confirm a license the code could be used under? i.e (MIT, BSD, Apache, etc)

Haha! Yeah feel free to use whatever :) Shall we say it's 'MIT'? 🤷

@josephglanville

MIT is great. Will add attribution for anything I use, thanks!

@neenhouse

> Haha! Yeah feel free to use whatever :) Shall we say it's 'MIT'? 🤷

👏 👏

@ysm-dev

ysm-dev commented Apr 28, 2023

This works great with worker + hono + trpc!

Thanks 👍
