@bkardell
Created December 5, 2013 14:48
An optimization debate

Topic:

CDNs and other optimization techniques. This comes up a lot; it crosses numerous mailing lists and Twitter. If you have thoughts on this, let's discuss so we can easily cite/reference this in the future.

Here is a statement from @scott_gonzales on Twitter, and some thoughts from me to open the discussion.

Scott: CDNs have much higher cache-miss rates than you'd think, and JS should be concatenated and deployed from the server

Me: It's true that cache misses are higher, but I don't want to throw the baby out with the bathwater. The advantages of concatenation will largely disappear with HTTP2. CDNs have a number of things going for them (some in theory and some in practice). At an incredibly utilitarian level, if I can offload that traffic from my own infrastructure, and maybe reduce hops for these requests too, that seems good. At a more conceptual level, the idea that some resources are highly shareable and deserve a special home/cache seems good even if CDNs don't currently fully enable it; that seems not so much a problem with the CDN as one for the platform to help tackle. It does seem ridiculous to send significant capabilities like jQuery, Ember, or Angular down over and over and ask them to eat up cache space in my own domain. It really seems like there should be one, and only one, version of content called jQuery-x.y.z.js.

@yoavweiss

Well, I'm guessing that the subject is not hosting your own JS on a CDN, but hosting generic frameworks on their "official" CDNs while downloading your own JS from your server.
I'm also guessing that the cache misses you're discussing are misses in the browser's cache (since the browser has never seen this specific jQuery version served from this specific CDN), not misses in the CDN infrastructure itself.

If I guessed right, then my short answer is "it depends on the cache HITs".
The long one:
Assuming HTTP2/SPDY, ideally you'd want all of your JS served from the same host as separate files. That would give you maximal cache and execution flexibility while providing all the benefits of concatenation (and more).
The downside is that you're missing out on users who have already downloaded jQuery-x.y.z.js from the "official" CDN and would otherwise not need to re-download it.
Delegating jQuery to the CDN, which is a separate host, would force the browser (in the case of a MISS) to do the DNS + TCP SYN + TLS handshake dance (even if with a possibly shorter RTT) just to download a single resource, one which probably needs to run before your other JS resources do, so it blocks JS execution.

So it really depends on the cache HIT ratio your users' browser caches are showing, but if HITs are the minority, you'd get better performance storing everything on your own server.
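For concreteness, here's a minimal sketch (plain JavaScript; the URLs and the `startApp` callback are hypothetical) of the common pattern for taking the CDN bet while still guarding against a CDN that is unreachable or blocked. Note that this protects against outright failure, not a cold cache; a browser-cache MISS still succeeds, just after the connection-setup dance described above.

```js
// Hypothetical loader: try the "official" CDN first, and on failure
// pay one extra round trip to fetch the self-hosted copy instead.
// startApp is a made-up app entry point for illustration.
function loadScript(src, onLoad, onError) {
  var s = document.createElement('script');
  s.src = src;
  s.onload = onLoad;
  s.onerror = onError;
  document.head.appendChild(s);
}

loadScript('https://code.jquery.com/jquery-1.10.2.min.js', startApp, function () {
  // CDN unreachable or blocked: fall back to our own host.
  loadScript('/js/jquery-1.10.2.min.js', startApp);
});
```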

Assuming HTTP1.1, I wouldn't recommend concatenating jQuery along with the rest of your JS (unless you have very little JS), because you don't want to stall jQuery's execution waiting for all the other JS resources to download, and a JS file can run only after the entire file has been downloaded.
So in that case, the only "extra" cost of a jQuery cache MISS is an extra DNS request, and it can be negated by the shorter RTT. That means the MISS tolerance should be higher than in the HTTP2 case.
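A hypothetical Node build step makes that split concrete (the file names are made up for illustration): concatenate your own scripts into one bundle, but leave jQuery as its own file so its execution isn't stalled behind the rest of the download.

```js
// Sketch of an HTTP1.1-era build: bundle app code, ship jQuery separately.
var fs = require('fs');

// Illustrative file names: everything except jQuery goes in the bundle.
var appFiles = ['src/widgets.js', 'src/forms.js', 'src/app.js'];

var bundle = appFiles.map(function (f) {
  return fs.readFileSync(f, 'utf8');
}).join(';\n'); // the ';' guards against files missing a trailing semicolon

fs.writeFileSync('dist/app.js', bundle);
// dist/jquery-1.10.2.min.js ships as-is, so the page loads two script
// files and jQuery can execute as soon as it alone has arrived.
```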

A definite answer to "what MISS rate makes this worthwhile" requires some testing on your specific app, to know what the tradeoffs are with the resources it needs.

Regarding the non-network-related advantages of sharing resources: at least in theory, browsers could keep the bytecode/machine-code products of highly shared resources and reuse them. I'm not sure this actually happens in browsers today, but if it does, that's a bonus point for CDNs.

I completely agree with you that re-downloading the same major frameworks over and over is a waste of everyone's time and computing resources. I'm not sure that CDNs are the solution for that.

@bkardell commented Dec 5, 2013

I agree with pretty much everything you said. I'm not sure what we call a CDN today is the answer either, and I think maybe the platform is missing something here that would let us differentiate that "highly sharable" code so that it can be treated specially. Even in a limited fashion that is not far from the CDN model, it seems like there is a lot of potential for improvement... We haven't worked on it in a while, but this was the rationale for the basics of http://bkardell.github.io/tap/, right?

@yoavweiss

Yeap

@scottgonzalez

@yoavweiss covered a lot of the network-level technical reasons that 3rd-party CDNs aren't always great. Unfortunately, the cache HIT rate is much lower than you'd expect, even for something as popular as jQuery. That's a mixture of so many versions being popular, so many different CDNs being popular, and, of course, the behavior of the user's actual cache.

However, another issue that often comes up is that you may end up pulling down much more code than you actually need. CDNs have become so popular that they're basically the go-to choice for many devs, which means that when someone wants just a datepicker, just tabs, or just a dialog from jQuery UI, they grab the whole library from a CDN. Now they've got dozens of plugins they're not using, which is killing them on a cache MISS (and even on a cache HIT, with all the extra parsing).

@bkardell commented Dec 5, 2013

@scottgonzalez - I think we are actually in agreement about most of that. Clearly it isn't paying off as much as people initially thought, but why? Maybe because the cache is in play for so many things and is limited per-domain, there are too many competing factors. The more people follow good cache advice, the more they compete for that limited space; and since CDNs themselves compete with one another while hosting much of the same stuff, that plays against you too. It does seem, though, that some things really are "different" and that some of the goals of hosting things on CDNs are valid and deserve further research/attempts.

We have some code/goals/thoughts around this in which you ask for something in a more NPM style and use something like ServiceWorker to deal with the fact that we know this could be served from a number of places. If you think about it, every version of jQuery ever released, plus every version of jQuery UI and its assets, all put together is still a pretty negligible amount of space, and their popularity will kind of cycle in the system. It's unfortunate that we aren't smarter about that; there is really no reason to have N identical copies of a source file. It's tricky to fix, but I think the advantages of getting those "unofficial bits of the internet" "closer to the metal" would be worthwhile.
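As a rough illustration of that idea, here's a hypothetical sketch using today's ServiceWorker API. Everything in it (the URL pattern, cache name, and key scheme) is an assumption for illustration, and a real system would need integrity checks to safely treat copies from different hosts as the same resource.

```js
// Sketch: treat jquery-x.y.z.js as identified by its version rather
// than by the host it was requested from, so one cached copy can
// satisfy requests aimed at any CDN or origin. LIB_RE and the cache
// name 'shared-libs' are made up for this example.
var LIB_RE = /\/jquery-(\d+\.\d+\.\d+)(\.min)?\.js$/;

self.addEventListener('fetch', function (event) {
  var match = LIB_RE.exec(new URL(event.request.url).pathname);
  if (!match) return; // not a shared library; let the network handle it

  event.respondWith(
    caches.open('shared-libs').then(function (cache) {
      // One cache entry per version, regardless of which host served it.
      var key = '/shared-libs/jquery-' + match[1] + '.js';
      return cache.match(key).then(function (hit) {
        if (hit) return hit;
        return fetch(event.request).then(function (response) {
          cache.put(key, response.clone());
          return response;
        });
      });
    })
  );
});
```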

@yoavweiss

@scottgonzalez - If the choice is between a slimmed-down, self-hosted version and a bloated CDN version, then it's a no-brainer IMO (even if browser cache HIT rates are higher on the CDN).

@bkardell - I don't believe that cache space is the issue here. Caches shouldn't evict popular resources.
I think the problem is that the multitude of framework versions, plus the fact that there are several "official" CDNs, means that your particular choice of version + CDN is not shared by many other sites.
That means the percentage of first-time users that have already seen your framework elsewhere is not that high, and even then, this particular framework + CDN combination is not recognized as a highly popular resource (because it isn't) and may be evicted rather quickly to make space for some popular cat photos.
