Created August 9, 2011 23:00
A feeble attempt to improve the HTTP Archive's detection code for YUI
/*
I have a *lot* of issues with the way HTTPArchive detects JavaScript libraries,
but this is a feeble attempt to at least improve its detection of YUI to include
both YUI 3.x and YUI 2.x, while minimizing false positives.

I haven't touched the patterns for other libraries, but I think they're very
broken as well. The jQuery pattern, for instance, assumes that any URL
containing the string "jquery" is the jQuery JavaScript library, which means it
will return false positives for plugins ("jquery-pluginname.js"), paths
("/jquery/bogus-file.js"), and a variety of other URLs.

This applies to the patterns for Dojo, Quantcast, Twitter, and ShareThis as
well.

The original HTTPArchive JS lib detection code can be found at:
http://code.google.com/p/httparchive/source/browse/trunk/interesting-images.js#331
*/
// Old code:
$hCond = array();
$hCond["jQuery"] = "rt.url like '%jquery%'";
$hCond["YUI"] = "rt.url like '%/yui/%'";
$hCond["Dojo"] = "rt.url like '%dojo%'";
$hCond["Google Analytics"] = "(rt.url like '%/ga.js%' or rt.url like '%/urchin.js%')";
$hCond["Quantcast"] = "rt.url like '%quant.js%'";
$hCond["AddThis"] = "rt.url like '%addthis.com%'";
$hCond["Facebook"] = "(rt.url like '%facebook.com/plugins/%' or rt.url like '%facebook.com/widgets/%' or rt.url like '%facebook.com/connect/%')";
$hCond["Google +1"] = "rt.url like '%google.com/js/plusone.js%'";
$hCond["Twitter"] = "rt.url like '%twitter%'";
$hCond["ShareThis"] = "rt.url like '%sharethis%'";

// New code (better YUI detection):
$hCond = array();
$hCond["jQuery"] = "rt.url like '%jquery%'";
$hCond["YUI"] = "(rt.url like '%/yui-min.js%' or rt.url like '%/yui.js%' or rt.url like '%/yui-debug.js%' or rt.url like '%/yui-base-min.js%' or rt.url like '%/yui-base.js%' or rt.url like '%/yui-core-min.js%' or rt.url like '%/yui-core.js%' or rt.url like '%/simpleyui.js%' or rt.url like '%/simpleyui-min.js%' or rt.url like '%/yahoo.js%' or rt.url like '%/yahoo-min.js%' or rt.url like '%/yahoo-debug.js%' or rt.url like '%/yahoo-dom-event.js%' or rt.url like '%/yuiloader-dom-event.js%')";
$hCond["Dojo"] = "rt.url like '%dojo%'";
$hCond["Google Analytics"] = "(rt.url like '%/ga.js%' or rt.url like '%/urchin.js%')";
$hCond["Quantcast"] = "rt.url like '%quant.js%'";
$hCond["AddThis"] = "rt.url like '%addthis.com%'";
$hCond["Facebook"] = "(rt.url like '%facebook.com/plugins/%' or rt.url like '%facebook.com/widgets/%' or rt.url like '%facebook.com/connect/%')";
$hCond["Google +1"] = "rt.url like '%google.com/js/plusone.js%'";
$hCond["Twitter"] = "rt.url like '%twitter%'";
$hCond["ShareThis"] = "rt.url like '%sharethis%'";
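The long YUI condition above is hard to read and easy to mistype. One possible way to keep it maintainable is to generate it from a list of seed file names. This is only a sketch: the `buildLikeCondition()` helper and the `$yuiSeedFiles` variable are hypothetical names, not part of the HTTP Archive code, and the grouping of files into YUI 3.x and 2.x in the comments is an annotation added here.

```php
<?php
// Hypothetical helper (not in the original code): builds an OR'd series of
// LIKE clauses that match "/<file name>" anywhere in the given column.
// The file names are hard-coded constants here; untrusted input would need
// proper SQL escaping first.
function buildLikeCondition(array $fileNames, string $column = 'rt.url'): string
{
    $clauses = array();
    foreach ($fileNames as $name) {
        $clauses[] = "$column like '%/$name%'";
    }
    return '(' . implode(' or ', $clauses) . ')';
}

// Same file names, in the same order, as the hand-written YUI pattern.
$yuiSeedFiles = array(
    // YUI 3.x seed files
    'yui-min.js', 'yui.js', 'yui-debug.js',
    'yui-base-min.js', 'yui-base.js',
    'yui-core-min.js', 'yui-core.js',
    'simpleyui.js', 'simpleyui-min.js',
    // YUI 2.x seed files
    'yahoo.js', 'yahoo-min.js', 'yahoo-debug.js',
    'yahoo-dom-event.js', 'yuiloader-dom-event.js',
);

$hCond = array();
$hCond["YUI"] = buildLikeCondition($yuiSeedFiles);
```

Adding a seed file for a later YUI release then becomes a one-line change to the list instead of an edit to a long SQL string.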
Not everyone loads YUI from the Yahoo! CDN. Detecting only seed files ensures that we don't treat every YUI module as a single use of YUI, which would be unfair since there could be multiple modules requested on a single page.
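To illustrate the point about seed files, here is a rough sketch that compares how many requests on a single page each approach would count. The URLs are made up, `urlContainsAny()` is a hypothetical helper, and `stripos()` is used as a simple stand-in for a case-insensitive SQL `LIKE '%...%'` match (whether `LIKE` is case-insensitive depends on the database collation, so treat that as an assumption).

```php
<?php
// Hypothetical helper: true if the URL contains any of the given substrings,
// roughly what "url like '%needle%'" does in SQL.
function urlContainsAny(string $url, array $needles): bool
{
    foreach ($needles as $needle) {
        if (stripos($url, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Made-up requests from one page that self-hosts YUI 3.
$pageRequests = array(
    'http://example.com/js/yui/3.4.0/build/yui/yui-min.js',    // seed file
    'http://example.com/js/yui/3.4.0/build/node/node-min.js',  // module
    'http://example.com/js/yui/3.4.0/build/event/event-min.js', // module
    'http://example.com/js/app.js',                             // unrelated
);

$seedNeedles  = array('/yui-min.js', '/yui.js'); // subset of the seed patterns
$broadNeedles = array('/yui/');                  // the old, broad pattern

$countMatches = function (array $urls, array $needles): int {
    $hits = 0;
    foreach ($urls as $url) {
        if (urlContainsAny($url, $needles)) {
            $hits++;
        }
    }
    return $hits;
};

$seedHits  = $countMatches($pageRequests, $seedNeedles);  // 1
$broadHits = $countMatches($pageRequests, $broadNeedles); // 3

echo "seed-only matches: $seedHits, broad matches: $broadHits\n";
```

With the broad pattern, this one page contributes three matching requests. With seed-only detection it contributes one, which is closer to "this page uses YUI once".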
Fair enough. So to be accurate, this snippet would have to be adjusted to detect the seed file(s) for the other libraries, which you mentioned already. Thanks!
New with 3.4.0:
or rt.url like '%/yui-core-min.js%' or rt.url like '%/yui-core.js%'
Do you think it is enough to detect YUI seed files only? How about detecting '%/yui.yahooapis.com/%' for any YUI file, at least coming from Yahoo's CDN?