A feeble attempt to improve the HTTP Archive's detection code for YUI

/*
I have a *lot* of issues with the way HTTPArchive detects JavaScript libraries,
but this is a feeble attempt to at least improve its detection of YUI to include
both YUI 3.x and YUI 2.x, while minimizing false positives.
 
I haven't touched the patterns for other libraries, but I think they're very
broken as well. The jQuery pattern, for instance, assumes that any URL
containing the string "jquery" is the jQuery JavaScript library, which means it
will return false positives for plugins ("jquery-pluginname.js"), paths
("/jquery/bogus-file.js"), and a variety of other URLs.
 
This applies to the patterns for Dojo, Quantcast, Twitter, and ShareThis as
well.
 
The original HTTPArchive JS lib detection code can be found at:
http://code.google.com/p/httparchive/source/browse/trunk/interesting-images.js#331
*/
 
// Old code:
$hCond = array();
$hCond["jQuery"] = "rt.url like '%jquery%'";
$hCond["YUI"] = "rt.url like '%/yui/%'";
$hCond["Dojo"] = "rt.url like '%dojo%'";
$hCond["Google Analytics"] = "(rt.url like '%/ga.js%' or rt.url like '%/urchin.js%')";
$hCond["Quantcast"] = "rt.url like '%quant.js%'";
$hCond["AddThis"] = "rt.url like '%addthis.com%'";
$hCond["Facebook"] = "(rt.url like '%facebook.com/plugins/%' or rt.url like '%facebook.com/widgets/%' or rt.url like '%facebook.com/connect/%')";
$hCond["Google +1"] = "rt.url like '%google.com/js/plusone.js%'";
$hCond["Twitter"] = "rt.url like '%twitter%'";
$hCond["ShareThis"] = "rt.url like '%sharethis%'";
 
// New code (better YUI detection):
$hCond = array();
$hCond["jQuery"] = "rt.url like '%jquery%'";
// Match YUI seed files only: YUI 3.x (yui*, simpleyui*) and YUI 2.x (yahoo*, yuiloader-dom-event).
$hCond["YUI"] = "(rt.url like '%/yui-min.js%' or rt.url like '%/yui.js%' or rt.url like '%/yui-debug.js%'"
    . " or rt.url like '%/yui-base-min.js%' or rt.url like '%/yui-base.js%'"
    . " or rt.url like '%/yui-core-min.js%' or rt.url like '%/yui-core.js%'"
    . " or rt.url like '%/simpleyui.js%' or rt.url like '%/simpleyui-min.js%'"
    . " or rt.url like '%/yahoo.js%' or rt.url like '%/yahoo-min.js%' or rt.url like '%/yahoo-debug.js%'"
    . " or rt.url like '%/yahoo-dom-event.js%' or rt.url like '%/yuiloader-dom-event.js%')";
$hCond["Dojo"] = "rt.url like '%dojo%'";
$hCond["Google Analytics"] = "(rt.url like '%/ga.js%' or rt.url like '%/urchin.js%')";
$hCond["Quantcast"] = "rt.url like '%quant.js%'";
$hCond["AddThis"] = "rt.url like '%addthis.com%'";
$hCond["Facebook"] = "(rt.url like '%facebook.com/plugins/%' or rt.url like '%facebook.com/widgets/%' or rt.url like '%facebook.com/connect/%')";
$hCond["Google +1"] = "rt.url like '%google.com/js/plusone.js%'";
$hCond["Twitter"] = "rt.url like '%twitter%'";
$hCond["ShareThis"] = "rt.url like '%sharethis%'";
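
To make the false-positive problem concrete, here is a rough sketch that emulates the `like '%...%'` conditions above as substring tests. The helper names (`urlLike`, `urlLikeAny`) and the example.com URLs are made up for illustration, and the sketch assumes a case-insensitive collation and patterns that contain no LIKE wildcards other than the surrounding `%`.

```php
<?php
// Sketch only: emulates SQL "rt.url like '%needle%'" as a case-insensitive
// substring test. Assumes a case-insensitive collation and needles that
// contain no LIKE wildcards (% or _).
function urlLike(string $url, string $needle): bool
{
    return stripos($url, $needle) !== false;
}

// True if the URL contains any of the needles (mirrors the OR-ed like clauses).
function urlLikeAny(string $url, array $needles): bool
{
    foreach ($needles as $needle) {
        if (urlLike($url, $needle)) {
            return true;
        }
    }
    return false;
}

// Seed-file names copied from the new YUI condition.
$yuiSeeds = [
    '/yui-min.js', '/yui.js', '/yui-debug.js',
    '/yui-base-min.js', '/yui-base.js',
    '/yui-core-min.js', '/yui-core.js',
    '/simpleyui.js', '/simpleyui-min.js',
    '/yahoo.js', '/yahoo-min.js', '/yahoo-debug.js',
    '/yahoo-dom-event.js', '/yuiloader-dom-event.js',
];

// Hypothetical URLs, not taken from real HTTP Archive data.
$urls = [
    'http://example.com/js/jquery-pluginname.js',
    'http://example.com/jquery/bogus-file.js',
    'http://example.com/yui/bogus-file.js',
    'http://example.com/build/yui/yui-min.js',
];

// Print which condition flags each URL (1 = match, 0 = no match).
foreach ($urls as $url) {
    printf(
        "%-45s jquery=%d oldYUI=%d newYUI=%d\n",
        $url,
        urlLike($url, 'jquery'),
        urlLike($url, '/yui/'),
        urlLikeAny($url, $yuiSeeds)
    );
}
```

In this sketch the first two URLs both satisfy the jQuery condition even though neither is the library itself, and the old YUI condition flags "/yui/bogus-file.js" while the seed-file list does not.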

Do you think it is enough to detect YUI seed files only? How about detecting '%/yui.yahooapis.com/%' for any YUI file, at least coming from Yahoo's CDN?

Not everyone loads YUI from the Yahoo! CDN. Detecting only seed files also ensures that a page requesting several YUI modules is counted as one use of YUI rather than one use per module, which would be unfair.
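
As a rough illustration of the counting concern, the sketch below compares a CDN-host pattern with a seed-file pattern on one page's requests. The function name `countMatches`, the request URLs, and the shortened seed list are all assumptions made for the example; it does not reproduce how HTTP Archive actually aggregates its rows.

```php
<?php
// Sketch only: counts how many URLs contain at least one of the needles,
// using a case-insensitive substring test in place of SQL like '%needle%'.
function countMatches(array $urls, array $needles): int
{
    $count = 0;
    foreach ($urls as $url) {
        foreach ($needles as $needle) {
            if (stripos($url, $needle) !== false) {
                $count++;
                break; // count each URL at most once
            }
        }
    }
    return $count;
}

// Hypothetical requests made by a single page that loads YUI from the CDN.
$cdnPage = [
    'http://yui.yahooapis.com/3.4.0/build/yui/yui-min.js',
    'http://yui.yahooapis.com/3.4.0/build/node/node-min.js',
    'http://yui.yahooapis.com/3.4.0/build/event/event-min.js',
];

// Hypothetical page that hosts its own copy of the seed file.
$selfHostedPage = [
    'http://example.com/js/yui/yui-min.js',
];

$hostPattern  = ['/yui.yahooapis.com/'];
$seedPatterns = ['/yui-min.js', '/yui.js']; // shortened list for the example

$cdnHostHits  = countMatches($cdnPage, $hostPattern);         // every module request matches
$cdnSeedHits  = countMatches($cdnPage, $seedPatterns);        // only the seed file matches
$selfHostHits = countMatches($selfHostedPage, $hostPattern);  // host pattern misses this page
$selfSeedHits = countMatches($selfHostedPage, $seedPatterns); // seed pattern still matches

printf("CDN page: host=%d seed=%d\n", $cdnHostHits, $cdnSeedHits);
printf("Self-hosted page: host=%d seed=%d\n", $selfHostHits, $selfSeedHits);
```

With these made-up requests, the host pattern matches three URLs on the CDN page and none on the self-hosted page, while the seed patterns match exactly one URL on each.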

Fair enough. So to be accurate, this snippet would have to be adjusted to detect the seed file(s) for the other libraries, which you mentioned already. Thanks!

New with 3.4.0:

or rt.url like '%/yui-core-min.js%' or rt.url like '%/yui-core.js%'
