@bgrins
Last active May 10, 2023 17:55
news-homepage-analysis

Analysis of news sites for Speedometer using HTTPArchive

  • create-dataset.sql creates a smaller table containing both desktop and mobile results so that later analysis is cheaper to run. Generating this table takes about 5TB of processing. The set of URLs is gathered from the external links on the pages beneath https://en.wikipedia.org/wiki/Wikipedia:News_sources. The data may be skewed because the Wikipedia lists do not necessarily reflect the most popular content, include some non-news sites, and include URL paths, which are ignored (pages are matched by host only). The list could be swapped for a different one and the queries re-run.
  • query-dataset.sql creates a summary table the first time it runs, then queries that table to produce the reports.
  • running-the-same-custom-metrics-in-the-ui.js is meant to be used as a "Custom Metric" in the WebPageTest UI, following the instructions at https://github.com/HTTPArchive/custom-metrics/tree/8497c859ef0a7c99924981f369bb53eb3441bd6c#testing
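
The host-only matching mentioned in the first bullet can be sketched in JavaScript. This is a minimal illustration and not part of the gist: `uniqueHosts` is a hypothetical helper, and it assumes that the standard `URL` API's `hostname` behaves closely enough to BigQuery's `NET.HOST` for plain http(s) URLs like those in the list. It shows how several list entries with different paths collapse to a single host, which is why the paths are effectively ignored.

```javascript
// Hypothetical helper (not part of the gist): reduce a list of URLs to the
// set of hosts that the dataset query would match against.
function uniqueHosts(urls) {
  const hosts = new Set();
  for (const u of urls) {
    try {
      hosts.add(new URL(u).hostname);
    } catch (e) {
      // Skip entries that cannot be parsed as absolute URLs.
    }
  }
  return [...hosts].sort();
}

// Three entries taken from the list in create-dataset.sql.
const sample = [
  "http://www.oratert.com/",
  "http://www.oratert.com/news/armenia/index.1.html",
  "https://www.bbc.co.uk/news/",
];

// The two oratert.com entries collapse into one host.
console.log(uniqueHosts(sample));
```

A helper like this could also be used to check how many distinct hosts a replacement list contributes before paying for a re-run of the queries.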

The following custom metrics are used to generate the reports:

Results:

-- The lists are gathered by running this command in the browser console on each of the Wikipedia pages below:
-- JSON.stringify([...document.querySelectorAll("#mw-content-text a.external")].map(a => a.href), null, 2)
CREATE OR REPLACE TABLE
`httparchive-sandbox.test.news_pages_2023_03_01` AS
SELECT
url,
_TABLE_SUFFIX AS client,
"2023_03_01" AS crawl,
payload
FROM
`httparchive.pages.2023_03_01_*` -- TABLESAMPLE SYSTEM (.1 PERCENT)
WHERE
NET.HOST(url) IN (
SELECT
CONCAT(news_url_host)
FROM (
SELECT
news_url,
NET.HOST(news_url) AS news_url_host
FROM
UNNEST([
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/Collections
"http://www.allheadlinenews.com/briefs", "http://news.google.com/", "http://newshub.tucows.com/", "http://www.commondreams.org/", "http://www.havenworks.com/news/search", "http://www.havenworks.com/news/browse", "http://schema-root.org/", "https://www.aljazeera.com/", "http://www.abc.net.au/news/", "https://www.bbc.co.uk/news/", "http://www.channelnewsasia.com/", "http://www.ibnlive.com/", "http://www.democracynow.org/", "http://www.flashpoints.net/", "http://www.foxnews.com/", "http://www.ndtv.com/", "http://www.rnw.nl/cgi-bin/home/enhome.pl", "https://www.rte.ie/news/", "http://www.rthk.org.hk/rthk/news/englishnews/", "http://news.sky.com/", "http://www.tagesschau.de/english/", "http://www.aftenposten.no/english/", "http://www.theage.com.au/", "http://www.bangkokpost.com/", "http://business-times.asia1.com.sg/", "http://www2.chinadaily.com.cn/english/home/index.html", "http://www.telegraph.co.uk/", "http://www.dawn.com/", "http://www.deccanherald.com/", "http://www.spiegel.de/", "http://www.economist.com/", "https://www.ft.com/", "http://www.wyborcza.pl/", "http://www.granma.cu/ingles/", "http://www.globeandmail.ca/", "http://www.guardian.co.uk/", "http://www.haaretz.com/", "http://www.thehindu.com/", "http://www.hindustantimes.com/", "http://www.wisers.net/", "http://www.independent.co.uk/", "http://www.indianexpress.com/", "http://www.iht.com/", "http://www.nationmultimedia.com/", "http://www.nationalpost.com/", "http://www.nzz.ch/english/", "http://www.newindpress.com/", "http://newpaper.asia1.com.sg/", "http://www.nst.com.my/", "http://www.nypost.com/", "http://www.nytimes.com/", "http://www.jang.com.pk/thenews/index.html", "http://www.inq7.net/", "http://newsfromrussia.com/", "http://www.sfgate.com/", "http://www.scmp.com/", "http://thestar.com.my/", "http://www.thestatesman.net/", "http://straitstimes.asia1.com.sg/", "http://www.thesun.co.uk/", "http://www.smh.com.au/", "http://www.telegraphindia.com/", "http://www.timesonline.co.uk/", 
"http://www.timesofindia.com/", "http://www.thestar.com/", "http://www.washingtonpost.com/", "http://www.washtimes.com/", "http://www.afp.com/english/home/", "http://www.googlenews.in/", "https://apnews.com/", "http://www.ipsnews.net/", "http://www.falkland-malvinas.com/index.asp", "http://www.reuters.com/", "http://www.xinhuanet.com/english/", "http://www.allheadlinenews.com/briefs", "http://newstandardnews.net/", "http://www.newswiretoday.com/", "http://health.dailynewscentral.com/", "http://www.opalesque.com/", "http://story.news.yahoo.com/", "http://news.google.com/", "http://www.heise.de/tp/english/default.html", "http://news.aol.com/", "http://www.abcnewsnetwork.org/", "http://www.citizenxpress.com/", "http://www.sumanasa.com/india-news/", "http://www.newsgd.com/", "http://sofiaecho.com/", "http://www.chinastakes.com/", "http://www.thechinaperspective.com/", "http://www.kitco.com/market/marketnews.html", "http://www.oratert.com/", "http://www.oratert.com/news/armenia/index.1.html", "http://www.oratert.com/news/karabakh-artsakh/index.1.html", "http://www.oratert.com/news/azerbaijan/index.1.html", "http://www.oratert.com/news/georgia/index.1.html", "http://www.oratert.com/news/iran/index.1.html", "http://www.oratert.com/news/russia/index.1.html", "http://www.oratert.com/news/turkey/index.1.html",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/Africa
"http://allafrica.com/", "http://news.bbc.co.uk/2/hi/africa/default.stm", "http://www.rfi.fr/actuen/listes/001/liste_mots_cles_213.asp", "http://www.irinnews.org/IRIN-Africa.aspx", "http://www.pambazuka.org/en/", "http://www.elkhabar.com/FrEn/?idc=52", "http://www.north-africa.com/one.htm", "http://www.angolapress-angop.ao/index-e.cgi", "https://angola24horas.com/", "http://www.beninembassyus.org/", "http://www.mmegi.bw/2005/October/Friday7/index.html", "http://weekly.ahram.org.eg/index.htm", "http://www.cairolive.com/index.html", "http://www.cairotimes.com/", "http://www.egypttoday.com/", "http://www.dehai.org/", "http://www.addistribune.com/", "http://www.telecom.net.et/~ena/Newsenglish/index.htm", "http://ethiopianreview.homestead.com/", "http://www.ethiopar.net/index.htm", "http://www.waltainfo.com/", "http://www.gambianet.com/", "http://www.gambia.gm/", "http://ghanareview.com/review/", "http://www.ghanaian-chronicle.com/", "http://www.independent-gh.com/", "http://www.statehousekenya.go.ke/", "http://www.lesotho.gov.ls/", "http://www.liberian-connection.com/", "http://www.theperspective.org/", "http://www.libya-watanona.com/", "http://www.nationmalawi.com/", "http://mbc.intnet.mu/news.htm", "http://www.mauritius-news.co.uk/", "http://www.jarida.8m.com/News.html", "http://www.map.ma/eng", "http://www.sortmoz.com/aimnews/English/Menu.htm", "http://www.grnnet.gov.na/intro.htm", "http://www.economist.com.na/", "http://www.namibian.com.na/", "http://www.aittv.com/", "http://www.nopa.net/", "http://www.newswatchngr.com/", "http://www.silverbirdgroup.com/silverbirdtv/", "http://www.vanguardngr.com/", "http://www.guardian.ng/", "http://www.arise.tv/", "http://www.channelstv.com/", "http://www.ictr.org/", "http://www.internews.org/activities/ICTR_reports/ICTR_reports.htm", "http://www.banadir.com/index.shtml", "http://www.businessday.co.za/home.aspx?Page=BD4P1236&MenuItem=BD4P1236", "http://www.iafrica.com/", "http://www.int.iol.co.za/", 
"http://www.news24.com/News24/HomeLite/", "http://www.sabc.co.za/portal/site", "http://www.sundaytimes.co.za/home/indexsunday.aspx?Page=ST6P197&MenuItem=ST6P197", "http://www.sudan.net/", "http://www.gov.sz/", "http://www.times.co.sz/", "http://www.theexpress.com/", "http://www.ippmedia.com/", "http://www.parliament.go.tz/bunge/index.php", "http://www.tunisiaonlinenews.com/", "http://www.monitor.co.ug/", "http://www.newvision.co.ug/", "http://www.un.org/Depts/dpko/missions/minurso/index.html", "http://www.wsahara.net/", "http://www.btinternet.com/~donald.macdonald/saharawi.htm", "http://www.elections.org.zm/", "http://www.daily-mail.co.zm/", "http://www.zambezitimes.com/Zambia", "http://www.herald.co.zw/", "http://www.fingaz.co.zw/", "http://www.theindependent.co.zw/news/2005/October/Friday7/index.html", "http://www.zbc.co.zw/", "http://www.zwnews.com/",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/Oceania
"http://www.abc.net.au/news/", "http://www.news.com.au/", "http://news.ninemsn.com.au/", "http://seven.com.au/news", "http://www.ten.com.au/", "http://www.skynews.com.au/home/", "http://www.foxsports.com.au/", "http://afr.com/", "http://www.theaustralian.com.au/", "http://newsstore.fairfax.com.au/apps/newsSearch.ac", "http://www.crikey.com.au/", "http://www.smh.com.au/", "http://www.dailytelegraph.com.au/", "http://www.theage.com.au/", "http://www.heraldsun.com.au/", "http://www.couriermail.com.au/", "http://www.brisbanetimes.com.au/", "http://www.adelaidenow.com.au/", "http://au.news.yahoo.com/thewest/", "http://www.perthnow.com.au/", "http://www.watoday.com.au/", "http://canberra.yourguide.com.au/home.asp", "http://alicenow.com.au/news/general", "http://www.goldcoast.com.au/news.html", "http://www.illawarramercury.com.au/", "http://www.theherald.com.au/", "http://www.ntnews.com.au/", "http://au.news.yahoo.com/thewest/regional/southwest/", "http://www.themercury.com.au/", "http://www.townsvillebulletin.com.au/", "http://www.inmycommunity.com.au/", "http://www.postnewspapers.com.au/", "http://leader-news.whereilive.com.au/", "http://newslocal.whereilive.com.au/", "http://www.couriermail.com.au/questnews", "http://www.yourguide.com.au/yourguide.asp", "http://www.australia.gov.au/", "http://www.fijitimes.com/", "http://www.fijisun.com.fj/%E2%80%8E", "http://fijione.tv/", "http://www.fbc.com.fj/", "http://www.fijivillage.com/", "http://www.fiji.gov.fj/", "https://www.govt.nz/", "http://www.stuff.co.nz/", "http://www.nbr.co.nz/", "http://www.newstalkzb.co.nz/", "http://www.nzherald.co.nz/", "http://tvnz.co.nz/view/tvone_index_skin/tvone_index_group", "http://www.odt.co.nz/", "http://www.radionz.co.nz/", "http://www.newsroom.co.nz/", "http://scoop.co.nz/", "http://www.stuff.co.nz/", "http://times-age.co.nz/", "http://www.pngonline.gov.pg/", "http://www.postcourier.com.pg/", "http://www.cinews.co.ck/", "http://www.dailypost.vu/", "http://nouvellecaledonie.la1ere.fr/", 
"http://polynesie.la1ere.fr/", "http://www.lnc.nc/", "http://www.matangitonga.to/article/global_index.shtml", "http://www.samoanews.com/", "http://www.samoaobserver.ws/", "http://www.solomonstarnews.com/", "http://www.solomontimes.com/", "http://www.nzherald.co.nz/tokelau/news/headlines.cfm?l_id=500606", "http://www.pina.com.fj/",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/Europe
"http://europa.eu/index_en.htm", "http://euractiv.com/", "http://euobserver.com/", "http://www.euronews.net/create_html.php?page=home", "https://www.gazette.eu.org/", "http://www.euro-reporters.com/", "http://eunews.euroesprit.org/", "http://www.european-voice.com/", "http://www.moscowtimes.ru/indexes/01.html", "http://www.osce.org/", "http://www.praguepost.com/index.php", "http://www.rferl.org/newsline/", "http://www.tol.cz/look/TOL/section.tpl?IdLanguage=1&IdPublication=4&NrIssue=139&tpid=43", "http://www.oratert.com/news/karabakh-artsakh/index.1.html", "http://www.oratert.com/news/", "http://technewscloud.com/", "http://www.albaniannews.com/", "http://www.asbarez.com/", "http://www.armenianow.com/", "http://www.azatutyun.am/", "http://www.cdaily.am/", "http://www.oratert.com/news/armenia/index.1.html/", "http://www.austria.org/", "http://oe1.orf.at/service/international_en", "http://www.austria.gv.at/DesktopDefault.aspx?alias=english&init&init", "http://www.wienerzeitung.at/DesktopDefault.aspx?TabID=4082&Alias=wzo", "http://www.today.az/", "http://www.en.apa.az/", "http://en.trend.az/", "http://www.anspress.com/", "http://www.azernews.az/site/", "http://www.theazeritimes.com/", "http://www.azerbaijantoday.com/", "http://www.spacetv.az/?lng=az", "http://www.azertag.com/index_en.html", "http://www.turan.az/Default_en.asp", "http://www.aztv.az/index-en.shtml", "http://www.xalqqazeti.com/index.php?lngs=eng", "http://www.525.az/new/", "http://www.zaman.az/site/", "http://www.sherg.az/", "http://en.itv.az/", "http://www.azadliq.org/", "http://www.azerbaijan.az/_News/_news_e.html?lang=en", "http://www.ayna.az/", "http://www.ikisahil.com/content/index.php?link=main.php", "http://bizimasr.media-az.com/", "http://www.ses-az.com/view.php?lang=az&menu=0", "http://www.xalqcebhesi.az/", "https://www.musavat.com/new/", "http://www.yeniazerbaycan.com/", "http://www.tezadlar.az/", "http://www.bizimyol.az/", "http://www.express.com.az/", "http://www.kaspi.az/", 
"http://www.nedelya.az/", "http://www.nashvek.com/view.php", "http://ourcentury.media-az.com/", "http://www.respublica.news.az/", "http://www.yenicag.az/", "http://en.mirzexezerinsesi.net/", "http://www.oratert.com/news/azerbaijan/index.1.html", "http://www.president.gov.by/eng/", "http://www.expatica.com/source/site_content_subchannel.asp?subchannel_id=24", "http://www.aimpress.ch/dyn/trae/trae-sar.htm", "http://www.iwpr.net/home_index_new.html", "http://www.mvp.gov.ba/Index_eng.htm", "http://www.nato.int/sfor/index.htm", "http://www.bta.bg/site/en/indexe.shtml", "http://www.government.bg/", "http://www.parliament.bg/?lng=en", "http://www.sofiaecho.com/", "http://www.novinite.com/", "http://websrv2.hina.hr/hina/web/index.action?request_locale=en", "http://www.jutarnji.hr/", "http://www.vecernji.hr/", "http://slobodnadalmacija.hr/", "http://vijesti.hrt.hr/", "http://www.iwpr.net/home_index_new.html", "http://www.hri.org/news/cyprus/riken/last/last.html", "http://www.cyprus-mail.com/news/", "http://kypros.org/cgi-bin/headlines?/News/Update/english.html", "http://www.cna.org.cy/website/english/index.asp", "http://www.cyprusweekly.com.cy/", "http://www.financialmirror.com/", "http://www.moi.gov.cy/moi/pio/pio.nsf/index_en/index_en?opendocument", "http://www.trncwashdc.org/", "http://www.ceskenoviny.cz/news/", "http://www.nzz.ch/english/", "http://www.psp.cz/cgi-bin/eng/", "http://www.prague-tribune.cz/", "http://www.radio.cz/english/", "http://www.cphpost.dk/", "http://www.ft.dk/?/samling/20051/menu/00000005.htm", "http://www.wrn.org/listeners/stations/station.php?StationID=11", "http://www.baltictimes.com/", "http://www.bbn.ee/", "http://news.err.ee/", "http://news.postimees.ee/", "http://www.president.ee/en/", "https://valitsus.ee/en/news", "http://www.helsinginsanomat.fi/english/", "http://virtual.finland.fi/news/", "http://www.aloufok.net/", "http://fr.newswiretoday.com/", "http://www.assemblee-nationale.fr/english/index.asp", "http://www.iht.com/", 
"http://www.parliament.ge/", "http://www.oratert.com/news/georgia/index.1.html/", "http://www.dw-world.de/dw/0,,266,00.html", "http://www.germany-info.org/relaunch/index.html", "http://www.bundestag.de/htdocs_e/index.html", "http://www.bundesregierung.de/en", "http://service.spiegel.de/cache/international/", "http://www.tagesschau.de/english/", "http://www.heise.de/tp/english/default.html", "http://aegeantimes.net/", "http://www.athensnews.gr/athweb/nathens.index_htm?e=C", "http://www.ana.gr/anaweb/", "http://www.greekembassy.org/press/index.html", "http://www.hri.org/", "http://www.ekathimerini.com/", "http://www.ert.gr/en/", "http://www.bbj.hu/", "http://www.wrn.org/listeners/stations/station.php?StationID=9", "http://www.icelandreview.com/", "http://www.scandinavianow.com/", "http://www.irlgov.ie/", "http://www.irishexaminer.com/pport/web/irishexaminer/", "http://larkspirit.com/general/irishhub.html", "http://republican-news.org/current/news/index.html", "http://www.ireland.com/newspaper/front/2005/1029/", "http://www.sbpost.ie/post/pages/home.aspx-qqqt%3D-qqqs%3Dnav-qqqx%3D1x-qqqt%3D-qqqs%3Dnews-qqqx%3D1.asp", "https://www.rte.ie/news/index.html", "http://www.unison.ie/irish_independent/", "http://home.eircom.net/news/", "http://www.ansa.it/main/notizie/awnplus/english/english.html", "http://www.ilsole24ore.com/", "http://www.repubblica.it/", "http://www.corriere.it/", "http://www.channelonline.tv/channelonline/", "http://www.thisisjersey.com/", "https://www.bbc.co.uk/news/world/europe/jersey/", "http://www.balticsww.com/wkcrier/daily_news.htm", "http://www.baltictimes.com/", "http://www.president.lv/index.php?pid=210", "http://www.radio.org.lv/Lapas/EN_Index.htm", "https://www.lsm.lv/", "http://www.news.li/news/index.htm", "http://www.lrv.lt/main_en.php", "http://www.iwpr.net/home_index_new.html", "https://mia.mk//", "http://www.seeq.com/lander.jsp?referrer=http%3A%2F%2Fwww.gua…622982%2C00.html&domain=maltabusinessweekly.com&cm_mmc=Malta", 
"http://www.di-ve.com/dive/portal/main.html", "http://www.gov.mt/", "http://217.145.4.56/ind/", "http://www.maltamedia.com/artman/publish/news.shtml", "https://www.timesofmalta.com/core/index.php", "http://www.basa.md/?", "http://cg.mnnews.net/indexeng.php3?akcija=mnnews", "http://www.minbuza.nl/default.asp?CMS_ITEM=12E5DC3F5E024ADFB2AA6B315606A627X2X31365X4", "http://www2.rnw.nl/rnw/en/", "http://www.aftenposten.no/english/", "http://odin.dep.no/odin/global/language-no/index-b-n-a.html", "http://www.norwaypost.no/", "http://www.kprm.gov.pl/english/index.html", "http://www.poland.pl/", "http://www.thenews.pl/", "http://www.rzeczpospolita.pl/jezyki/index.html", "http://www.wbj.pl/", "http://www.warsawvoice.com.pl/", "http://www.polandmonthly.pl/", "http://www.portugal.org/index.shtml", "http://the-news.net/", "http://www.agerpres.ro/english/index.php/english.html", "http://www.mediafax.ro/english/", "http://english.hotnews.ro/", "http://www.radardemedia.ro/", "http://www.gov.ro/engleza/index.html", "http://www.nineoclock.ro/", "http://www.gazeta.ru/english/", "http://www.interfax.ru/?lang=e&", "http://english.pravda.ru/", "http://www.rbcnews.com/", "http://www.oratert.com/news/russia/index.1.html/", "http://en.rian.ru/", "http://www.moscowtimes.ru/indexes/01.html", "http://beta.russiajournal.com/", "http://www.gov.ru/", "http://www.sptimes.ru/", "http://www.vor.ru/world.html", "http://rbtn.ru/", "http://www.b92.net/english/", "http://www.srbija.gov.rs/", "http://www.iwpr.net/home_index_new.html", "http://www.tanjug.rs/", "http://www.government.gov.sk/english/", "http://www.slovensko.com/news/", "http://www.sigov.si/", "http://www.iwpr.net/home_index_new.html", "http://www.costablanca-news.com/", "http://www.la-moncloa.es/default.htm", "http://www.surinenglish.com/index.php", "http://www.thelocal.se/", "http://sverigesradio.se/sida/default.aspx?programid=2054", "http://www.nzz.ch/english/index.html", "http://www.swissinfo.org/", "http://www.turkishweekly.com/", 
"http://aegeantimes.net/", "http://www.turkishdailynews.com.tr/", "http://www.turkishpress.com/", "http://www.zaman.com/", "http://www.kurdistanobserver.com/", "http://www.oratert.com/news/turkey/index.1.html", "http://www.4ni.co.uk/", "http://www.accountancyage.com/", "http://www.aidsmap.com/", "http://www.amnesty.org.uk/", "https://www.bbc.co.uk/news/", "http://www.belfasttelegraph.co.uk/", "http://www.channel4.com/news", "http://www.number-10.gov.uk/output/Page1.asp", "http://www.theecologist.co.uk/home.asp", "http://www.economist.com/index.html", "http://www.fwi.co.uk/", "http://news.ft.com/home/uk", "http://www.guardian.co.uk/", "http://icwales.icnetwork.co.uk/", "http://www.independent.co.uk/", "http://www.itn.co.uk/", "http://www.mirror.co.uk/", "http://www.newstatesman.com/", "http://www.private-eye.co.uk/", "http://today.reuters.co.uk/news/default.aspx", "http://www.socialistworker.co.uk/", "http://www.sky.com/skynews/home", "http://www.thesun.co.uk/", "http://www.telegraph.co.uk/portal/main.jhtml", "http://thescotsman.scotsman.com/", "http://www.spectator.co.uk/index.php", "http://www.ukremb.com/", "http://www.interfax.kiev.ua/eng/", "http://www.kyivpost.com/", "http://www.window.com.ua/", "http://www.apostrophe.ua/en/",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/China
"http://www.metroradio.com.hk/1044metroplus/engnews/", "http://www.chinaonline.com/", "http://www.china-embassy.org/eng/", "http://en.chinabroadcast.cn/", "http://english.eastday.com/", "http://www.theepochtimes.com/123,92,,1.html", "http://news.info.gov.hk/", "http://english.peopledaily.com.cn/", "http://www.rthk.org.hk/rthk/news/englishnews/", "http://www.thestandard.com/", "http://202.84.17.11/en/index.htm", "http://www.wisers.net/", "http://www.mingpaonews.com/", "http://news.tvb.com/",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/India
"http://allindiaradio.gov.in/", "http://www.ptinews.com/", "https://www.etvbharat.com/", "http://www.ibnlive.com/", "http://www.dailyexcelsior.com/", "http://www.dnaindia.com/", "http://www.deccanherald.com/", "https://www.indiatvnews.com/", "https://www.news18.com/amp/", "http://www.gomantaktimes.com/MAI25705.htm", "http://goidirectory.nic.in/", "https://www.thehindu.com//", "http://www.hindustantimes.com/", "http://www.newindpress.com/", "http://www.ranchiexpress.com/news/", "http://www.rediff.com/news/", "http://theullekh.com/", "http://www.sumanasa.com/india-news/", "http://www.samachar.com/", "http://www.onlineindiannews.com/news/", "http://www.thestatesman.net/", "http://www.sunnt.com/", "http://www.indianexpress.com/sunday/index.html", "http://www.telegraphindia.com/section/frontpage/index.asp", "http://timesofindia.indiatimes.com/?", "http://www.tribuneindia.com/", "http://www.the-week.com/", "http://www.b4uindia.com/", "http://www.theprint.in/", "https://connectgujarat.com/",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/Asia
"http://www.atimes.com/", "http://www.dawn.com/2005/11/27/index.htm", "http://www.feer.com/", "http://www.thejakartapost.com/headlines.asp", "http://www.menafn.com/", "http://english.aljazeera.net/HomePage", "http://www.arabicnews.com/ansub/", "http://www.middle-east-online.com/english/", "http://www.arab.net/", "https://news.writecaliber.com/", "http://www.omaid.com/english_section/curr_issue.htm", "http://www.afgha.com/", "http://www.ans-dx.com/", "http://azerbaijannews.net/", "http://www.bakusun.az:8101/", "http://www.president.az/", "http://www.oratert.com/news/azerbaijan/index.1.html", "http://www.bahraintribune.com/", "http://www.gulf-daily-news.com/", "http://www.bangladeshobserveronline.com/new/2005/11/27/index.htm", "http://www.thedailystar.net/", "http://www.weeklyholiday.net/", "http://www.independent-bangladesh.com/", "http://www.bangladesh-web.com/news/", "http://nation.ittefaq.com/artman/publish/", "http://www.bangladeshgov.org/", "http://www.bbs.com.bt/", "http://www.kuenselonline.com/", "http://www.ibiblio.org/freeburma/", "http://www.mizzima.com/mizzima/", "http://www.ncgub.net/", "http://www.phnompenhpost.com/", "http://www.vocri.org/", "http://www.thejakartapost.com/headlines.asp", "http://www.antara.co.id/en/", "http://www.kompas.com/", "http://www.indonesianembassy.org.uk/", "http://www.presstv.com/", "http://www.tehrantimes.com/", "http://www.mehrnews.com/en/", "http://www.ilna.ir/indexEN.aspx", "http://www.isna.ir/ISNA/Default.aspx?Lang=E", "http://www.irna.ir/En/default.aspx?IdLanguage=3", "http://english.iribnews.ir/", "http://www.alalam-news.com/english/", "http://www.iran-daily.com/1384/2459/html/", "http://www.iranian.com/news.html", "http://www.oratert.com/news/iran/index.1.html", "http://www.baghdadbulletin.com/", "http://www.iraqcrisis.co.uk/home.php", "http://www.iraq-today.com/", "http://www.haaretz.com/", "http://www.ynetnews.com/home/0,7340,L-3083,00.html", "http://web.israelinsider.com/home.htm", 
"http://info.jpost.com/C005/JREPORT/", "http://www.jpost.com/", "http://www.israelnationalnews.com/", "http://www.wrn.org/listeners/stations/station.php?StationID=35", "http://www.asahi.com/english/english.html", "http://www.yomiuri.co.jp/dy/", "http://www.fnn-news.com/en/index.html", "http://www.japantimes.co.jp/", "http://www.japantoday.com/", "http://home.kyodo.co.jp/", "http://www.nni.nikkei.co.jp/", "http://www.tokyoartbeat.com/", "http://www.art-it.asia/top/?lang=en", "http://pingmag.jp/", "http://www.japantimes.co.jp/entertainment/art.html", "http://www.dnp.co.jp/artscape/eng/index.html", "http://www.jordantimes.com/fri/index.htm", "http://www.petra.gov.jo/main.asp", "http://www.kinghussein.gov.jo/jordan.html#", "http://www.arabtimesonline.com/arabtimes/index.asp", "http://www.kuwaittimes.net/", "http://www.zawya.com/countries/kw/", "http://www.dailystar.com.lb/home2.asp", "http://www.lebanon.com/", "http://www.mmorning.com/", "http://www.dailyexpress.com.my/", "http://www.malaysiakini.com/", "http://thestar.com.my/", "https://www.utusan.com.my/index.asp?pub=Utusan_Express", "http://www.nst.com.my/", "http://www.haveeru.com.mv/?page=english", "http://www.presidencymaldives.gov.mv/v3/index.phtml", "http://www.minivannews.com/news/news.php?id=1263", "http://www.dhivehiobserver.com/", "http://www.myfreenepal.com/", "http://www.nepalnews.com/archive/main.htm", "http://www.radionepal.org/", "http://www.minivannews.com/news/news.php?id=1263", "http://www.kcna.co.jp/index-e.htm", "http://www.omanobserver.com/", "http://www.timesofoman.com/", "http://www.amin.org/", "http://www.palestinechronicle.com/", "http://www.aqsa.org.uk/", "http://www.alternativenews.org/", "http://www.miftah.org/", "http://dawn.com/", "http://geo.tv/", "http://inewsjournal.com/", "http://www.frontierpost.com.pk/", "http://www.jang.com.pk/thenews/", "http://paktribune.com/", "http://pakistantimes.net/2005/10/10A/index.htm", "http://www.paktoday.com/", "http://www.radio.gov.pk/index.asp", 
"http://www.weeklyindependent.com/", "http://www.infopak.gov.pk/", "http://www.abs-cbnnews.com/", "http://www.gmanetwork.com/news/", "http://www.inquirer.net/", "http://www.philstar.com/", "http://www.gulf-times.com/site/topics/index.asp?cu_no=2&temp_type=44", "http://english.mofa.gov.qa/", "http://www.ain-al-yaqeen.com/", "http://www.saudia-online.com/", "http://www.saudinf.com/", "http://www.channelnewsasia.com/", "http://straitstimes.asia1.com.sg/", "http://business-times.asia1.com.sg/", "http://newpaper.asia1.com.sg/", "http://www.zaobao.com/", "http://cyberita.asia1.com.sg/", "http://tamilmurasu.asia1.com.sg/", "http://english.chosun.com/", "http://www.koreaherald.co.kr/index.asp", "http://rki.kbs.co.kr/", "http://www.korea.net/", "http://www.aruna.lk/", "https://www.themorning.lk/", "https://www.hirunews.lk/", "https://www.adaderana.lk/", "https://readme.lk/", "http://www.dailynews.lk/", "http://www.island.lk/", "http://www.thesundayleader.lk/home.html", "http://www.sundaytimes.lk/", "http://www.tamileelamnews.com/", "http://www.news.lk/", "http://www.chinapost.com.tw/", "http://english.rti.com.tw/", "http://www.taipeitimes.com/News", "http://www.gio.gov.tw/", "http://www.nationmultimedia.com/", "http://www.biz-day.com/", "http://www.bangkokpost.net/", "http://www.pattayamail.com/", "http://www.phuketgazette.net/", "http://www.parliament.go.th/files/mainpage.htm", "http://www.turkmenistanembassy.org/", "http://www.godubai.com/gulftoday/", "http://www.khaleejtimes.com/index00.asp", "http://www.gulf-news.com/home/index.html", "http://www.government.ae/gov/en/index.jsp", "http://www.gov.uz/", "http://vietnamnews.vnagency.com.vn/", "http://www.vov.org.vn/", "http://www.nhandan.com.vn/english/", "http://yementimes.com/index.shtml?",
-- https://en.wikipedia.org/wiki/Wikipedia:News_sources/Americas
"http://www.buenosairesherald.com/", "http://www.belize.gov.bz/", "http://www.infobrazil.com/", "http://internacional.radiobras.gov.br/ingles/index.htm", "http://www.estado.com.br/", "http://oglobo.globo.com/", "http://www.ctv.ca/", "http://www.cbc.ca/", "http://www.globalnews.ca/", "http://www.chch.com/", "http://www.calgaryherald.com/", "http://www.canoe.ca/CNEWS/home.html", "http://www.edmontonjournal.com/", "http://www.theglobeandmail.com/", "http://thechronicleherald.ca/", "http://www.thehaltonherald.ca/", "http://www.thehilltimes.ca/", "http://www.thewhig.com/Default.aspx", "http://www.metronews.ca/", "http://www.montrealgazette.com/", "http://www.nationalpost.com/", "http://www.nunatsiaq.com/index.html", "http://www.canada.com/ottawacitizen/index.html", "http://www.ottawasun.com/", "http://www.leaderpost.com/", "http://www.thetelegram.com/", "http://www.thestarphoenix.com/", "http://www.thestar.com/", "http://www.torontosun.com/", "http://www.canada.com/theprovince/index.html", "http://www.canada.com/vancouversun/index.html", "http://www.timescolonist.com/", "http://www.winnipegfreepress.com/", "http://canada.gc.ca/", "http://www.caribbeannetnews.com/", "http://www.cananews.net/", "http://www.amcostarica.com/", "http://www.ticotimes.net/", "http://www.costaricanewssite.com/", "http://www.granma.cu/ingles/", "http://balistrad.com/", "http://www.lenouvelliste.com/", "http://www.marrder.com/htw/", "http://www.latinnews.com/", "http://www.mexiconews.com.mx/", "http://www.theaerogram.com/", "https://allheadlinenews.com/", "http://www.armytimes.com/", "http://www.cnn.com/", "http://www.csmonitor.com/", "http://www.democracynow.org/", "http://news.ft.com/home/us", "http://www.drudgereport.com/", "http://www.newswiretoday.com/", "http://knowledge.wharton.upenn.edu/", "http://www.latimes.com/", "http://www.npr.org/", "http://www.nytimes.com/", "http://www.newstandardnews.net/", "http://www.newsday.com/", "http://www.newsfactor.com/", "http://www.newseum.org/", 
"http://www.msnbc.com/", "http://www.rollcall.com/", "http://www.poynter.org/", "http://www.washingtonpost.com/", "http://online.wsj.com/public/us", "http://www.washtimes.com/", "http://www.usatoday.com/", "http://www.wired.com/" ]) AS news_url) )
-- #standardSQL
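-- Sums the per-tag counts in the `_element_count` custom metric (a JSON object
-- of tag name to count). Returns an empty array when the metric is missing,
-- malformed, or an array, which drops that row in the UNNEST join below.
CREATE TEMPORARY FUNCTION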
getTotalElements(element_count STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
try {
element_count = JSON.parse(element_count);
if (Array.isArray(element_count)) {
return [];
}
let c = 0;
for (let k of Object.keys(element_count)) {
c += element_count[k];
}
return [c];
} catch (e) {
return [];
}
''';
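-- Counts the keys of the `summary` object in the `_css-variables` custom
-- metric. Returns [0] when the metric is missing or malformed, so the row is
-- kept with a count of zero.
CREATE TEMPORARY FUNCTION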
getCssVariables(css_variables STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
try {
css_variables = JSON.parse(css_variables).summary;
return [Object.keys(css_variables).length];
} catch (e) {
return [0];
}
''';
-- Create the summary table on the first run only; later runs reuse it.
CREATE TABLE IF NOT EXISTS
`httparchive-sandbox.test.news_pages_summary_2023_03_01` AS
SELECT
url,
client,
JSON_VALUE(payload, '$._avg_dom_depth') AS avg_dom_depth,
total_elements,
total_css_variables,
JSON_VALUE(payload, '$._element_count') AS element_count,
JSON_VALUE(payload, '$._css') AS css,
FROM
`httparchive-sandbox.test.news_pages_2023_03_01`,
UNNEST(getTotalElements(JSON_EXTRACT_SCALAR(payload, '$._element_count'))) AS total_elements,
UNNEST(getCssVariables(JSON_EXTRACT_SCALAR(payload, '$._css-variables'))) AS total_css_variables
ORDER BY
NET.HOST(url);
SELECT
*
FROM
`httparchive-sandbox.test.news_pages_summary_2023_03_01`;
SELECT
client,
AVG(CAST(avg_dom_depth AS FLOAT64)) AS average_avg_dom_depth,
APPROX_QUANTILES(CAST(avg_dom_depth AS FLOAT64), 2)[OFFSET(1)] AS median_avg_dom_depth,
MAX(CAST(avg_dom_depth AS FLOAT64)) AS max_avg_dom_depth,
STDDEV(CAST(avg_dom_depth AS FLOAT64)) AS stddev_avg_dom_depth,
AVG(CAST(total_elements AS FLOAT64)) AS average_total_elements,
APPROX_QUANTILES(CAST(total_elements AS FLOAT64), 2)[OFFSET(1)] AS median_total_elements,
MAX(CAST(total_elements AS FLOAT64)) AS max_total_elements,
STDDEV(CAST(total_elements AS FLOAT64)) AS stddev_total_elements,
AVG(CAST(total_css_variables AS FLOAT64)) AS average_total_css_variables,
APPROX_QUANTILES(CAST(total_css_variables AS FLOAT64), 2)[OFFSET(1)] AS median_total_css_variables,
MAX(CAST(total_css_variables AS FLOAT64)) AS max_total_css_variables,
STDDEV(CAST(total_css_variables AS FLOAT64)) AS stddev_total_css_variables
FROM
`httparchive-sandbox.test.news_pages_summary_2023_03_01`
GROUP BY
client;
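
For readers who want to sanity-check the numbers outside BigQuery, the sketch below mirrors what the query above computes for one column. It sums a per-tag `element_count` object into a total, as `getTotalElements` does, and then derives the average, median, maximum, and sample standard deviation. The function names `totalElements` and `summarize` and the sample inputs are made up for illustration. The median here is exact and averages the two middle values for even-sized inputs, whereas `APPROX_QUANTILES` is approximate, so results may differ slightly.

```javascript
// Hypothetical stand-in for the getTotalElements UDF: sums per-tag counts.
// Returns null where the UDF returns an empty array (bad or missing input).
function totalElements(elementCountJson) {
  try {
    const counts = JSON.parse(elementCountJson);
    if (Array.isArray(counts)) {
      return null;
    }
    return Object.values(counts).reduce((sum, n) => sum + n, 0);
  } catch (e) {
    return null;
  }
}

// Average, exact median, max, and sample standard deviation of a number list.
function summarize(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Sample variance (divide by n - 1); undefined for a single value.
  const variance = n > 1
    ? sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1)
    : NaN;
  return { mean, median, max: sorted[n - 1], stddev: Math.sqrt(variance) };
}

// Made-up sample payloads, not real crawl data. The '[]' entry is dropped,
// mirroring how an empty UDF result removes the row from the summary table.
const totals = [
  '{"div": 3, "a": 2}',
  '{"p": 10}',
  '[]',
  '{"span": 1, "img": 4, "li": 4}',
]
  .map(totalElements)
  .filter((t) => t !== null);

console.log(summarize(totals));
```

In the real query the same aggregates are computed per `client`, so a faithful comparison would group the rows by desktop and mobile first.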
[avg_dom_depth]
// Average number of ancestors per element, over every element in the document.
function avgDomDepth() {
  var aElems = document.getElementsByTagName('*');
  var i = aElems.length;
  var totalParents = 0;
  while (i--) {
    totalParents += numParents(aElems[i]);
  }
  var average = totalParents / aElems.length;
  return average;
}
// Number of ancestors of elem, counting the document node itself.
function numParents(elem) {
  var n = 0;
  if (elem.parentNode) {
    while ((elem = elem.parentNode)) {
      n++;
    }
  }
  return n;
}
return Math.round(avgDomDepth());
[element_count]
return JSON.stringify(
Array.from(
document
.querySelectorAll('*'))
.reduce((acc, el) => {
let tag = el.tagName.toLowerCase()
acc[tag] = (typeof acc[tag] !== 'undefined') ? acc[tag] : 0
acc[tag]++
return acc
}, {}
)
)
[css-variables]
//[css-variables]
function analyzeVariables() {
const PREFIX = "almanac-var2020-";
// Selector to find elements that are relevant to the graph
const selector = `.${PREFIX}element, [style*="--"]`;
// Extract a list of custom properties set by a value
function extractValueProperties(value) {
// https://drafts.csswg.org/css-syntax-3/#ident-token-diagram
let ret = value.match(/var\(--[-\w\u{0080}-\u{FFFF}]+(?=[,)])/gui)?.map(p => p.slice(4));
if (ret) {
// Drop duplicates
return [...new Set(ret)];
}
}
let visited = new Set();
let modifiedRules = [];
// Recursively walk a CSSStyleRule or CSSStyleDeclaration
function walkRule(rule, ret) {
if (!rule || visited.has(rule)) {
return;
}
visited.add(rule);
let style, selector;
if (rule instanceof CSSStyleRule && rule.style) {
style = rule.style;
selector = rule.selectorText;
}
else if (rule instanceof CSSStyleDeclaration) {
style = rule;
selector = "";
}
if (style) {
let condition;
// mirror properties to add. We add them afterwards, so we don't pointlessly traverse them
let additions = {};
for (let property of style) {
let value = style.getPropertyValue(property);
let containsRef = value.indexOf("var(--") > -1;
let setsVar = property.indexOf("--") === 0 && property.indexOf("--" + PREFIX) === -1;
if (containsRef || setsVar) {
if (!condition && rule.parentRule) {
condition = [];
let r = rule;
while (r.parentRule?.conditionText) {
r = r.parentRule;
condition.push({
type: r instanceof CSSMediaRule? "media" : "supports",
test: r.conditionText
});
}
}
if (containsRef) {
// Set mirror property so we can find it in the computed style
additions["--" + PREFIX + property] = value.replace(/var\(--/g, PREFIX + "$&");
// extractValueProperties returns undefined when nothing matches; fall back to an empty list
let properties = extractValueProperties(value) || [];
for (let prop of properties) {
let info = ret[prop] = ret[prop] || {get: [], set: []};
info.get.push({ usedIn: property, value, selector, condition });
}
}
if (setsVar) {
let info = ret[property] = ret[property] || {get: [], set: []};
info.set.push({ value, selector, condition });
}
// Add class so we can find these later
if (selector) {
for (let el of document.querySelectorAll(selector)) {
el.classList.add(`${PREFIX}element`);
}
}
}
}
// Now that we're done, add the mirror properties
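if (Object.keys(additions).length > 0) {
// Record the rule once so cleanup can remove its mirror properties
modifiedRules.push({style, additions});
}
for (let property in additions) {
style.setProperty(property, additions[property]);
}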
}
if (rule instanceof CSSMediaRule || rule instanceof CSSSupportsRule) {
// rules with child rules, e.g. @media, @supports
for (let r of rule.cssRules) {
walkRule(r, ret);
}
}
}
// Return a subset of the DOM tree that contains variable reads or writes
function buildGraph() {
// Elements that contain variable reads or writes.
let elements = new Set(document.querySelectorAll(selector));
let map = new Map(); // keep pointers to object for each element
let ret = [];
for (let element of elements) {
map.set(element, {element, children: []});
}
for (let element of elements) {
let ancestor = element.parentNode.closest?.(selector);
let obj = map.get(element);
if (ancestor) {
let o = map.get(ancestor);
o.children.push(obj)
}
else {
// Top-level
ret.push(obj);
}
let cs = element.computedStyleMap();
let parentCS = element.parentNode.computedStyleMap?.();
let vars = extractVars(cs, parentCS);
if (Object.keys(vars).length > 0) {
obj.declarations = vars;
}
}
return ret;
}
// Extract custom property declarations from a computed style map
// The schema of the returned object is:
// {get: {--var1: [{property, value, computedValue}]}, set: {--var2: {value, type}}}
function extractVars(cs, parentCS) {
let ret = {};
let norefs = {};
for (let [property, [originalValue]] of cs) {
// Do references first
if (property.indexOf("--") === 0) {
let value = originalValue + "";
// Skip inherited values
if (parentCS && (parentCS.get(property) + "" === value + "")) {
continue; // most likely inherited
}
if (property.indexOf("--" + PREFIX) === 0) {
// Usage
let originalProperty = property.replace("--" + PREFIX, "");
value = value.replace(RegExp(PREFIX + "var\\(--", "g"), "var(--");
let properties = extractValueProperties(value);
let computed = cs.get(originalProperty) + "";
ret[originalProperty] = {
value,
references: properties
}
if (computed !== value) {
ret[originalProperty].computed = computed;
}
}
else {
// Definition
norefs[property] = {value};
if (originalValue + "" !== value) {
norefs[property].computed = originalValue + "";
}
// If value is of another type, we have Houdini P&V usage!
if (!(originalValue instanceof CSSUnparsedValue)) {
norefs[property].type = Object.prototype.toString.call(originalValue).slice(8, -1);
}
}
}
}
// Merge static with ret
for (let property in norefs) {
if (!(property in ret)) {
ret[property] = norefs[property];
}
}
return ret;
}
let summary = {};
// Walk through stylesheet and add custom properties for every declaration that uses var()
// This way we can retrieve them in the computed styles and build a dependency graph.
// Otherwise, they get resolved before they hit the computed style.
for (let stylesheet of document.styleSheets) {
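// Declare per iteration so a failed read never reuses the previous stylesheet's rules
let rules;
try {
rules = stylesheet.cssRules;
}
catch (e) {
// Cross-origin stylesheets throw on access; skip them
}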
if (rules) {
for (let rule of rules) {
walkRule(rule, summary);
}
}
}
function collapseDuplicateSiblings(arr) {
if (arr) {
let map = {};
for (let child of arr) {
let serialized = serialize(child);
if (serialized in map) {
// Dupe
map[serialized]++;
}
else {
map[serialized] = 0;
}
}
let entries = Object.entries(map);
if (entries.length < arr.length) {
// There are duplicates
arr = entries.map(e => {
let child = JSON.parse(e[0]);
if (e[1] > 0) {
child.times = e[1] + 1;
}
return child;
})
}
arr.forEach(o => {
o.children = collapseDuplicateSiblings(o.children);
});
}
return arr;
}
// Do the same thing with inline styles
for (let element of document.querySelectorAll('[style*="--"]')) {
walkRule(element.style, summary);
}
let computed = buildGraph();
// Cleanup: Remove classes
for (let el of document.querySelectorAll(`.${PREFIX}element`)) {
el.classList.remove(`${PREFIX}element`);
}
// Cleanup: Remove custom properties
for (let o of modifiedRules) {
for (let prop in o.additions) {
o.style.removeProperty(prop);
}
}
computed = collapseDuplicateSiblings(computed);
return {summary, computed};
};
function serialize(data, separator) {
return JSON.stringify(data, (key, value) => {
if (value instanceof HTMLElement) {
let str = value.tagName;
if (value.classList.length > 0) {
str += "." + [...value.classList].join(".")
}
if (value.id) {
str += "#" + value.id;
}
return str;
}
// remove empty arrays
if (Array.isArray(value) && value.length === 0) {
return;
}
return value;
}, separator);
}
return serialize(analyzeVariables());
[css]
//[css]
// Uncomment the previous line for testing on webpagetest.org
const PREFERS_COLOR_SCHEME_REGEXP =
/(?:@media\s*\(\s*prefers-color-scheme\s*:\s*(?:dark|light)\s*\)\s*\{[^\}]*\}|matchMedia\s*\(\s*['"]\s*\(\s*prefers-color-scheme\s*:\s*(?:dark|light)\s*\)\s*['"]\s*\))/gms;
const bodies = $WPT_BODIES;
function countExternalCssInHead() {
return document.querySelectorAll( 'head link[rel="stylesheet"]' ).length;
}
function countInlineCssInHead() {
return document.querySelectorAll( 'head style' ).length;
}
function countExternalCssInBody() {
return document.querySelectorAll( 'body link[rel="stylesheet"]' ).length;
}
function countInlineCssInBody() {
return document.querySelectorAll( 'body style' ).length;
}
return JSON.stringify({
css_in_js: (() => {
const CssInJsMap = {
'Styled Components': !!document.querySelector(
'style[data-styled],style[data-styled-components]',
),
Radium: !!document.querySelector('[data-radium]'),
JSS: !!document.querySelector('[data-jss]'),
Emotion: !!document.querySelector('[data-emotion]'),
Goober: !!document.getElementById('_goober'),
'Merge Styles': !!document.querySelector('[data-merge-styles]'),
'Styled Jsx': !!document.querySelector('style[id*="__jsx-"]'),
Aphrodite: !!document.querySelector('[data-aphrodite]'),
Fela: !!document.querySelector('[data-fela-stylesheet]'),
Styletron: !!document.querySelector(
'[data-styletron],._styletron_hydrate_',
),
'React Native for Web': !!document.querySelector(
'#react-native-stylesheet',
),
Glamor: !!document.querySelector('[data-glamor]'),
};
const usedLibraries = [];
for (const l in CssInJsMap) {
if (CssInJsMap[l]) {
usedLibraries.push(l);
}
}
return usedLibraries;
})(),
// Checks in two passes:
// 1. The response bodies of stylesheets and scripts.
// 2. The `link[media]` attribute of conditionally loaded stylesheets, which is
// only evaluated (via `||`) if step 1 returns `false`.
prefersColorScheme:
bodies.some((request) => {
return (
(request.type === 'Stylesheet' || request.type === 'Script') &&
PREFERS_COLOR_SCHEME_REGEXP.test(request.response_body || '')
);
}) ||
// If none of the response bodies match, alternatively check if any of the
// stylesheet `link`s load conditionally based on `prefers-color-scheme`.
document.querySelectorAll(
'link[rel="stylesheet"][media*="prefers-color-scheme"]',
).length > 0,
externalCssInHead: countExternalCssInHead(),
externalCssInBody: countExternalCssInBody(),
inlineCssInHead: countInlineCssInHead(),
inlineCssInBody: countInlineCssInBody(),
});
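The `prefersColorScheme` check above relies on `PREFERS_COLOR_SCHEME_REGEXP`. As a rough sanity check, the sketch below runs the same pattern against a few made-up response bodies. `PATTERN` and `mentionsPrefersColorScheme` are illustrative names, and the `g` flag is dropped here as an assumption of convenience, so that repeated `.test()` calls do not depend on `lastIndex`.

```javascript
// Same pattern as PREFERS_COLOR_SCHEME_REGEXP in the [css] metric, without the `g` flag.
const PATTERN =
  /(?:@media\s*\(\s*prefers-color-scheme\s*:\s*(?:dark|light)\s*\)\s*\{[^\}]*\}|matchMedia\s*\(\s*['"]\s*\(\s*prefers-color-scheme\s*:\s*(?:dark|light)\s*\)\s*['"]\s*\))/ms;

// Hypothetical helper: true when a response body matches the pattern.
function mentionsPrefersColorScheme(body) {
  return PATTERN.test(body || "");
}

// Made-up sample bodies.
const samples = [
  "@media (prefers-color-scheme: dark) { body { color: white } }",
  "window.matchMedia('(prefers-color-scheme: light)').matches",
  "@media screen and (prefers-color-scheme: dark) { body { color: white } }",
  "body { color: black }",
];

for (const s of samples) {
  console.log(mentionsPrefersColorScheme(s), s);
}
// Logs true, true, false, false in that order.
```

The third sample shows that the pattern only matches when the parenthesized feature query directly follows `@media`, so combined queries such as `screen and (...)` are not counted by the response-body pass.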
#!/bin/bash
urls=(
"https://raw.githubusercontent.com/HTTPArchive/custom-metrics/8497c859ef0a7c99924981f369bb53eb3441bd6c/dist/avg_dom_depth.js"
"https://raw.githubusercontent.com/HTTPArchive/custom-metrics/8497c859ef0a7c99924981f369bb53eb3441bd6c/dist/element_count.js"
"https://raw.githubusercontent.com/HTTPArchive/custom-metrics/8497c859ef0a7c99924981f369bb53eb3441bd6c/dist/css-variables.js"
"https://raw.githubusercontent.com/HTTPArchive/custom-metrics/8497c859ef0a7c99924981f369bb53eb3441bd6c/dist/css.js"
)
output_file="combined_file.txt"
# Remove the existing output file, if it exists
if [ -e "$output_file" ]; then
rm "$output_file"
fi
# Iterate over the URLs
for url in "${urls[@]}"; do
# Extract the file name from the URL
file_name=$(basename "$url")
file_name="${file_name%.js}" # Remove .js extension using parameter expansion
# Download the file and append its contents to the output file
echo "[$file_name]" >> "$output_file"
curl -fsS "$url" >> "$output_file" || echo "Failed to download $url" >&2
echo >> "$output_file" # Add an empty line for separation
done
echo "Files have been combined into $output_file"
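The script above writes each metric as a `[name]` header line followed by that metric's source. To inspect or test one metric at a time, the combined text can be split back on those headers. The sketch below is one way to do that. `splitCombinedMetrics` is a hypothetical helper, not WebPageTest's own parser, and it assumes no line of metric code consists solely of a bracketed identifier.

```javascript
// Hypothetical helper: split "[name]\n<source>" sections into a { name: source } map.
function splitCombinedMetrics(text) {
  const sections = {};
  let current = null;
  for (const line of text.split("\n")) {
    // A header is a line made up only of [identifier]; "//[css]" comments do not match.
    const header = line.match(/^\[([\w-]+)\]$/);
    if (header) {
      current = header[1];
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  // Join each section's lines and trim the blank separator lines.
  return Object.fromEntries(
    Object.entries(sections).map(([name, lines]) => [name, lines.join("\n").trim()])
  );
}

// Made-up miniature of combined_file.txt.
const combined = "[avg_dom_depth]\nreturn 1;\n\n[css-variables]\n//[css-variables]\nreturn 2;\n";
console.log(Object.keys(splitCombinedMetrics(combined)));
// [ 'avg_dom_depth', 'css-variables' ]
```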