Skip to content

Instantly share code, notes, and snippets.

@tunetheweb
Last active February 9, 2020 19:53
Show Gist options
  • Save tunetheweb/2c35cf57ad4940e5baa25253a2c17daf to your computer and use it in GitHub Desktop.
Save tunetheweb/2c35cf57ad4940e5baa25253a2c17daf to your computer and use it in GitHub Desktop.

How many <img> elements have width and height attributes?

Query based on similar query about sizes and srcset created by Colin Bendell for the 2019 Web Almanac

This takes about 850GB to run for 2019 this produces the following results:

Row client hasHeight hasWidth hasPicture total pctHeight pctWidth pctPicture
1 mobile 3415295 3548809 108589 5447942 62.69% 65.14% 1.99%
2 desktop 2794913 2900974 94251 4474762 62.46% 64.83% 2.11%

SQL:

SELECT
  client,
  COUNTIF(hasHeight) AS hasHeight,
  COUNTIF(hasWidth) AS hasWidth,
  COUNTIF(hasPicture) AS hasPicture,
  COUNT(0) AS total,
  ROUND(COUNTIF(hasHeight) * 100 / COUNT(0), 2) AS pctHeight,
  ROUND(COUNTIF(hasWidth) * 100 / COUNT(0), 2) AS pctWidth,
  ROUND(COUNTIF(hasPicture) * 100 / COUNT(0), 2) AS pctPicture
FROM (
  SELECT
    client,
    REGEXP_CONTAINS(body, r'(?is)<(?:img)[^>]*height=[\'"]?([^\'"]*)') AS hasHeight,
    REGEXP_CONTAINS(body, r'(?is)<(?:img)[^>]*width=[\'"]?([^\'"]*)') AS hasWidth,
    REGEXP_CONTAINS(body, r'(?si)<picture.*?<img.*?/picture>') AS hasPicture
  FROM
    `httparchive.almanac.summary_response_bodies`
  WHERE
    firstHtml
)
GROUP BY
  client
ORDER BY
  client DESC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment