Pages in httparchive with -(webkit|moz)-appearance: menulist-textfield ordered by rank
rank page
6649 https://schedule.sxsw.com/
7938 https://www.bidorbuy.co.za/
18728 https://bluelink.hyundaiusa.com/
18728 https://bluelink.hyundaiusa.com/
21097 https://support.paloaltonetworks.com/
23725 http://biblioteca.unir.net/
26865 http://busqueda.gandhi.com.mx/
26865 https://www.gandhi.com.mx/
53653 https://www.benefitcosmetics.com/
59215 https://www.lakeland.co.uk/
95259 https://www.insidermedia.com/
102828 https://www.mindsumo.com/
104091 https://save1.freedomdebtrelief.com/
105003 https://www.agweb.com/
160573 https://imotors.com/
160573 https://www.imotors.com/
160836 https://www.guesttoguest.fr/
162222 https://flughafen-zuerich.ch/
162222 https://www.flughafen-zuerich.ch/
162779 https://www.guesttoguest.com/
179077 https://www.arcticcat.com/
179077 https://ca.www.arcticcat.com/
179077 https://fr.www.arcticcat.com/
179077 https://intl.www.arcticcat.com/
206434 https://www.zurich-airport.com/
215042 https://beta.everypost.me/
215042 https://web.everypost.me/
215496 https://www.sodetel.net.lb/
215533 https://www.theticketsellers.co.uk/
215533 https://ww2.theticketsellers.co.uk/
229393 https://www.machinerypete.com/
230081 https://www.guesttoguest.es/
231606 https://jobs.concern.net/
231606 https://www.concern.net/
239407 https://www.ecigplanete.com/
264539 https://matangitonga.to/
270109 https://www.kokatu.com/
294955 http://metro.com/
317252 http://wpengineer.com/
350572 https://www.stadtwerke-muenster.de/
382409 https://www.guesttoguest.it/
388068 https://www.agprofessional.com/
460549 http://preferences.farmjournal.com/
491734 https://www.lakeland.de/
572431 https://www.dairyherd.com/
641970 https://learning.aries.net/
967503 https://www.fightlife.com.au/

zcorpan commented Oct 25, 2018

The data set is HTTP Archive, and I use BigQuery to run queries on it. It covers 1.3 million pages and their response bodies, including CSS/JS subresources, plus other data (e.g. which URLs trigger a particular Chrome use counter). You can set up an account and run queries like this yourself (Google charges for usage over 1 TB or some such). More info here:

https://httparchive.org/faq#how-do-i-use-bigquery-to-write-custom-queries-over-the-data
https://www.chromium.org/blink/platform-predictability/compat-tools
https://docs.google.com/document/d/1cpjWFoXBiuFYI4zb9I7wHs7uYZ0ntbOgLwH-mgqXdEM/edit#heading=h.1m1gg72jnnrt

I ran this query:

#standardSQL
SELECT
  Alexa_rank AS rank,
  r.page AS page
FROM
  `httparchive.response_bodies.2018_10_01_desktop` AS r
JOIN
  `httparchive.urls.20170315`
ON
  NET.REG_DOMAIN(r.page) = Alexa_domain
WHERE
  REGEXP_CONTAINS(r.body, r"(?i)-(webkit|moz)-appearance\s*:\s*menulist-textfield\b")
ORDER BY
  rank
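
To get a feel for what the `REGEXP_CONTAINS` pattern in the query does and does not match, here is a minimal Python sketch that applies the same pattern to a few strings. BigQuery uses RE2 syntax, but this particular pattern is also valid for Python's `re` module. The helper name `uses_menulist_textfield` and the sample bodies are made up for illustration; they are not taken from HTTP Archive.

```python
import re

# Same pattern as in the query's WHERE clause: case-insensitive, a -webkit- or
# -moz- prefix, optional whitespace around the colon, and a word boundary
# after the keyword.
PATTERN = re.compile(r"(?i)-(webkit|moz)-appearance\s*:\s*menulist-textfield\b")


def uses_menulist_textfield(body: str) -> bool:
    """Return True if the text contains a prefixed menulist-textfield declaration."""
    return PATTERN.search(body) is not None


# Hypothetical response bodies (assumptions, for illustration only).
samples = [
    "input { -webkit-appearance: menulist-textfield; }",
    "INPUT{-MOZ-APPEARANCE:MENULIST-TEXTFIELD}",
    "select { -webkit-appearance: menulist-button; }",
    "input { appearance: menulist-textfield; }",
]

for body in samples:
    print(uses_menulist_textfield(body), body)
```

The first two samples match (the prefix and keyword are found regardless of case or spacing). The last two do not: one uses a different keyword, and the other uses the unprefixed `appearance` property, which the pattern does not look for.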
