@jroakes
jroakes / seoml.md
Last active June 29, 2023 09:04
ML Repository for SEO

Machine Learning Repository for SEO

SEO is a data-rich field, yet many newer SEOs may not be equipped to learn the tools that will prepare them for the future. We want to support our community by sharing our expertise and providing access to more advanced tools, so that SEOs of all levels can experiment with the technologies that will shape the future of our work.

Objectives

  • Provide a repository that makes it possible to learn about ML, targeted specifically at those interested in SEO.
  • Provide a repository that allows a novice user to run a simple model on something meaningful for SEO.
  • Provide a repository that saves advanced users time on data collection, cleaning, preprocessing, and model selection.
  • Allow users to showcase work and models developed.
  • Have users get involved with the future development of the repo.
jroakes / wpt_iframe_parsing.js
Created May 12, 2022 22:18
WebPageTest Iframe Parsing
// This uses WebPageTest data to extract response bodies of type Document.
// Assumption: a Document response from a loaded webpage is treated here as a call from an iframe.
var _wptBodies = [];
try {
  // $WPT_BODIES is supplied by WebPageTest. It contains a lot of info on each request, including response bodies.
  _wptBodies = $WPT_BODIES;
} catch (e) {
  // $WPT_BODIES is not defined outside WebPageTest; keep the empty array.
}
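The filtering step that the comments above describe could be sketched as follows. This is a minimal sketch, not the gist's actual code: it assumes each entry in `$WPT_BODIES` is an object with `url`, `type`, and `response_body` fields, and the helper name `extractDocumentBodies` is hypothetical. The sample array is illustrative data, not real WebPageTest output.

```javascript
// Hypothetical helper: keep only entries whose type is "Document".
// Assumes each entry looks like { url, type, response_body }.
function extractDocumentBodies(bodies) {
  if (!Array.isArray(bodies)) {
    return [];
  }
  return bodies
    .filter(function (entry) {
      return entry && entry.type === 'Document';
    })
    .map(function (entry) {
      return { url: entry.url, body: entry.response_body };
    });
}

// Illustrative data only, not real WebPageTest output.
var sampleBodies = [
  { url: 'https://example.com/', type: 'Document', response_body: '<html></html>' },
  { url: 'https://example.com/app.js', type: 'Script', response_body: 'var a = 1;' },
  { url: 'https://example.com/frame.html', type: 'Document', response_body: '<p>frame</p>' }
];

var documentBodies = extractDocumentBodies(sampleBodies);
```

Under the stated assumption that Document responses stand in for iframes, the main page itself would also appear in the result, so a caller would probably want to exclude the tested URL before treating the rest as iframe content.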
// Validation URL:
// https://search.google.com/test/rich-results/result/r%2Fbreadcrumbs?id=0bMAj7v_kU2bqpyFWmozkA
{
"@context": "http://schema.org",
"@graph": [
{
"@type": "Organization",
"@id": "https://epicstream.com/#organization",
"name": "Epicstream",
<script type='application/ld+json'>
{
"@context": "http://schema.org/",
"@type": "WebPage",
"mainContentOfPage": {
"@type": "WebPageElement",
"headline": "Potential Backcharge Notice - Construction Forms for Contractors Mobile App",
"keywords": "back charge notice form template, back charge notice forms",
"text": "https://www.gocanvas.com/mobile-forms-apps/2295-Potential-Backcharge-Notice-Construction-Forms-for-Contractors",
"description": ""
{
"matched_disallows": {
"Googlebot": [
"/authwall",
"/learning*?auth=true",
"/learning*&auth=true",
"/learning/login*?redirect=",
"/learning/login*&redirect=",
"/oauth/v2/*",
"/oauth2/v2/*",
{
"size": 7277,
"size_kib": 7.1064453125,
"over_google_limit": false,
"record_counts": {
"comment_count": 4,
"sitemap_count": 1,
"by_type": {
"user_agent": 5,
"allow": 54,
{
"redirected": false,
"status": 200,
"size": 812,
"comment_lines": 0,
"allow_lines": 1,
"disallow_lines": 22,
"user_agents": [
"*",
"MJ12bot",
{
"anchors": {
"rendered": {
"crawlable": {
"follow": 174,
"nofollow": 2
},
"hash_link": 0,
"hash_only_link": 0,
"javascript_void_links": 1,
{
"01.12": 0,
"01.13": 0,
"link-nodes": {
"total": 11,
"nodes": [
{
"tagName": "link",
"rel": "canonical",
"href": "http://www.thingthingkids.com"
jroakes / wpt_lighthouse_data_mobile_archive.json
Created June 1, 2021 12:30
Web Page Test raw Lighthouse data for Internet Almanac
{
"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
"environment": {
"networkUserAgent": "Mozilla/5.0 (Linux; Android 7.0; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4420.0 Mobile Safari/537.36 Chrome-Lighthouse",
"hostUserAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
"benchmarkIndex": 1038.5,
"credits": {
"axe-core": "4.1.3"
}
},