@jroakes
jroakes / seoml.md
Last active June 29, 2023 09:04
ML Repository for SEO

Machine Learning Repository for SEO

SEO is a data-rich field, yet many young SEOs may not be equipped to learn the tools that will prepare them for the future. We want to support our community by sharing our expertise and providing access to more advanced tools, so that SEOs of all levels can experiment with the technologies that will shape the future of our work.

Objectives

  • Provide a repository that makes it possible to learn about ML, targeted specifically at those interested in SEO.
  • Provide a repository that allows a novice user to run a simple model on something meaningful for SEO.
  • Provide a repository that saves advanced users time on data collection, cleaning, preprocessing, and model selection.
  • Allow users to showcase the work and models they have developed.
  • Get users involved in the future development of the repo.
const chromium = require('chrome-aws-lambda');
const puppeteer = require('puppeteer-core');
const extractor = require('unfluff');
const summarize = require('summarize');
exports.handler = async (event, context, callback) => {
console.log('Received event:', JSON.stringify(event, null, 2));
let uP,mT,lT;const pf=document.createElement('link'),isSupported=pf.relList&&pf.relList.supports&&pf.relList.supports('prefetch'),allowQueryString='instantAllowQueryString'in document.body.dataset;if(isSupported){pf.rel='prefetch',document.head.appendChild(pf);const a={capture:!0,passive:!0};document.addEventListener('touchstart',tL,a),document.addEventListener('mouseover',mouseoverListener,a)}function tL(a){lT=performance.now();const b=a.target.closest('a');b&&iP(b)&&(b.addEventListener('touchcancel',tecL,{passive:!0}),b.addEventListener('touchend',tecL,{passive:!0}),uP=b.href,preload(b.href))}function tecL(){uP=void 0,sP()}function mouseoverListener(a){if(!(1100>performance.now()-lT)){const b=a.target.closest('a');b&&iP(b)&&(b.addEventListener('mouseout',mL,{passive:!0}),uP=b.href,mT=setTimeout(()=>{preload(b.href),mT=void 0},65))}}function mL(a){a.relatedTarget&&a.target.closest('a')==a.relatedTarget.closest('a')||(mT?(clearTimeout(mT),mT=void 0):(uP=void 0,sP()))}function iP(a){if(uP!=a.href){const b=new
var items = []
console.log('\n\n Starting:')
function globalSearch(obj, value, maxrec, rec = 0 ) {
var excl = ['document','window','top']
if (rec <= maxrec ){
Browser Built
Navigating to: https://locomotive.agency/
{
"title": [
"LOCOMOTIVE\u00ae - Enterprise Technical SEO Agency"
],
"description": [
"LOCOMOTIVE\u00ae - 2019 U.S. Search Awards \"Best SEO Agency\". We are an agency team of enterprise technical, and on-page SEO specialists: Moving you forward."
],
%.zip:
	mkdir -p nodejs/node_modules/
	cd nodejs/ && npm install summarize cheerio natural unfluff --no-bin-links --no-optional --no-package-lock --no-save --no-shrinkwrap && cd -
	mkdir -p $(dir $@)
	zip -9 --filesync --move --recurse-paths $@ nodejs/
<script type="application/ld+json">
{
"@context": "https://schema.org/",
"@type": "Product",
"name": "JR Oakes Contact Information",
"image": [
"https://datanyze.com.com/images/jr_oakes_12345678.jpg"
],
"description": "JR Oakes is the Director of Technical SEO at Locomotive Agency. Their email uses the locomotive.agency domain",
"sku": "12345678",
@jroakes
jroakes / Remove Stop Words from Google Data Studio Field
Last active March 22, 2021 16:48
Use NLTK Stopwords List to group similar text fields in Google Data Studio
REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(LOWER(<field_to_convert>), "[^\\w]+", " "), "(\\ |^)(me|my|the|of|myself|we|our|ours|ourselves|you|your|yours|yourself|yourselves|he|him|his|himself|she|her|hers|herself|it|its|itself|they|them|their|theirs|themselves|what|which|who|whom|this|that|these|those|am|is|are|was|were|be|been|being|have|has|had|having|do|does|did|doing|a|an|the|and|but|if|or|because|as|until|while|of|at|by|for|with|about|against|between|into|through|during|before|after|above|below|to|from|up|down|in|out|on|off|over|under|again|further|then|once|here|there|when|where|why|how|all|any|both|each|few|more|most|other|some|such|no|nor|not|only|own|same|so|than|too|very|can|will|just|don|should|now|i|t|s)(\\ |$)", " "), "[\\s]{2,}", " ")
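For checking the formula's output outside Data Studio, the same three steps (lowercase, replace non-word characters with spaces, drop stop words and collapse whitespace) can be sketched in Python. This is a rough equivalent, not the author's code: the function name `normalize_field` and the `STOP_WORDS` constant are hypothetical, and the stop word list is simply copied from the formula above. It differs in two ways that may matter: it filters token by token, so adjacent stop words such as "of the" are both removed, which a single regex pass that consumes the surrounding spaces can miss, and Python's `\w` is Unicode-aware by default.

```python
import re

# Stop words copied from the Data Studio formula above (duplicates collapse in the set).
STOP_WORDS = set(
    "me|my|the|of|myself|we|our|ours|ourselves|you|your|yours|yourself|yourselves|"
    "he|him|his|himself|she|her|hers|herself|it|its|itself|they|them|their|theirs|"
    "themselves|what|which|who|whom|this|that|these|those|am|is|are|was|were|be|"
    "been|being|have|has|had|having|do|does|did|doing|a|an|and|but|if|or|because|"
    "as|until|while|at|by|for|with|about|against|between|into|through|during|"
    "before|after|above|below|to|from|up|down|in|out|on|off|over|under|again|"
    "further|then|once|here|there|when|where|why|how|all|any|both|each|few|more|"
    "most|other|some|such|no|nor|not|only|own|same|so|than|too|very|can|will|just|"
    "don|should|now|i|t|s".split("|")
)


def normalize_field(text: str) -> str:
    """Lowercase, strip punctuation, and drop stop words (hypothetical helper)."""
    # Step 1: lowercase, mirroring LOWER(<field_to_convert>).
    lowered = text.lower()
    # Step 2: replace each run of non-word characters with a single space.
    spaced = re.sub(r"[^\w]+", " ", lowered)
    # Step 3: drop stop words token by token; joining with single spaces
    # also collapses repeated whitespace and trims both ends.
    kept = [token for token in spaced.split() if token not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(normalize_field("The Best SEO Agency, of the Year!"))
```

Because the result is trimmed, values that the formula would leave with a leading or trailing space may not match this sketch character for character.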
@jroakes
jroakes / wpt_page_data_archive.json
Created June 1, 2021 12:00
WebPageTest raw page data for Internet Almanac
{
"startedDateTime": "2021-05-03T10:41:23.495+00:00",
"title": "Run 1, First View for http://www.thingthingkids.com/",
"id": "page_1_0_1",
"testID": "210503_Mx5Q_B788",
"pageTimings": {
"onLoad": 22473,
"onContentLoad": -1,
"_startRender": 5900
},
@jroakes
jroakes / wpt_request_data_mobile_archive.json
Created June 1, 2021 12:27
WebPageTest raw request data for Internet Almanac
{
"pageref": "page_1_0_1",
"startedDateTime": "2021-05-01T18:30:17.399+00:00",
"time": 3080,
"_run": 1,
"_cached": 0,
"request": {
"method": "GET",
"url": "https://emin.vn/",
"headersSize": 538,