@domtronn
Last active March 20, 2020 11:44
Generate representative URLs of your information architecture (IA) from a sitemap
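/*
 * Prints one representative URL per section of a site, based on its sitemap.
 *
 * Install the dependencies first. node-fetch v3 is ESM-only and cannot be
 * loaded with require(), so pin v2:
 *   npm install xml2json node-fetch@2
 *
 * Run the script with
 *   SITEMAP_DOMAIN=... SITEMAP_PATH=... node index.js
 *
 * The two values are concatenated to form the sitemap URL, so SITEMAP_DOMAIN
 * needs the scheme and SITEMAP_PATH the leading slash.
 */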
const fetch = require('node-fetch')
const p = require('xml2json')
const url = require('url')
;(async () => {
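  // Fetch the sitemap; SITEMAP_DOMAIN + SITEMAP_PATH must form a full URL
  const data = await fetch(`${process.env.SITEMAP_DOMAIN}${process.env.SITEMAP_PATH}`)
  if (!data.ok) throw new Error(`Sitemap request failed with status ${data.status}`)
  const xml = await data.text()
  // xml2json returns a JSON string by default, so parse it into an object
  const json = JSON.parse(p.toJson(xml))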
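  // Reduce every <loc> to its pathname and sort, so that a section's
  // listing page comes before its children
  const urls = []
    .concat(json.urlset.url) // a sitemap with a single <url> may parse to a bare object
    .map(({ loc }) => new url.URL(loc).pathname)
    .sort((a, b) => a < b ? -1 : 1)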
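  // Describe each path by its parent prefix ('/a/b/' -> '/a') and its depth.
  // The prefix logic assumes paths end with a trailing slash.
  const meta = urls.map(it => ({
    path: it,
    pfx: it.split('/').slice(0, -2).join('/'),
    len: it.split('/').filter(i => i).length
  }))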
  // Group paths by prefix; the accumulator is a plain object keyed by prefix
  const grouped =
    meta.reduce((acc, { path, pfx }) => ({
      ...acc,
      [pfx]: [ ...(acc[pfx] || []), path ]
    }), {})
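  // For each group keep the first path, plus the group's own listing page
  // (`${pfx}/`) when the sitemap contains it; the Set drops repeats such as '/'
  const totest = [ ...new Set(
    Object
      .entries(grouped)
      .reduce((acc, [ pfx, paths ]) => urls.includes(`${pfx}/`)
        ? [ ...acc, `${pfx}/`, paths[0] ]
        : [ ...acc, paths[0] ], [])
  ) ]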
  console.log(
    totest
      .map(i => `${process.env.SITEMAP_DOMAIN}${i}`)
      .join('\n')
  )
})()
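
The selection step is easier to follow, and to test without a network call, when it is pulled out into a pure function. The sketch below is not part of the original script: `pickRepresentativePaths` and the `sample` paths are hypothetical names and data chosen for illustration. It groups paths by parent prefix the same way the script does, keeps the first path of each group plus the group's listing page when present, and additionally removes duplicates. Like the script, it assumes every path ends with a trailing slash.

```javascript
// Hypothetical helper (not in the original gist): the grouping and
// selection logic as a pure function over an array of pathnames.
function pickRepresentativePaths (paths) {
  // Sort a copy so the caller's array is left untouched
  const sorted = [ ...paths ].sort((a, b) => a < b ? -1 : 1)

  // Group by parent prefix: '/blog/post-a/' -> '/blog'
  const grouped = {}
  for (const path of sorted) {
    const pfx = path.split('/').slice(0, -2).join('/')
    grouped[pfx] = [ ...(grouped[pfx] || []), path ]
  }

  // Keep the listing page (if the input contains it) and the first child
  const picked = Object
    .entries(grouped)
    .reduce((acc, [ pfx, group ]) => sorted.includes(`${pfx}/`)
      ? [ ...acc, `${pfx}/`, group[0] ]
      : [ ...acc, group[0] ], [])

  // The root path can be selected twice, so drop repeats
  return [ ...new Set(picked) ]
}

// Made-up sample paths, for illustration only
const sample = [
  '/',
  '/blog/',
  '/blog/post-a/',
  '/blog/post-b/',
  '/products/shoes/',
  '/products/hats/'
]

console.log(pickRepresentativePaths(sample))
```

For this sample the function returns `[ '/', '/blog/', '/blog/post-a/', '/products/hats/' ]`: the blog section contributes its listing page and first post, while the products section has no listing page in the input and contributes only its first (alphabetical) child.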