@keplersj
Created February 8, 2018 18:16
Sitemap Archiver
import * as got from "got";
import { parseString as parseXML } from "xml2js";
import * as pify from "pify";
import * as puppeteer from "puppeteer";
// URL of the sitemap.xml to archive; left empty here, set it before running.
const sitemapLocation = "";

async function run() {
  console.log("Requesting Sitemap");
  const sitemapResponse = await got(sitemapLocation);

  // xml2js wraps every child element in an array, so each <loc> is a one-element array.
  const sitemap: {
    urlset: {
      url: Array<{ loc: [string] }>;
    };
  } = await pify(parseXML)(sitemapResponse.body);

  const browser = await puppeteer.launch({
    headless: false
  });

  // Visit the Wayback Machine "save" endpoint for each URL, one page at a time.
  for (const url of sitemap.urlset.url) {
    const [location] = url.loc;
    console.log(`Archiving ${location}`);
    const page = await browser.newPage();
    await page.goto(`https://web.archive.org/save/${location}`, {
      waitUntil: "networkidle2"
    });
    await page.close();
    console.log(`Archived ${location}`);
  }

  await browser.close();
}
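
// run() is async, so a failure surfaces as a rejected promise. A synchronous
// try/catch around the call never sees it; handle the rejection instead.
run().catch(e => {
  console.error("A Bad Happened!");
  console.error(e);
});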
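
The loop above reads each `<loc>` entry and builds a Wayback Machine save URL inline. As a minimal sketch, those two steps could be pulled into pure helpers that can be checked without launching a browser. The names `ParsedSitemap`, `extractLocations`, and `toSaveUrl` are hypothetical and not part of the original gist; the parsed shape is assumed to match the type annotation used in `run()`.

```typescript
// Assumed shape of the xml2js output for a <urlset> sitemap (hypothetical type name).
interface ParsedSitemap {
  urlset: { url?: Array<{ loc: string[] }> };
}

// Hypothetical helper: collect the first <loc> value of every <url> entry,
// skipping entries whose location is missing or blank.
function extractLocations(sitemap: ParsedSitemap): string[] {
  const entries = sitemap.urlset.url ?? [];
  return entries
    .map(entry => (entry.loc[0] ?? "").trim())
    .filter(location => location.length > 0);
}

// Hypothetical helper: build the same save URL the script navigates to.
function toSaveUrl(location: string): string {
  return `https://web.archive.org/save/${location}`;
}
```

With these in place, the loop in `run()` could iterate over `extractLocations(sitemap)` and pass `toSaveUrl(location)` to `page.goto`, which would also tolerate a sitemap that has no `<url>` entries.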