@blairanderson
Last active February 16, 2023 16:32
default / generic website scraping
mkdir something
cd something
npm init

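Answer the prompts (pressing Enter accepts each default) until package.json is created; npm init -y skips the prompts entirely.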

npm install website-scraper
curl -L -o index.mjs https://gist.githubusercontent.com/blairanderson/dff0509010c091fe59325c84ef910457/raw/d9a3aa2f3775478d0bd20ba9f31cf6a760825075/index.mjs
node index.mjs
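Per the website-scraper README, the output directory must not exist yet: the scraper creates it and stops with an error if it is already there. That means a second node index.mjs run against the same folder fails. One workaround, sketched here under the assumption that you want to keep earlier runs rather than delete them, is to pick a fresh directory name each time. freshOutputDir is a hypothetical helper, not part of website-scraper; it takes the existence check as a parameter so it needs no imports and is easy to test.

```javascript
// Hypothetical helper (not part of website-scraper).
// Returns `base` unchanged if nothing exists there yet; otherwise returns
// `base-<timestamp>` (plus `-1`, `-2`, ... on collisions) that does not exist.
// `exists` is injected: pass fs.existsSync in real use, or a stub in tests.
function freshOutputDir(base, exists, now = new Date()) {
  if (!exists(base)) return base;
  // ISO timestamp with ':' and '.' replaced so it is safe in a folder name,
  // e.g. 2023-02-16T16-32-00-000Z
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  let candidate = `${base}-${stamp}`;
  for (let n = 1; exists(candidate); n += 1) {
    candidate = `${base}-${stamp}-${n}`;
  }
  return candidate;
}
```

To use it in index.mjs, add import fs from 'node:fs' and import path from 'node:path' (if not already imported), then set directory: freshOutputDir(path.resolve('public'), fs.existsSync).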
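index.mjs

// website-scraper is ESM-only: load it with import, not require(), hence the .mjs extension
import path from 'node:path';
import scrape from 'website-scraper';

const options = {
  urls: ['https://www.something.com'],
  recursive: true, // follow hyperlinks found in downloaded HTML pages
  // Absolute path for the output; it must not exist yet (the scraper creates it).
  // '/public' would point at the filesystem root, so resolve ./public in the current working directory instead.
  directory: path.resolve('public'),
  prettifyUrls: true, // drop the default filename (index.html) from saved URLs
  ignoreErrors: true, // keep downloading other resources after an error instead of rejecting
  requestConcurrency: 4, // at most 4 requests in flight (default is unlimited)
  subdirectories: [ // group downloaded files into subfolders by extension
    {directory: 'img', extensions: ['.jpg', '.png', '.svg']},
    {directory: 'js', extensions: ['.js']},
    {directory: 'css', extensions: ['.css']}
  ]
};

scrape(options)
  .then((result) => { console.log(result); })
  .catch((err) => { console.error(err); process.exitCode = 1; });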
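With recursive: true and no filter, the scraper may follow every hyperlink it finds, including links to other sites. website-scraper accepts a urlFilter option (a function called with each URL; returning false skips it) and a maxRecursiveDepth option that caps how deep hyperlinks are followed. Below is a minimal sketch of a same-host filter, assuming you want www.something.com and something.com treated as one site; sameSiteFilter is a hypothetical helper name, not part of the library.

```javascript
// Hypothetical helper: builds a urlFilter that accepts only http(s) URLs
// whose host matches the start URL's host (ignoring a leading "www.").
function sameSiteFilter(startUrl) {
  const bare = new URL(startUrl).hostname.replace(/^www\./, '');
  return (url) => {
    let parsed;
    try {
      parsed = new URL(url);
    } catch {
      return false; // not an absolute URL we can parse
    }
    // Skip mailto:, tel:, javascript:, data: and other non-web schemes
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false;
    return parsed.hostname.replace(/^www\./, '') === bare;
  };
}
```

Then add urlFilter: sameSiteFilter('https://www.something.com') and, for example, maxRecursiveDepth: 2 to the options object. Because urlFilter is applied to every resource, not only pages, this also skips images, scripts, and stylesheets served from other hosts such as a CDN; widen the host check if you need those.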