@AnderRV

AnderRV/index.js Secret

Last active August 31, 2021 10:31
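const axios = require('axios');
const cheerio = require('cheerio');

// Assumed setup (not shown in the original snippet): crawl limit and state.
const maxVisits = 5; // assumed value
const visited = new Set();
const toVisit = new Set(['https://scrapeme.live/shop/page/1/']); // assumed seed URL

// extractContent and extractLinks are assumed to be defined elsewhere;
// each takes the loaded cheerio instance ($).

// Fetch one page, extract its content and links, and queue unvisited links.
const crawl = async url => {
  visited.add(url);
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  const content = extractContent($); // extracted but not used in this snippet
  const links = extractLinks($);
  links
    .filter(link => !visited.has(link)) // Filter out already visited links
    .forEach(link => toVisit.add(link));
};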
(async () => { // IIFE
  // Loop over a set's values; links added during the loop are also iterated
  for (const next of toVisit.values()) {
    if (visited.size >= maxVisits) {
      break;
    }
    toVisit.delete(next);
    await crawl(next);
  }
  console.log(visited);
  // Set { 'https://scrapeme.live/shop/page/1/', '.../2/', ... }
  console.log(toVisit);
  // Set { 'https://scrapeme.live/shop/page/47/', '.../48/', ... }
})(); // The final set of parentheses will call the function
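The loop above relies on a property of JavaScript sets: values added to a `Set` while it is being iterated with `for...of` are still visited, and deleting the current value is safe. The sketch below isolates that behavior without any network access. It is only an illustration under stated assumptions: `fakeSite` and `simulateCrawl` are hypothetical names that do not appear in the original gist, and the in-memory link map stands in for `axios.get` plus `extractLinks`.

```javascript
// Hypothetical in-memory "site": each page maps to the links found on it.
const fakeSite = {
  '/page/1/': ['/page/2/', '/page/3/'],
  '/page/2/': ['/page/1/', '/page/3/', '/page/4/'],
  '/page/3/': ['/page/4/', '/page/5/'],
  '/page/4/': ['/page/5/'],
  '/page/5/': [],
};

// Hypothetical helper: same loop as the gist, with the crawl state kept local.
const simulateCrawl = async (startUrl, maxVisits) => {
  const visited = new Set();
  const toVisit = new Set([startUrl]);

  // Stand-in for the HTTP request and link extraction.
  const crawl = async url => {
    visited.add(url);
    const links = fakeSite[url] || [];
    links
      .filter(link => !visited.has(link)) // skip pages already crawled
      .forEach(link => toVisit.add(link)); // a Set ignores duplicates
  };

  // Entries added by crawl() during the loop are picked up by the iterator.
  for (const next of toVisit.values()) {
    if (visited.size >= maxVisits) {
      break;
    }
    toVisit.delete(next);
    await crawl(next);
  }

  return { visited: [...visited], toVisit: [...toVisit] };
};

simulateCrawl('/page/1/', 3).then(result => console.log(result));
// visited: /page/1/, /page/2/, /page/3/
// toVisit: /page/4/, /page/5/
```

With a limit of 3, the loop stops before crawling `/page/4/`, so the pages discovered but not yet fetched stay in `toVisit`, mirroring the two sets logged by the original code.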