@mnmkng
Created November 14, 2022 16:27
import { PlaywrightCrawler, Dataset } from 'crawlee';

const maxRepoCount = 100;

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, infiniteScroll }) => {
        // Click the "Load more" button and keep scrolling until
        // at least `maxRepoCount` repository cards are on the page.
        console.log('Clicking and scrolling.');
        await infiniteScroll({
            buttonSelector: 'text=Load more',
            stopScrollCallback: async () => {
                const repoCount = (await page.$$('article.border')).length;
                return repoCount >= maxRepoCount;
            },
        });

        // Extract data from the page. Selecting all 'article.border'
        // elements returns the repository cards we're looking for.
        const repos = await page.$$eval('article.border', (repoCards) => {
            // Helpers live inside the callback because it runs in the browser.
            const toText = (element) => element && element.innerText.trim();
            const parseNumber = (text) => (text ? Number(text.replace(/,/g, '')) : null);

            return repoCards.map((card) => {
                const [user, repo] = card.querySelectorAll('h3 a');
                // Optional chaining keeps one incomplete card from
                // throwing and aborting the whole extraction.
                const stars = card.querySelector('#repo-stars-counter-star')?.getAttribute('title');
                const description = card.querySelector('div.px-3 > p + div');
                const topics = card.querySelectorAll('a.topic-tag');

                return {
                    user: toText(user),
                    repo: toText(repo),
                    url: repo?.href ?? null,
                    stars: parseNumber(stars),
                    description: toText(description),
                    topics: Array.from(topics).map((t) => toText(t)),
                };
            });
        });

        // Save the results to the default dataset and export them as CSV.
        console.log(`We extracted ${repos.length} repositories.`);
        await Dataset.pushData(repos);
        await Dataset.exportToCSV('repositories');
    },
});

await crawler.run(['https://github.com/topics/javascript']);
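
The star count in the crawler above comes from a `title` attribute such as `12,345`, and the number parsing is the easiest part to check without a browser. Below is a minimal sketch of that step as a standalone function. The name `parseCount` and the choice to return `null` for missing or non-numeric input are assumptions for illustration, not part of the original gist.

```javascript
// Hypothetical helper (not in the original gist): turns a formatted
// count such as "12,345" into a number, or null when nothing usable
// was scraped.
function parseCount(text) {
    // A missing attribute arrives as null or undefined.
    if (typeof text !== 'string') return null;

    // Drop thousands separators and surrounding whitespace.
    const digits = text.replace(/,/g, '').trim();

    // Number('') is 0, which would look like a real count, so reject it.
    if (digits === '') return null;

    const value = Number(digits);
    return Number.isFinite(value) ? value : null;
}

console.log(parseCount('12,345'));
console.log(parseCount(null));
```

Returning `null` rather than `0` or `NaN` keeps "no data" distinguishable from "zero stars" in the exported CSV; whether that distinction matters depends on how the data is used afterwards.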