@mnmkng
Created May 20, 2018 21:24
const Apify = require('apify');

Apify.main(async () => {
    // Open the default request queue and enqueue the first URL.
    const requestQueue = await Apify.openRequestQueue();
    const enqueueUrl = async url => requestQueue.addRequest(new Apify.Request({ url }));
    await enqueueUrl('https://news.ycombinator.com/');

    const crawlerConfig = {
        launchPuppeteerOptions: {
            liveView: true,
        },
        requestQueue,
        disableProxy: true,
        // This function is executed for each request.
        // If the request fails, it is retried up to 3 times.
        // The `page` parameter is Puppeteer's Page object with the page already loaded.
        handlePageFunction: async ({ page, request }) => {
            console.log(`Request ${request.url} succeeded!`);

            // Extract the text of all posts.
            const data = await page.$$eval('.athing', (els) => {
                return els.map(el => el.innerText);
            });

            // Save the data to the default dataset.
            await Apify.pushData({
                url: request.url,
                data,
            });

            // Enqueue the next page.
            // page.$eval throws when no '.morelink' element exists,
            // which is treated here as having reached the last page.
            try {
                const nextHref = await page.$eval('.morelink', el => el.href);
                await enqueueUrl(nextHref);
            } catch (err) {
                console.log(`Url ${request.url} is the last page!`);
            }
        },
        // This function is executed once a request has failed 4 times
        // (the initial attempt plus 3 retries).
        handleFailedRequestFunction: async ({ request }) => {
            console.log(`Request ${request.url} failed 4 times`);
            await Apify.pushData({
                url: request.url,
                errors: request.errorMessages,
            });
        },
    };

    // Create crawler.
    const crawler = new Apify.PuppeteerCrawler(crawlerConfig);

    // Run the crawler and the webcam screenshot in parallel, and wait for both,
    // so the screenshot is not cut off and its errors are not left unhandled.
    await Promise.all([secondAct(), crawler.run()]);
});

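// Takes a screenshot of the Golden Gate Bridge webcam page
// and stores it as the OUTPUT record.
async function secondAct() {
    // Start browser.
    const browser = await Apify.launchPuppeteer({ liveView: true });
    try {
        // Load the page that embeds the webcam stream.
        const page = await browser.newPage();
        await page.goto('http://goldengatebridge75.org/news/webcam.html');
        // Take a screenshot of the loaded page as JPEG (Puppeteer defaults to PNG)
        // so that the data matches the content type declared below.
        const imageBuffer = await page.screenshot({ type: 'jpeg' });
        // Save it as the output.
        await Apify.setValue('OUTPUT', imageBuffer, { contentType: 'image/jpeg' });
    } finally {
        // Always close the browser, even if loading or saving fails.
        await browser.close();
    }
}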