@agungjk
Last active April 7, 2024 15:22
Crawler example on Vercel using Puppeteer and NextJS API routes
const puppeteer = require('puppeteer-core');
const cheerio = require('cheerio');
const chrome = require('chrome-aws-lambda');

// Next.js API route: responds with { id } for the channel's live video, or { id: null }.
export default async (req, res) => {
  const slug = req?.query?.slug;
  if (!slug) {
    res.statusCode = 200;
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({ id: null }));
    return;
  }

  // In production, use the Chromium binary bundled with chrome-aws-lambda.
  const browser = await puppeteer.launch(
    process.env.NODE_ENV === 'production'
      ? {
          args: chrome.args,
          executablePath: await chrome.executablePath,
          headless: chrome.headless,
        }
      : {}
  );

  let id = null;
  try {
    const page = await browser.newPage();
    // Spoof an Opera Mini user agent before loading the mobile site.
    await page.setUserAgent('Opera/9.80 (J2ME/MIDP; Opera Mini/5.1.21214/28.2725; U; ru) Presto/2.8.119 Version/11.10');
    await page.goto(`https://m.youtube.com/${slug}/videos`);
    const content = await page.content();
    const $ = cheerio.load(content);

    // The page is treated as live when it contains a data-style="LIVE" element.
    const isLive = $('body').find('[data-style="LIVE"]').length > 0;
    if (isLive) {
      const url = $('ytm-compact-video-renderer .compact-media-item-image').attr('href');
      // attr() returns undefined when nothing matches, so guard before splitting.
      if (url) {
        id = url.split('?v=')[1] ?? null;
      }
    }
  } finally {
    // Close the browser even if navigation or parsing throws.
    await browser.close();
  }

  res.statusCode = 200;
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ id }));
};
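
The handler takes the video ID with `url.split('?v=')`, which would also keep any parameters that follow `v` (for example `&t=10`). A minimal sketch of a stricter parser follows, assuming the link is a standard watch URL with a `v` query parameter. `extractVideoId` is a hypothetical helper name and is not part of the gist.

```javascript
// Hypothetical helper (not part of the original gist): read the `v` query
// parameter from a YouTube watch link, returning null when it is absent.
function extractVideoId(href) {
  if (typeof href !== 'string' || href.length === 0) {
    return null;
  }
  try {
    // The base URL lets relative links such as "/watch?v=abc" parse correctly.
    const url = new URL(href, 'https://m.youtube.com');
    return url.searchParams.get('v');
  } catch (err) {
    // new URL() throws on input it cannot parse; treat that as "no ID".
    return null;
  }
}
```

In the handler, `id = extractVideoId(url)` could then stand in for the split, with no separate guard for a missing `href`.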
btk commented Oct 14, 2022

As an addition to @Otoris's comment:

If you are also using Lighthouse with this combo, you still hit the wall. Using these versions keeps me under 50 MB:

    "chrome-aws-lambda": "^6.0.0",
    "lighthouse": "^6.1.1",
    "puppeteer-core": "^6.0.0"

canbax commented Nov 6, 2022

This doesn't work on my local setup. I see:

        reject(new Error([
               ^

Error: Failed to launch the browser process! spawn /usr/bin/chromium-browser ENOENT

TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

at onClose (/Users/yusufcanbaz/Desktop/yusuf/namaz-vakti-api/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:193:20)
at ChildProcess.<anonymous> (/Users/yusufcanbaz/Desktop/yusuf/namaz-vakti-api/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:185:85)
at ChildProcess.emit (node:events:513:28)
at Process.ChildProcess._handle.onexit (node:internal/child_process:289:12)
at onErrorNT (node:internal/child_process:478:16)
at processTicksAndRejections (node:internal/process/task_queues:83:21)

[nodemon] app crashed - waiting for file changes before starting...
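
The `ENOENT` in the trace above means no executable was found at `/usr/bin/chromium-browser`. `puppeteer-core` does not download a browser of its own, so a local run needs an explicit path to an installed Chrome or Chromium. A minimal sketch of one way to pick launch options per environment is below. The `CHROME_EXECUTABLE_PATH` variable and the `buildLaunchOptions` helper are assumptions for illustration, not part of the gist or of Puppeteer.

```javascript
// Hypothetical helper: build the options object passed to puppeteer.launch().
// `env` is an environment map such as process.env. `lambdaChrome` is an object
// holding the already-resolved args, executablePath and headless values from
// chrome-aws-lambda (so `await chrome.executablePath` happens in the caller).
function buildLaunchOptions(env, lambdaChrome) {
  if (env.NODE_ENV === 'production') {
    return {
      args: lambdaChrome.args,
      executablePath: lambdaChrome.executablePath,
      headless: lambdaChrome.headless,
    };
  }

  // CHROME_EXECUTABLE_PATH is an assumed variable name chosen for this sketch.
  const localPath = env.CHROME_EXECUTABLE_PATH;
  if (!localPath) {
    throw new Error('Set CHROME_EXECUTABLE_PATH to a locally installed Chrome or Chromium binary');
  }
  return { executablePath: localPath, headless: true };
}
```

Failing early with a clear message when the variable is unset is easier to diagnose than a launch error from a missing binary.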
