@jancurn
Last active February 29, 2024 07:26
Example showing how to use the proxy-chain NPM package to let headless Chrome use a proxy server with username and password
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

(async () => {
    const oldProxyUrl = 'http://bob:password123@proxy.example.com:8000';
    const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

    // Prints something like "http://127.0.0.1:45678"
    console.log(newProxyUrl);

    const browser = await puppeteer.launch({
        args: [`--proxy-server=${newProxyUrl}`],
    });

    // Do your magic here...
    const page = await browser.newPage();
    await page.goto('https://www.example.com');
    await page.screenshot({ path: 'example.png' });
    await browser.close();

    // Clean up, forcibly close all pending connections
    await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
})();
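For context on why the snippet needs proxy-chain at all: Chrome's --proxy-server flag takes only a scheme, host and port, with no place for a username and password, so anonymizeProxy() starts a local proxy without credentials that forwards to the authenticated one. A minimal sketch of an alternative, assuming a plain HTTP proxy that answers with a standard authentication challenge: pass the bare server to Chrome and supply the credentials with Puppeteer's page.authenticate(). The helper names splitProxyUrl and screenshotViaProxy are hypothetical, not part of either package.

```javascript
// Hypothetical helper (not part of proxy-chain or Puppeteer): split an
// authenticated proxy URL into the bare server that Chrome's
// --proxy-server flag accepts and the credentials it cannot take.
function splitProxyUrl(proxyUrl) {
    const u = new URL(proxyUrl);
    return {
        server: `${u.protocol}//${u.host}`, // e.g. "http://proxy.example.com:8000"
        username: decodeURIComponent(u.username),
        password: decodeURIComponent(u.password),
    };
}

// Alternative to proxy-chain (an assumption, not the gist's method): give
// Chrome the bare server and answer the proxy's auth challenge per page.
async function screenshotViaProxy(proxyUrl, targetUrl, path) {
    const puppeteer = require('puppeteer'); // required lazily so splitProxyUrl() works without it
    const { server, username, password } = splitProxyUrl(proxyUrl);
    const browser = await puppeteer.launch({ args: [`--proxy-server=${server}`] });
    try {
        const page = await browser.newPage();
        await page.authenticate({ username, password }); // must run before goto()
        await page.goto(targetUrl);
        await page.screenshot({ path });
    } finally {
        await browser.close();
    }
}
```

Usage would be something like screenshotViaProxy('http://bob:password123@proxy.example.com:8000', 'https://www.example.com', 'example.png'). One trade-off: page.authenticate() has to be called on every new page, whereas the local proxy from anonymizeProxy() handles the credentials once for the whole browser.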
@dimonchoo

Thanks! It helped me!

@TahorSuiJuris

Would you have any insight into rotating proxies per page?

I am trying to modify the code below:

const fs = require('fs');
const { Cluster } = require('puppeteer-cluster');
const proxyChain = require('proxy-chain');

//const request = require('request-promise');

(async () => {
    // page.screenshot() fails if the target directory does not exist
    fs.mkdirSync('./screenshots', { recursive: true });

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });

    await cluster.task(async ({ page, data: url }) => {

        //==================================
        // Note: a server like this should be created once, outside
        // cluster.task(), not once per page (port 8000 can only be bound once).
        /*
        const proxies = {
            'useragent1': 'http://proxyusername1:proxypassword1@proxyhost1:proxyport1',
            'useragent2': 'http://proxyusername2:proxypassword2@proxyhost2:proxyport2',
            'useragent3': 'http://proxyusername3:proxypassword3@proxyhost3:proxyport3',
        };

        const server = new proxyChain.Server({
            port: 8000,
            prepareRequestFunction: ({ request }) => {
                const userAgent = request.headers['user-agent'];
                const proxy = proxies[userAgent];
                return {
                    upstreamProxyUrl: proxy,
                };
            },
        });

        server.listen(() => console.log('proxy server started'));
        */
        //==================================

        await page.goto(url);

        const currentdate = new Date();
        const datetime = "_" + (currentdate.getMonth() + 1) + "/"
            + currentdate.getDate() + "/"
            + currentdate.getFullYear() + " @ "
            + currentdate.getHours() + ":"
            + currentdate.getMinutes() + ":"
            + currentdate.getSeconds();
        let fileName = datetime.replace(/(\. |\&|\.\r|\, |\  |\ |\-|\,|\r\n|\n|\r|\.|\/|:|%|#)/gm, "_");
        if (fileName.length > 100) {
            fileName = fileName.substring(0, 100);
        }
        const url2 = page.url();

        const screen = fileName + '_' + url2.replace(/[^a-zA-Z]/g, '_') + '.png'; // ☑ added timestamp
        await page.screenshot({ path: './screenshots/' + screen }); // default viewport is 800x600
        console.log(`Screenshot of: ${url2} saved: ${screen}`);
    });

    cluster.queue('http://httpbin.org/ip');
    cluster.queue('http://www.google.com/');
    cluster.queue('http://httpbin.org/ip');
    cluster.queue('http://www.wikipedia.org/');
    cluster.queue('http://httpbin.org/ip');

    await cluster.idle();
    await cluster.close();
})();
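A minimal sketch of the commented-out idea, restructured so that one local proxy-chain Server is started once, before the cluster, and its prepareRequestFunction hands each incoming connection the next upstream proxy from a list. The function names makeRotator and startRotatingProxy are hypothetical, and the upstream URLs you pass in are placeholders. Note this rotates per connection, not strictly per page: requests from a single page can leave through different upstreams, and Chrome may reuse an open connection across pages.

```javascript
// Hypothetical round-robin selector over upstream proxy URLs.
function makeRotator(upstreams) {
    let next = 0;
    return () => {
        const upstream = upstreams[next];
        next = (next + 1) % upstreams.length;
        return upstream;
    };
}

// Start ONE local proxy-chain server (not one per task) that forwards each
// incoming connection to the next upstream proxy in the list.
function startRotatingProxy(upstreams, port) {
    const proxyChain = require('proxy-chain'); // required lazily so makeRotator() works without it
    const pick = makeRotator(upstreams);
    const server = new proxyChain.Server({
        port,
        // proxy-chain calls this for each proxied request or CONNECT tunnel
        prepareRequestFunction: () => ({ upstreamProxyUrl: pick() }),
    });
    return new Promise((resolve, reject) => {
        server.listen((err) => (err ? reject(err) : resolve(server)));
    });
}
```

With the server running on port 8000, the cluster can point Chrome at it through Cluster.launch({ ..., puppeteerOptions: { args: ['--proxy-server=http://127.0.0.1:8000'] } }), and server.close(true) shuts it down after cluster.close().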

@chrisfranko

There is https://github.com/gajus/puppeteer-proxy

As far as I can tell, it intercepts the page's requests in Node.js and re-sends them through a different proxy, so the proxy can be changed per page. It seems the one-proxy-per-browser limitation comes from Chrome itself.

@jancurn
Author

jancurn commented Mar 2, 2020

To intercept HTTPS communication, you'd need to use a man-in-the-middle proxy with a custom self-signed certificate, which is quite painful to set up and insecure. To support proxy IP address rotation in PuppeteerCrawler in Apify SDK, we simply start new Chromium browser instances with new proxy settings. BTW, this is implemented out of the box in Apify's Web Scraper (apify/web-scraper).
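A minimal sketch of the "new browser per proxy" approach in plain Puppeteer. This is not Apify SDK's implementation, and assignProxies and crawlWithRotation are hypothetical names. Each URL is paired round-robin with an upstream proxy, and a fresh browser is launched behind that proxy. Launching a browser per URL is slow, so in practice you would probably reuse each browser for a batch of URLs.

```javascript
// Hypothetical helper: pair each URL with an upstream proxy, round-robin.
function assignProxies(urls, proxyUrls) {
    return urls.map((url, i) => ({ url, proxy: proxyUrls[i % proxyUrls.length] }));
}

// Rotate IPs by starting a new browser for each URL, behind its assigned proxy.
async function crawlWithRotation(urls, proxyUrls) {
    const puppeteer = require('puppeteer'); // required lazily so assignProxies() works without them
    const proxyChain = require('proxy-chain');
    for (const { url, proxy } of assignProxies(urls, proxyUrls)) {
        const localProxy = await proxyChain.anonymizeProxy(proxy);
        let browser;
        try {
            browser = await puppeteer.launch({ args: [`--proxy-server=${localProxy}`] });
            const page = await browser.newPage();
            await page.goto(url);
            // Log only the proxy host so credentials never reach the console
            console.log(`${url} via ${new URL(proxy).host}: ${await page.title()}`);
        } finally {
            // Close Chrome if it started, then the local anonymizing proxy
            if (browser) {
                await browser.close();
            }
            await proxyChain.closeAnonymizedProxy(localProxy, true);
        }
    }
}
```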

@gsouf

gsouf commented Apr 14, 2020

Hi @jancurn, will the proxy server close by itself?

@jancurn
Author

jancurn commented Apr 15, 2020

@gsouf Unfortunately not; you need to call closeAnonymizedProxy(). I'll update the Gist.
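Because the local proxy stays open until closeAnonymizedProxy() is called, it helps to release it in a finally block so an exception in page.goto() or page.screenshot() does not leak it. A minimal sketch using a hypothetical generic helper, withResource, that acquires something, hands it to a callback, and always releases it; main() shows it wrapping the proxy and the browser from the Gist.

```javascript
// Hypothetical generic helper: acquire a resource, hand it to `use`, and
// always release it, even when `use` throws.
async function withResource(acquire, release, use) {
    const resource = await acquire();
    try {
        return await use(resource);
    } finally {
        await release(resource);
    }
}

// Example: the anonymized proxy and the browser are closed even if the
// page fails to load. Requires proxy-chain and puppeteer to be installed.
async function main() {
    const proxyChain = require('proxy-chain');
    const puppeteer = require('puppeteer');
    await withResource(
        () => proxyChain.anonymizeProxy('http://bob:password123@proxy.example.com:8000'),
        (proxyUrl) => proxyChain.closeAnonymizedProxy(proxyUrl, true),
        (proxyUrl) => withResource(
            () => puppeteer.launch({ args: [`--proxy-server=${proxyUrl}`] }),
            (browser) => browser.close(),
            async (browser) => {
                const page = await browser.newPage();
                await page.goto('https://www.example.com');
                await page.screenshot({ path: 'example.png' });
            },
        ),
    );
}
```

The nesting order matters: the browser is released first and the proxy it depends on last, which mirrors the order in the Gist.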

@aditodkar

aditodkar commented Jan 9, 2023

@jancurn I am trying to run Puppeteer with the proxy-chain package on AWS Lambda, but I am getting this error message:

"errorType": "Error",
"errorMessage": "Protocol error (Target.createTarget): Target closed.",

Code:

const chromium = require('chrome-aws-lambda');
const { addExtra } = require("puppeteer-extra");
const puppeteerExtra = addExtra(chromium.puppeteer);
const proxyChain = require('proxy-chain');

const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteerExtra.use(StealthPlugin());

exports.handler = async (event, context) => {
    let finalResult = [];
    const url = ``;
    let browser;
    const oldProxyUrl = ''; // --> bright data proxy
    const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

    console.log("newProxyUrl", newProxyUrl);

    try {
        browser = await puppeteerExtra.launch({
            // Keep chrome-aws-lambda's own Lambda flags; replacing them with a
            // short custom list is one possible (unconfirmed) cause of "Target closed"
            args: [...chromium.args, '--no-sandbox', '--disable-setuid-sandbox', `--proxy-server=${newProxyUrl}`],
            defaultViewport: chromium.defaultViewport,
            executablePath: await chromium.executablePath,
            headless: chromium.headless
        });

        const page = await browser.newPage();

        await page.goto(url);

        // extractElements() is the poster's own helper (not shown)
        finalResult = await extractElements(page);
    } finally {
        // Runs on success and on error: close Chrome only if it started,
        // then shut down the local proxy created by anonymizeProxy()
        if (browser) {
            await browser.close();
        }
        await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
    }

    // An async Lambda handler returns its result (or throws) instead of calling callback
    return finalResult;
};
