@DiogenesAnalytics
Forked from RagingTiger/sample.py
Created November 24, 2023 10:26
Web Scraping using Bright Data Scraping Browser and Playwright
import asyncio
import re

from playwright.async_api import async_playwright

USERNAME = "TYPE YOUR USERNAME HERE"
PASSWORD = "TYPE YOUR PASSWORD HERE"
HOST = "zproxy.lum-superproxy.io:9222"
URL = "https://www.svpino.com/"  # USE YOUR URL HERE


def process(html):
    # Extract and print the contents of the page's <title> element.
    regex = re.compile("<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
    match = regex.search(html)
    # search() returns None when no <title> is present, so guard before .group().
    title = match.group(1).strip() if match else "(no title found)"
    print(f"Title: {title}")


async def main():
    # The credentials are embedded in the endpoint URL of the remote browser.
    browser_url = f"https://{USERNAME}:{PASSWORD}@{HOST}"

    async with async_playwright() as pw:
        print("Connecting to browser...")
        browser = await pw.chromium.connect_over_cdp(browser_url)
        try:
            page = await browser.new_page()
            print(f"Navigating to URL {URL}...")
            await page.goto(URL, timeout=120000)  # timeout is in milliseconds
            process(await page.evaluate("()=>document.documentElement.outerHTML"))
        finally:
            # Close the remote browser even if navigation or parsing fails.
            await browser.close()


asyncio.run(main())
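
The regex in `process` only matches a bare `<title>` tag, so a page that writes `<title lang="en">` would be reported as having no title. As a rough sketch of one alternative, the snippet below swaps the regex for the standard library's `html.parser`. The names `TitleParser` and `extract_title` are hypothetical and not part of the original gist; this is an illustration under those assumptions, not the author's method.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> also arrives as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Return None when no title text was collected.
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the page title, or None if the document has no usable title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

To try it, `process` could call `extract_title(html)` and print a placeholder when the result is `None`. The function needs no network access, so it can be checked against small HTML strings before pointing the script at a real page.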