@arturosalgado
Created September 12, 2021 13:59
Scrape the web; works on sites with JavaScript-created DOM items. Ubuntu Linux version.
sudo pip3 install requests
>sudo: pip3: command not found
sudo apt install python3-pip
pip3 install requests-html
>pyppeteer.errors.BrowserError: Browser closed unexpectedly:
sudo apt install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
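Before running the scraper it can help to confirm the pieces installed above are actually visible. The following is only a sketch using the Python standard library; the helper name `check_prerequisites` is hypothetical and not part of the gist. It checks that a command is on PATH and that a module is importable. It does not detect missing shared libraries for Chromium, so the `BrowserError` above can still occur.

```python
import importlib.util
import shutil


def check_prerequisites(commands=("pip3",), modules=("requests_html",)):
    """Return a dict mapping each command/module name to True if it is available.

    Hypothetical helper for illustration; the default names are taken from
    the install steps above.
    """
    status = {}
    for cmd in commands:
        # shutil.which returns None when the executable is not on PATH.
        status[cmd] = shutil.which(cmd) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be found.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If `pip3` shows as missing, `sudo apt install python3-pip` is the fix noted above; if `requests_html` is missing, rerun `pip3 install requests-html`.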
python.py
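from requests_html import HTMLSession

session = HTMLSession()
# Placeholder URL: replace with the page to scrape (requests needs the scheme).
URL = 'https://url-which-creates-content-dynamically-with-js.com'
r = session.get(URL)
# Render the page in headless Chromium so JavaScript-created elements exist.
r.html.render(sleep=2, keep_page=True, scrolldown=1)
# 'span.class' is a placeholder CSS selector.
items = r.html.find('span.class')
for item in items:
    print(item.text)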
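The same steps can be wrapped in a reusable function. This is a minimal sketch, not part of the original gist: the name `scrape_texts` and its parameters are hypothetical, and `requests_html` is imported lazily so the function can also be called with a stub session for testing.

```python
from typing import List


def scrape_texts(url: str, selector: str, session=None, sleep: int = 2) -> List[str]:
    """Render `url` with JavaScript and return the text of every element
    matching the CSS `selector`. Hypothetical helper for illustration."""
    owns_session = session is None
    if owns_session:
        # Lazy import: only needed when no session is supplied by the caller.
        from requests_html import HTMLSession
        session = HTMLSession()
    try:
        response = session.get(url)
        # Fail early on HTTP errors instead of rendering an error page.
        response.raise_for_status()
        # Run the page's JavaScript in headless Chromium, then scroll once.
        response.html.render(sleep=sleep, scrolldown=1)
        return [element.text for element in response.html.find(selector)]
    finally:
        if owns_session:
            # Close the session (and its browser) if we created it here.
            session.close()
```

Usage would look like `scrape_texts('https://url-which-creates-content-dynamically-with-js.com', 'span.class')`, with both arguments being the same placeholders as in the script above.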