ScrapingBee ScrapingNinjaHQ

View Marketing-material.md

Company history

ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you.

It was founded by Pierre de Wulf and Kevin Sahin in 2019. Pierre and Kevin worked on web scraping projects for many years, for startups and bigger companies. We realized we always encountered the same problems, so we decided to create an API to solve them.

We are an open startup, meaning we take transparency seriously. You can learn more about us here: https://www.indiehackers.com/product/scrapingninja

View conclusion.md
Name socket urllib3 requests Scrapy selenium
Ease of use - - - + + + + + + + +
Flexibility + + + + + + + + + + + + + +
Speed of execution + + + + + + + + + + +
Common use case -Writing low-level programming interfaces -High-level applications that need fine control over HTTP (pip, aws client, requests, streaming) -Calling an API
-Simple applications (in terms of HTTP needs)
-Crawling a large list of websites
-Filtering, extracting and loading scraped data
-JS rendering
-Scraping SPAs
-Automated testing
-Programmatic screenshots
Learn more - Official documentation
- Great tutorial 👍
- Official documentation
- [PIP usage of urllib3](htt
View gist.Http_request
GET /product/ HTTP/1.1
Host: example.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch, br
Connection: keep-alive
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
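To make the wire format of the request above concrete, here is a minimal Python sketch that serializes the same request line and headers into raw HTTP/1.1 bytes. The helper name `build_get_request` is hypothetical and not part of any library. The sketch only builds the bytes and does not open a connection.

```python
def build_get_request(host, path, headers):
    """Serialize a GET request as raw HTTP/1.1 bytes (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


raw_request = build_get_request(
    "example.com",
    "/product/",
    {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, sdch, br",
        "Connection": "keep-alive",
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
        ),
    },
)
print(raw_request.decode("ascii"))
```

These bytes are what a low-level client would write to a TCP socket connected to the host on port 80. Higher-level libraries such as urllib3 or requests assemble them for you.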
View webdriverwait.py
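from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# Assumes `driver` is an existing WebDriver and `delay` is a timeout in seconds.
try:
    elem = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.NAME, 'chart')))
    print("Page is ready!")
except TimeoutException:
    print("Timeout")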
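The explicit wait above boils down to polling a condition until it returns something truthy or a timeout expires. The following is a library-free sketch of that idea, not Selenium's actual implementation. `wait_until`, `WaitTimeout`, and `chart_is_present` are hypothetical names, and the fake condition stands in for a real element lookup.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (stand-in for TimeoutException)."""


def wait_until(condition, timeout, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Fake condition: the "element" only shows up on the third check.
attempts = {"count": 0}


def chart_is_present():
    attempts["count"] += 1
    return "chart" if attempts["count"] >= 3 else None


try:
    elem = wait_until(chart_is_present, timeout=2, poll_interval=0.01)
    print("Page is ready!")
except WaitTimeout:
    print("Timeout")
```

Unlike this sketch, Selenium's `WebDriverWait` passes the driver to the condition and ignores `NoSuchElementException` between polls by default.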
View chrome.py
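from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
# The `options.headless` attribute is gone from recent Selenium 4 releases; use the Chrome flag instead.
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1200")
# `executable_path` is no longer accepted by webdriver.Chrome in recent Selenium 4; pass the path via Service.
driver = webdriver.Chrome(options=options, service=Service('/usr/local/bin/chromedriver'))
driver.get("https://www.nintendo.com/")  # load the page in headless Chrome
driver.save_screenshot('screenshot.png')  # write a PNG of the current viewport
driver.quit()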