@migbash
Last active June 14, 2020 17:48
A quick, simple boilerplate for Python web-scraping projects using BeautifulSoup4, Selenium, and Scrapy.
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
beautifulsoup4 = "*"
requests = "*"
[requires]
python_version = "3.7"
import requests
from bs4 import BeautifulSoup


# --- Scrape Target URL
# Desc: Fetch the page and parse its HTML
#
# Return: parsed HTML (BeautifulSoup object), or None if the request failed
# ---
def scrape(url):
    r = requests.get(url)
    if r.status_code != 200:
        return None
    return BeautifulSoup(r.content, 'html.parser')


if __name__ == "__main__":
    print(scrape("https://play.google.com/store/apps/category/HEALTH_AND_FITNESS?hl=en_US"))

Good Code Snippets: Selenium and BeautifulSoup Tutorial


Importing Selenium:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

Starting up a browser:

If you want to open Chrome

driver = webdriver.Chrome()

If you want to open Firefox

driver = webdriver.Firefox()
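
Once a browser is running, a typical next step is to load a page, wait for it to render, and close the browser afterwards. The following is a minimal sketch of that flow, not part of the original snippets: the function name `fetch_page_title` and the 10-second default timeout are illustrative assumptions, and it assumes Chrome plus a matching driver are available on the machine.

```python
def fetch_page_title(url, timeout=10):
    """Open url in Chrome, wait for <body> to appear, and return the page title.

    Hypothetical helper for illustration; name and timeout are assumptions.
    """
    # Imported inside the function so this file still loads where
    # Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until the <body> element exists, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        return driver.title
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Swapping `webdriver.Chrome()` for `webdriver.Firefox()` should work the same way, provided Firefox and its driver are installed.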

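# Simple scraping script for a website
# Getting started with scraping:
from bs4 import BeautifulSoup
import requests

# input() replaces Python 2's raw_input(); the Pipfile targets Python 3.7
url = input("Enter a website to extract the URLs from: ")
r = requests.get("http://" + url)
data = r.text
# Name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href'))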
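
The script above always prepends "http://" to the user's input, so an address typed with its scheme produces a broken URL, and the printed `href` values may be relative paths or `None`. One possible way to handle both cases with the standard library is sketched below; the helper names `normalize_url` and `absolute_links` are hypothetical and not part of the original gist.

```python
from urllib.parse import urljoin, urlparse


def normalize_url(raw, default_scheme="http"):
    """Return raw with a scheme, adding default_scheme only when none is present."""
    raw = raw.strip()
    if urlparse(raw).scheme in ("http", "https"):
        return raw
    return f"{default_scheme}://{raw}"


def absolute_links(base_url, hrefs):
    """Resolve each href against base_url, skipping missing (None or empty) values."""
    return [urljoin(base_url, href) for href in hrefs if href]
```

In the script, this could look like `url = normalize_url(input(...))`, followed by `absolute_links(url, (a.get('href') for a in soup.find_all('a')))` to collect full URLs instead of printing raw attribute values.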