@kizernis
Created September 20, 2018 07:16
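
import random
import time

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below also needs `pip install lxml`

# Avvo search results for "Criminal defense" lawyers near Bethesda, MD, sorted by client rating;
# {page_num} is filled in for each results page.
url = 'https://www.avvo.com/search/lawyer_search?utf8=%E2%9C%93&q=Criminal+defense&loc=Bethesda%2C+MD&page={page_num}&sort=client_rating'
# Browser-like User-Agent; some sites block the default python-requests one.
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
start_page = 1
end_page = 67

for i in range(start_page, end_page + 1):
    # Fetch one results page; the timeout keeps a stalled request from hanging the loop.
    response = requests.get(url.format(page_num=i), headers={'User-Agent': user_agent}, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    # Print the text of every <strong class="u-vertical-margin-0"> element
    # (the class name depends on Avvo's markup at the time and may change).
    for tag in BeautifulSoup(response.text, 'lxml').find_all('strong', class_='u-vertical-margin-0'):
        print(tag.get_text())
    # Pause 5 to 10 seconds between pages (randrange excludes 11); no pause after the last page.
    if i < end_page:
        time.sleep(random.randrange(5, 11))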
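
The parsing step in the script above depends on BeautifulSoup plus lxml. If those are not available, a minimal sketch using only Python's standard-library `html.parser` can pull the text of the same `<strong class="u-vertical-margin-0">` elements. This is a swapped-in technique, not what the script uses, and it is less forgiving of malformed HTML than BeautifulSoup; `StrongTextExtractor` and `extract_strong_text` are hypothetical names, and the class name is simply copied from the script, so it only works while Avvo's markup still uses it.

```python
from html.parser import HTMLParser


class StrongTextExtractor(HTMLParser):
    """Collect the text of <strong> tags whose class list contains target_class."""

    def __init__(self, target_class="u-vertical-margin-0"):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching <strong>
        self.buffer = []    # text pieces of the current match
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag != "strong":
            return
        if self.depth:
            # A nested <strong> inside a match: its text is folded into the outer match.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_strong_text(html, target_class="u-vertical-margin-0"):
    """Return the text of each matching <strong> element, in document order."""
    parser = StrongTextExtractor(target_class)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.results
```

Because the function takes an HTML string, it can be checked against saved pages without contacting the site; in the loop it would stand in for the `BeautifulSoup(...).find_all(...)` line.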
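
The pacing logic of the loop above (fetch a page, then wait a random 5 to 10 seconds, except after the last page) can be pulled into a generator so it can be exercised without network access. This is a sketch under assumptions: `crawl_pages` is a hypothetical helper, and `fetch` and `sleep` are injected callables, not anything provided by Avvo or by requests.

```python
import random
import time
from typing import Callable, Iterator, Optional, Tuple

# Same search URL as the script; {page_num} is filled in per page.
URL_TEMPLATE = (
    "https://www.avvo.com/search/lawyer_search?utf8=%E2%9C%93"
    "&q=Criminal+defense&loc=Bethesda%2C+MD&page={page_num}&sort=client_rating"
)


def crawl_pages(
    fetch: Callable[[str], str],
    start_page: int = 1,
    end_page: int = 67,
    min_delay: int = 5,
    max_delay: int = 10,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[int, str]]:
    """Yield (page_num, html) for each page, pausing between requests.

    fetch and sleep are passed in (hypothetical design) so the pacing logic
    can be tested with fakes instead of real HTTP calls and real waiting.
    """
    rng = rng or random.Random()
    for page_num in range(start_page, end_page + 1):
        yield page_num, fetch(URL_TEMPLATE.format(page_num=page_num))
        if page_num < end_page:
            # randint is inclusive on both ends, so 5..10 matches randrange(5, 11).
            sleep(rng.randint(min_delay, max_delay))
```

For a real run, `fetch` could be `lambda u: requests.get(u, headers={'User-Agent': user_agent}, timeout=30).text`, reusing `user_agent` from the script. Since the sleep happens only when the next page is requested, stopping early (for example with `break`) skips the final wait.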