@har07
Created April 28, 2015 11:56
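import requests
from bs4 import BeautifulSoup


def tru_crawler(max_pages):
    # Walk search-result pages 1..max_pages and print each profile link found.
    p = '&page='
    page = 1
    while page <= max_pages:
        url = 'http://www.therapy-directory.org.uk/search.php?search=Sheffield&distance=40&services[23]=on&services=23&business_type[individual]=on&uqs=626693' + p + str(page)
        code = requests.get(url)
        text = code.text
        # Name the parser explicitly so the result does not depend on which parsers are installed.
        soup = BeautifulSoup(text, 'html.parser')
        # Select the heading link inside each div.member-summary result block.
        for link in soup.select('div.member-summary h2 a'):
            # The href values are treated as site-relative, so prepend the host.
            href = 'http://www.therapy-directory.org.uk' + link.get('href')
            print(href)
        page += 1
        print(page)


tru_crawler(3)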