@TheVoxcraft
Created February 1, 2016 23:57
Web Crawler. NEEDS bs4 installed!
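import requests
from bs4 import BeautifulSoup  # requires the bs4 package (pip install beautifulsoup4)


def trade_spider(max_page):
    """Print the title and link of each listing on result pages 1..max_page."""
    page = 1
    while page <= max_page:
        url = "http://www.finn.no/finn/realestate/homes/result?location=0%2F20061&page=" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so bs4 does not have to guess one
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.find_all('a', {'data-fth-event': 'searchclickthrough'}):
            href = link.get('href')
            title = link.string  # just the text; None when the tag holds more than one child node
            if title is not None:  # compare against None itself, not the string "None"
                print(" ")
                print(title)
                print(href)
        page += 1


trade_spider(30)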
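
The link filter can be tried offline, without bs4 and without hitting the site. The block below is only a rough sketch: it swaps BeautifulSoup for the standard library's `html.parser`, and `ClickthroughLinkParser`, `extract_links`, and `SAMPLE_HTML` are made-up names for illustration, not part of the gist. It assumes the listing links carry the same `data-fth-event="searchclickthrough"` attribute the crawler looks for. Unlike `link.string`, it joins all text inside the link, so a link with nested tags still yields a title.

```python
from html.parser import HTMLParser


class ClickthroughLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a data-fth-event="searchclickthrough"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (title, href) pairs
        self._inside = False   # True while a matching <a> is open
        self._href = None      # href of the matching <a> currently open
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("data-fth-event") == "searchclickthrough":
            self._inside = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Collapse runs of whitespace in the collected link text
            title = " ".join("".join(self._text).split())
            if title:  # skip links with no visible text
                self.links.append((title, self._href))
            self._inside = False


def extract_links(html):
    """Return the (title, href) pairs found in an HTML string."""
    parser = ClickthroughLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, not copied from the real site
SAMPLE_HTML = """
<a data-fth-event="searchclickthrough" href="/ad/1">Flat in <b>Oslo</b></a>
<a data-fth-event="searchclickthrough" href="/ad/2"></a>
<a href="/about">About</a>
"""

if __name__ == "__main__":
    for title, href in extract_links(SAMPLE_HTML):
        print(title)
        print(href)
```

The second sample link has no text and the third lacks the attribute, so only the first one is reported.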