@arildm
Created June 12, 2019 07:44
Scrapes svenskaplatser.se and prints a headerless CSV with one {city},{streetname} row per street.
import requests
from lxml.cssselect import CSSSelector
from lxml.html import fromstring

SITE = 'https://www.svenskaplatser.se/'


def links(url, selector):
    """Yield (text, href) for every element on SITE + url matching the CSS selector."""
    html = requests.get(SITE + url).content.decode()
    tree = fromstring(html)
    sel = CSSSelector(selector)
    return ((el.text, el.attrib['href']) for el in sel(tree))


# The front page links to one page per city; each city page lists its streets.
for city, city_url in links('', '#main p:not(#intro) a'):
    for street, _ in links(city_url, '#main li a'):
        print('%s,%s' % (city, street))
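
The script builds each row with plain string formatting, so a city or street name that itself contains a comma would produce a row with too many fields. A minimal sketch of one way around that, using the standard `csv` module, is below. The helper name `write_rows` and the sample rows are made up for illustration and are not part of the gist.

```python
import csv
import io


def write_rows(pairs, out):
    """Write (city, street) pairs to `out` as headerless CSV.

    Hypothetical helper, not part of the original script. csv.writer
    quotes any field that contains a comma or a double quote.
    """
    writer = csv.writer(out, lineterminator='\n')
    for city, street in pairs:
        writer.writerow([city, street])


# Made-up sample rows; the second street name contains a comma.
buf = io.StringIO()
write_rows([('Exempelstad', 'Storgatan'), ('Exempelstad', 'Gatan, norra')], buf)
print(buf.getvalue(), end='')
```

To use this in the scraper, the nested loops could collect or yield `(city, street)` pairs and pass them to `write_rows` with `sys.stdout` as `out`.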