@tracylemke
Last active December 31, 2019 09:06
Web Scraper in Python
from lxml import html
import requests
# 1. Scrape list of medical conditions
page = requests.get('https://www.nhsinform.scot/illnesses-and-conditions/a-to-z')
tree = html.fromstring(page.content)
# Each condition name is the text of an <h2> like the one below;
# strip the surrounding spaces, tabs, and line breaks from it
# <h2 class="module__title">
# Abdominal aortic aneurysm
# </h2>
illnesses = tree.xpath('//h2[@class="module__title"]/text()')
for e in illnesses:
    # strip() removes leading/trailing spaces, tabs, and line breaks in one call
    print(e.strip())
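The extraction and whitespace clean-up can be checked offline, without hitting the live site. The sketch below is not the script's lxml approach: it swaps in the standard library's `html.parser` so it runs with no third-party packages. `SAMPLE_HTML`, `TitleParser`, and `extract_titles` are hypothetical names introduced here for illustration, and the sample markup is only modelled on the `<h2>` shown in the comments above, not copied from the real page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, modelled on the <h2> shown in the comments above.
SAMPLE_HTML = """
<div>
  <h2 class="module__title">\r\n\t\tAbdominal aortic aneurysm\r\n\t</h2>
  <h2 class="other">Not a condition</h2>
  <h2 class="module__title">
      Acne
  </h2>
</div>
"""


class TitleParser(HTMLParser):
    """Collect the text of every <h2 class="module__title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while inside a matching <h2>
        self.parts = []         # text fragments of the current title
        self.titles = []        # cleaned titles found so far

    def handle_starttag(self, tag, attrs):
        # Same condition as the XPath: an exact class attribute match.
        if tag == "h2" and dict(attrs).get("class") == "module__title":
            self.in_title = True
            self.parts = []

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_title:
            # Collapse runs of whitespace and trim both ends.
            self.titles.append(" ".join("".join(self.parts).split()))
            self.in_title = False


def extract_titles(html_text):
    """Return the cleaned text of each matching heading in html_text."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.titles


titles = extract_titles(SAMPLE_HTML)
for title in titles:
    print(title)
```

Splitting on whitespace and re-joining with single spaces also normalises any tabs or line breaks that appear inside a title, which a chain of `replace` calls would miss unless every combination were listed.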