@florean
Created September 10, 2016 21:23
A simple example of scraping a Washington State Department of Licensing (DOL) office web page and converting the first listed wait time to minutes.
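import re

import requests
from lxml import html

# URL for the DMV office page.
DMV_URL = "https://fortress.wa.gov/dol/dolprod/dsdoffices/OfficeInfo.aspx?cid=45&oid=23"
# Get the page content; raise on HTTP errors instead of parsing an error page.
page = requests.get(DMV_URL, timeout=30)
page.raise_for_status()
# Parse the HTML and create a tree.
tree = html.fromstring(page.content)
# Get the direct text nodes of the wait-time element.
wait_times = tree.xpath('//*[@id="ctl00_Main_waittime"]/text()')
# Drop whitespace-only nodes and skip the first one, leaving just the time rows.
wait_times = [x for x in wait_times if x.strip()][1:]
if not wait_times:
    raise ValueError("No wait-time rows found on the page")
# Capture two integers (hours, then minutes) separated by at least one non-digit.
# The raw string avoids an invalid "\d" string escape; \D+ (rather than .*?)
# keeps a lone number such as "15" from being split into "1" and "5".
time_regex = re.compile(r"(\d+)\D+(\d+)")
# Find the first wait time; search() also tolerates leading whitespace.
match = time_regex.search(wait_times[0])
if match is None:
    raise ValueError(f"Unexpected wait-time format: {wait_times[0]!r}")
hours, minutes = match.groups()
# Convert to integers and normalize to minutes.
minutes = int(minutes) + int(hours) * 60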
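The XPath `//*[@id="ctl00_Main_waittime"]/text()` returns the element's direct text nodes: the text before its first child plus the tail text after each child element. That is why the result contains whitespace-only entries that have to be filtered out. The sketch below reproduces those semantics with the standard library's `xml.etree.ElementTree`, used here only as a dependency-free stand-in for lxml, on hypothetical markup. The real page's structure, and what its first text row actually contains, are assumptions; the helper `direct_text_nodes` is hypothetical too.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a label line followed by one wait-time row, separated by <br/>.
SAMPLE = """<div>
  <span id="ctl00_Main_waittime">
    Current wait:<br/>
    1 hour 5 minutes<br/>
  </span>
</div>"""


def direct_text_nodes(element):
    """Mimic XPath text(): the element's own text plus the tail of each child."""
    nodes = [element.text] + [child.tail for child in element]
    return [node for node in nodes if node is not None]


root = ET.fromstring(SAMPLE)
span = root.find(".//*[@id='ctl00_Main_waittime']")
nodes = direct_text_nodes(span)
# Drop whitespace-only nodes, then skip the first (here, a label) row, as the gist does.
rows = [x for x in nodes if x.strip()][1:]
print(rows)
```

The surviving row still carries its leading newline and indentation, so the parsing step should either strip it or use `re.search` rather than `re.match`.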
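The regex step expects two separate integers, hours then minutes. With a pattern of the form `(\d+).*?(\d+)`, the regex engine can backtrack into the first group, so a row containing a single number, such as a hypothetical "15 minutes", silently parses as 1 hour 5 minutes. The minimal sketch below, built around a hypothetical helper `wait_minutes`, requires at least one non-digit between the two numbers and raises on anything else. The row formats it is exercised with are assumptions, since the page's actual text is not shown here.

```python
import re

# Two integers separated by at least one non-digit character.
TIME_REGEX = re.compile(r"(\d+)\D+(\d+)")


def wait_minutes(row: str) -> int:
    """Convert an 'H ... M' wait-time row into total minutes (hypothetical helper)."""
    match = TIME_REGEX.search(row)
    if match is None:
        raise ValueError(f"Unexpected wait-time format: {row!r}")
    hours, minutes = match.groups()
    return int(hours) * 60 + int(minutes)


# The lazy pattern backtracks into "15" and splits it into "1" and "5".
naive = re.compile(r"(\d+).*?(\d+).*")
print(naive.match("15 minutes").groups())
print(wait_minutes("1 hour 5 minutes"))
```

Failing loudly on an unexpected format is a deliberate choice here: a scraper that silently reports 65 minutes instead of 15 is harder to notice than one that raises.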