@titipata
Last active September 6, 2017 04:49

Script for scraping train arrival time

Build a list of dates covering the last 250 days, starting from today and going backwards.

import requests
import time
import datetime

numdays = 250
today = datetime.datetime.today()
date_list = [today - datetime.timedelta(days=x) for x in range(0, numdays)]
date_list = [dt.strftime('%d/%m/%Y') for dt in date_list]
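
The same list can also be built from a fixed start date, which makes the output reproducible when re-running the scrape. This is a minimal sketch; `make_date_list` and its parameters are hypothetical names, not part of the original script.

```python
import datetime

def make_date_list(start, numdays, fmt='%d/%m/%Y'):
    """Return `numdays` date strings, stepping back one day at a time from `start`."""
    return [(start - datetime.timedelta(days=x)).strftime(fmt) for x in range(numdays)]

# Example with an arbitrary fixed start date
dates = make_date_list(datetime.date(2017, 9, 6), 3)
print(dates)  # ['06/09/2017', '05/09/2017', '04/09/2017']
```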

Use requests and BeautifulSoup to scrape the arrival table for each date.

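from bs4 import BeautifulSoup

# `url` must be set to the address of the arrival-time page before running;
# the original snippet does not define it.
all_trains = []
for i, date in enumerate(date_list):
    # Request the page for one date
    r = requests.post(url, data={'date': date})
    soup = BeautifulSoup(r.text, 'html.parser')
    body = soup.find('tbody')
    # Pages without a table yield an empty list instead of raising AttributeError
    trs = body.find_all('tr') if body is not None else []
    trains = [[td.text.strip() for td in tr.find_all('td')] for tr in trs]
    all_trains.append(trains)
    time.sleep(2)  # pause between requests to avoid hammering the server
    if (i + 1) % 20 == 0:
        print('finished %d pages' % (i + 1))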
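
Once the loop finishes, `all_trains` holds one list of rows per date, in the same order as `date_list`. One possible way to turn that into a flat table is sketched below. The helper name `flatten_trains` and the sample rows are made up for illustration; the real column layout depends on the scraped page.

```python
import csv
import io

def flatten_trains(date_list, all_trains):
    """Prefix every scraped row with the date of the page it came from."""
    rows = []
    for date, trains in zip(date_list, all_trains):
        for cells in trains:
            if cells:  # skip rows that had no <td> cells
                rows.append([date] + cells)
    return rows

# Made-up sample data standing in for real scraped output
sample_dates = ['06/09/2017', '05/09/2017']
sample_trains = [[['7', '08:00', '08:12'], []], [['7', '08:00', '08:03']]]

rows = flatten_trains(sample_dates, sample_trains)

# Write to an in-memory buffer; pass an opened file to csv.writer to save to disk
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Keeping the date in each row means the flat table can be filtered or grouped by day without referring back to `date_list`.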