@camillebaldock
Last active February 4, 2017 13:43
Oyster journey history scraping script
require 'date'       # Date.parse is used below; Date is not loaded by default
require 'io/console' # IO#noecho, so the password is not echoed (needs a terminal)
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'
require 'awesome_print'

Capybara.run_server = false
Capybara.current_driver = :poltergeist

class Oyster
  include Capybara::DSL

  def get_results
    puts "Enter username"
    username = $stdin.gets.chomp
    puts "Enter password"
    password = $stdin.noecho(&:gets).chomp
    puts
    puts "Enter card number"
    card_number = $stdin.gets.chomp
    puts "Start date (dd/mm/yyyy)"
    start_date = $stdin.gets.chomp
    puts "End date (dd/mm/yyyy)"
    end_date = $stdin.gets.chomp

    # Log in
    visit "https://oyster.tfl.gov.uk/oyster/entry.do"
    fill_in('UserName', :with => username)
    fill_in('Password', :with => password)
    click_button('Sign in')
    sleep 5

    # Select the Oyster card (assumes several cards are registered on the account)
    select(card_number, :from => 'cardId')
    click_button('Go')
    sleep 5
    click_link 'Journey history'
    sleep 10

    # Select the date range: reveal the hidden custom-range form with the
    # page's own jQuery, fill in both dates, then submit it
    page.execute_script("$('.hidden-range').fadeIn();
      $('#date-range').val('custom date range');
      $('#date-range-button').hide().delay('200').fadeIn();
      $('#from').val('#{start_date}');
      $('#to').val('#{end_date}');")
    click_button('date-range-button')
    sleep 10

    @scraped_journeys = {}
    if all('.pagination').count == 1
      # The journeys are displayed on several pages
      number_pages = get_page_links.count
      page_number = 1
      scrape_journeys_from_page
      while page_number < number_pages
        go_to_next_page(page_number)
        page_number += 1
        sleep 10
        scrape_journeys_from_page
      end
    else
      # The journeys are displayed on a single page
      scrape_journeys_from_page
    end
    ap @scraped_journeys
  end

  # Links inside the first pagination block of the current page
  def get_page_links
    all('.pagination').first.all('a')
  end

  # On page 1 the first link leads to page 2; on later pages the next page's
  # link sits at index page_number (presumably because a "previous" link comes
  # first and the current page is not a link)
  def go_to_next_page(page_number)
    page_links = get_page_links
    if page_number == 1
      page_links[page_number - 1].click
    else
      page_links[page_number].click
    end
  end

  # A 2-cell row sets the date for the journey rows that follow it
  def scrape_journeys_from_page
    all('.journeyhistory').each do |table|
      date = nil
      table.all('tr').each do |row|
        columns = row.all('td')
        if row[:class] == "reveal-table-row"
          # Tube or train journey
          add_to_scraped_journeys(date.to_s, columns[0].text, columns[1].text)
        elsif columns.size == 2
          # Date line
          date = Date.parse(columns[0].text)
        elsif columns.size == 4
          # Bus journey
          add_to_scraped_journeys(date.to_s, columns[0].text, columns[1].text)
        end
      end
    end
  end

  # Group journeys under their date string (Date#to_s, i.e. yyyy-mm-dd)
  def add_to_scraped_journeys(date, time, description)
    (@scraped_journeys[date] ||= []) << {
      :hour => time,
      :description => description
    }
  end
end

Oyster.new.get_results
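
The script passes the two dates straight into the page's JavaScript, so a mistyped date is only noticed, if at all, after the login and navigation steps and their sleeps. A minimal sketch of checking the input up front, assuming the dd/mm/yyyy format the prompts ask for; parse_date_range is a hypothetical helper, not part of the original script:

```ruby
require 'date'

# Hypothetical helper: validate the two dd/mm/yyyy strings entered at the
# prompts before they are injected into the page with execute_script.
# Returns both dates normalised to zero-padded dd/mm/yyyy.
# Raises ArgumentError (Date::Error on newer Rubies is a subclass) for an
# invalid date, or when the start date falls after the end date.
def parse_date_range(start_text, end_text)
  start_date = Date.strptime(start_text.strip, '%d/%m/%Y')
  end_date = Date.strptime(end_text.strip, '%d/%m/%Y')
  if start_date > end_date
    raise ArgumentError, "start date #{start_text} is after end date #{end_text}"
  end
  [start_date.strftime('%d/%m/%Y'), end_date.strftime('%d/%m/%Y')]
end
```

In get_results this could sit right after the prompts, e.g. start_date, end_date = parse_date_range(start_date, end_date), so that bad input fails before any browser work starts.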
@camillebaldock (Author)

Limitations

My own journey history only contains train, Underground and bus journeys, so those are the only row layouts the script recognises; any row that matches neither is silently skipped. Other modes of travel may be displayed differently in the journey history: feel free to comment if you notice errors with other types of journey.
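
Since a row of an unknown layout never reaches the output, a journey of another mode would simply vanish. A minimal sketch of the same classification as a pure function that collects unrecognised rows instead of dropping them, so they can be reported. It assumes each table row has already been reduced to a hypothetical { reveal: true/false, cells: [String, ...] } hash (for example built from row[:class] == "reveal-table-row" and row.all('td').map(&:text)); group_rows is an illustrative name, not part of the script.

```ruby
require 'date'

# Hypothetical pure-Ruby version of the row logic in scrape_journeys_from_page.
# rows: array of { reveal: true/false, cells: [String, ...] } hashes, in table order.
# Returns [journeys_by_date, unrecognised_rows].
def group_rows(rows)
  journeys = Hash.new { |hash, date| hash[date] = [] }
  unrecognised = []
  date = nil
  rows.each do |row|
    cells = row[:cells]
    if row[:reveal]          # tube or train journey
      journeys[date.to_s] << { hour: cells[0], description: cells[1] }
    elsif cells.size == 2    # date line
      date = Date.parse(cells[0])
    elsif cells.size == 4    # bus journey
      journeys[date.to_s] << { hour: cells[0], description: cells[1] }
    else                     # any other layout: keep it for inspection
      unrecognised << { date: date.to_s, cells: cells }
    end
  end
  [journeys, unrecognised]
end
```

Printing the unrecognised list next to the journeys would make it easier to spot, and share, rows for travel modes the script does not handle yet.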

The script assumes that your Oyster account has several cards registered on it, and picks one from the 'cardId' drop-down. This might not be the case for you. I do not know how the TfL website behaves when only one card is registered: feel free to comment and/or share some sample HTML if you want the script adapted to that case.
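
One way to make the card step more tolerant, sketched under the unverified assumption that a single-card account may show no 'cardId' drop-down at all (the actual TfL markup for that case is unknown): decide what to do from the drop-down's option texts before calling select. card_step is a hypothetical helper, not part of the script.

```ruby
# Hypothetical helper: decide how to handle the card-selection step.
# option_texts: texts of the <option>s in the 'cardId' drop-down,
#               or nil if the page has no such drop-down.
# Returns :skip when there is nothing to choose from, :select when the card
# is listed, and raises ArgumentError when the drop-down does not list it.
def card_step(option_texts, card_number)
  return :skip if option_texts.nil? || option_texts.empty?
  return :select if option_texts.any? { |text| text.include?(card_number) }
  raise ArgumentError, "card #{card_number} not found in #{option_texts.inspect}"
end
```

With Capybara, the option texts could be gathered with has_select?('cardId') ? find_field('cardId').all('option').map(&:text) : nil, and the select and click_button('Go') calls run only when card_step returns :select.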
