@debreczeni
Last active December 19, 2015 04:08
Search apro.bikemag.hu for 29er mountain bikes
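#!/usr/bin/env ruby
# Crawl the mountain-bike (MTB) listings on apro.bikemag.hu and print the URL
# of every ad whose wheel-size field (#df_field_kerekmeret) mentions 29.
require 'mechanize'
# require 'awesome_print'
# require 'pry-debugger'

class TwentyNiner
  def initialize
    @index_page_url = 'http://apro.bikemag.hu/browse/mountain-bike/mtb-kerekpar/'
    @wheel_size = /29/
    @agent = Mechanize.new
    @page_num = 1
    fetch
  end

  # href of the first link in the pager (".navigator.rs"), or nil when the
  # pager has no link, which ends the crawl.
  def get_next_index_page_url_from(index_page)
    link = index_page.at('.navigator.rs a')
    link && link['href']
  end

  # Open every listing on one index page and print the URLs of the 29ers.
  def scrape(index_page)
    puts "scraping page ##{@page_num}"
    @page_num += 1
    index_page.search('#listings fieldset').each do |row|
      begin
        listing_url = row.search('tr td').first.search('a').first['href']
        # Pass the index page as referer so a relative href resolves against it
        # rather than against the listing page opened on the previous iteration.
        listing = @agent.get(listing_url, [], index_page)
        size_text = listing.search('#df_field_kerekmeret td div.value').text
        puts listing_url if size_text.match(@wheel_size)
      rescue StandardError => error
        # Skip rows with unexpected markup or failed requests, but report them.
        warn "skipped a listing: #{error.message}"
      end
    end
  end

  # Scrape the first index page, then follow the pager until it runs out.
  def fetch
    current_index_page = @agent.get(@index_page_url)
    scrape(current_index_page)
    while (next_url = get_next_index_page_url_from(current_index_page))
      current_index_page = @agent.get(next_url, [], current_index_page)
      scrape(current_index_page)
    end
  end
end

TwentyNiner.new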
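The script counts a listing as a 29er when the text of its wheel-size field (`#df_field_kerekmeret`) matches `/29/`. That pattern also matches any longer number that contains the digits 29, such as "290". A minimal sketch, assuming the field holds short labels like `29"` or `27,5"` (an assumption about the site, not scraped data), pulls that check out into a hypothetical `twenty_niner?` helper with digit guards so it can be tried offline:

```ruby
# Hypothetical helper, not part of the original script: decides whether a
# wheel-size label mentions 29" wheels. The lookarounds reject a "29" that is
# part of a longer number such as "290" or "1290".
TWENTY_NINE = /(?<!\d)29(?!\d)/

def twenty_niner?(wheel_size_text)
  TWENTY_NINE.match?(wheel_size_text.to_s)
end

# Sample labels are assumed formats, chosen only to exercise the pattern.
['29"', '29er', '27,5"', '26"', '290', '27.5/29'].each do |label|
  puts "#{label.ljust(8)} -> #{twenty_niner?(label)}"
end
```

Dropping this into the script would mean replacing `size_text.match(@wheel_size)` with `twenty_niner?(size_text)`; the plain `/29/` is simpler and is probably enough if the field only ever contains wheel diameters.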
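`fetch` keeps requesting whatever the first link in the `.navigator.rs` pager points to until that lookup returns `nil`. If that first link ever led back to a page already visited (for example a "previous" link on the last page, which is an assumption about the site's markup), the loop would never end. A minimal sketch of the same loop with a visited set and a page cap, using a hypothetical in-memory `PAGES` hash in place of HTTP requests so it runs offline:

```ruby
require 'set'

# Hypothetical stand-in for the site: each index page lists some listings and
# names the page its pager's first link points to (nil would mean "last page").
# Page "3" deliberately links back to "2" to exercise the cycle guard.
PAGES = {
  '1' => { listings: %w[a b], next: '2' },
  '2' => { listings: %w[c],   next: '3' },
  '3' => { listings: %w[d],   next: '2' }
}.freeze

# Follows pager links from start_url and yields each index page once.
# Stops when there is no next link, when the link points to a page already
# visited, or after max_pages pages, whichever comes first.
def each_index_page(start_url, fetch_page, max_pages: 100)
  visited = Set.new
  url = start_url
  while url && !visited.include?(url) && visited.size < max_pages
    visited << url
    page = fetch_page.call(url)
    yield page
    url = page[:next]
  end
end

seen = []
each_index_page('1', ->(url) { PAGES.fetch(url) }) { |page| seen.concat(page[:listings]) }
puts seen.inspect
```

In the scraper itself, `fetch_page` would wrap `@agent.get` and `page[:next]` would become `get_next_index_page_url_from(page)`; tracking visited URLs costs one set entry per index page, which is negligible next to the HTTP requests.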