@ericat
Created December 13, 2013 19:44
Watir headless for Linux. The headless gem wraps Xvfb, so it only works on Linux.
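require 'rubygems'
require 'watir'
require 'headless'

# Scrapes links from a page with Watir running inside a headless X display.
class Watir_Scraper
  def initialize(url)
    # Start the virtual display before launching the browser.
    @headless = Headless.new
    @headless.start
    @b = Watir::Browser.new
    @b.goto(url)
  end

  # The 'pnnext' and 'resultStats' ids appear to target Google result pages.
  def has_next_link?
    @b.link(:id => 'pnnext').exists?
  end

  def next_link
    @b.link(:id => 'pnnext').click
  end

  def pages_indexed
    @b.div(:id => 'resultStats').text
  end

  # Hrefs of all visible links, excluding Google's own.
  def scrape_href
    @b.links.find_all(&:present?).map(&:href).reject { |link| link.match(/\.google/) }
  end

  # Close the browser and tear down the virtual display.
  # (@b and @headless only exist inside the instance, so cleanup lives here.)
  def close
    @b.close
    @headless.destroy
  end
end

page = Watir_Scraper.new('http://watirmelon.com/tag/headless/')
begin
  puts page.pages_indexed
  all_links = page.scrape_href
  puts all_links
ensure
  page.close
end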
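
The scraper defines `has_next_link?` and `next_link`, but the script never calls them. A minimal sketch of how they might be combined to collect links across several result pages is shown here. The helper name `collect_all_links` and the `max_pages` cap are assumptions for illustration and are not part of the original gist. The helper only relies on the three methods the class already exposes.

```ruby
# Hypothetical helper (not in the original gist): walks result pages using
# the scraper's own has_next_link?, next_link and scrape_href methods.
# max_pages is an assumed safety cap so the loop always terminates.
def collect_all_links(page, max_pages = 10)
  links = []
  max_pages.times do
    # Gather the links on the current page.
    links.concat(page.scrape_href)
    # Stop when there is no further results page to visit.
    break unless page.has_next_link?
    page.next_link
  end
  # Drop duplicates that appear on more than one page.
  links.uniq
end
```

With a `Watir_Scraper` instance in `page`, this could be called as `all_links = collect_all_links(page)`. On a slow site the next page may need an explicit wait before scraping, which this sketch does not attempt.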