@robdodson
Created June 20, 2012 07:06
Crawl a page full of links with Mechanize
require 'mechanize'
# Create a new instance of Mechanize and grab our page
agent = Mechanize.new
page = agent.get('http://robdodson.me/blog/archives/')
# Find all the links on the page whose immediate parent element is an <h1>.
# (Link#node is the underlying Nokogiri <a> element.)
post_links = page.links.find_all { |l| l.node.parent.name == 'h1' }
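# Click the second post link (index 1) and store the resulting page.
# Stop early if the archive has fewer than two post links.
abort 'Expected at least two post links' if post_links.size < 2
post = post_links[1].click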
doc = post.parser # Nokogiri document, same as Nokogiri::HTML(post.body)
p doc
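The script above clicks only one post link. To crawl every post link on the archive page, as the title suggests, you can wrap the selection and click steps in a loop. This is a minimal sketch under stated assumptions, not the gist author's code.

- The method name `crawl_post_links` and its `delay:` keyword are hypothetical.
- The method relies only on calls already used above (`links`, `click`, `parser`) plus `Link#node` and `Link#href`, so it accepts any object that behaves like a `Mechanize::Page`.
- It records each post's `<title>` text, keyed by href, and skips hrefs it has already visited.

```ruby
# Hypothetical helper: visit every post link on an archive page.
# Assumes `page` behaves like a Mechanize::Page: `page.links` returns links
# that respond to `node` (the Nokogiri <a> element), `href`, and `click`,
# and `click` returns a page whose `parser` is a Nokogiri document.
def crawl_post_links(page, delay: 1.0)
  # Same filter as the script: links whose immediate parent is an <h1>.
  post_links = page.links.select { |link| link.node.parent.name == 'h1' }

  titles = {}
  post_links.each do |link|
    next if titles.key?(link.href) # skip hrefs we have already visited

    post = link.click
    title = post.parser.at('title') # first <title> element, or nil
    titles[link.href] = title ? title.text.strip : nil
    sleep(delay) if delay.positive? # pause between requests to be polite
  end
  titles
end
```

With a real agent, the call would look like `crawl_post_links(Mechanize.new.get('http://robdodson.me/blog/archives/'))`. The one-second default delay is an arbitrary choice and is not specified by the original gist.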