@rbk
Created October 29, 2013 21:39
URL finder from Ruby class
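# Extract URLs from a website
require "open-uri"

if ARGV.size > 0
  url = ARGV[0]
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
  URI.open(url) do |data|
    data.each_line do |line|
      # scan yields every href on the line, not only the first match
      line.scan(/href="(.*?)"/) do |(href)|
        if href =~ /\Ahttps?:\/\//
          puts href
        else
          # Resolve relative links against the page URL; skip malformed hrefs
          begin
            puts URI.join(url, href)
          rescue URI::InvalidURIError
            next
          end
        end
      end
    end
  end
else
  puts "The URL of the site to scrape is required"
  puts "usage: ruby scraper.rb http://testsite.com"
end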
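
The link-extraction step can also be tried without a network connection. The following is a minimal sketch, not part of the original gist: `extract_links` is a hypothetical helper name, the sample HTML is made up, and it assumes that resolving relative links with `URI.join` is the desired behavior.

```ruby
require "uri"

# Hypothetical helper (an assumption, not from the original script):
# returns every href value in `html`, resolved against `base_url`.
def extract_links(html, base_url)
  links = html.scan(/href="(.*?)"/).flatten.map do |href|
    begin
      URI.join(base_url, href).to_s
    rescue URI::Error
      nil # drop hrefs that cannot be parsed as URIs
    end
  end
  links.compact
end

# Made-up sample markup with one relative and one absolute link
sample = '<a href="/about">About</a> <a href="https://example.org/x">X</a>'
puts extract_links(sample, "http://testsite.com")
```

Keeping the extraction in a function that takes a string makes it possible to check the regular expression and the URL resolution separately from the HTTP request.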