@CharlesFainLehman
Created September 25, 2010 02:36
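require 'open-uri'

# Start from the URL given on the command line, or fall back to a default page.
links = [ARGV[0] || "http://en.wikipedia.org/wiki/Ruby_(programming_language)"]
i = 0 # index of the next link to visit
c = 1 # running number of the next link written to the log
begin
  while i < links.length
    link = links[i]
    print "accessing #{link}...\n"
    # URI.open replaces the bare open(url) call, which no longer fetches URLs on Ruby 3.0+.
    page = URI.open(link) { |h| h.read }
    # Capture the target of every absolute http:// anchor on the page.
    sites = page.scan(/<a href=\"(http:\/\/[\w|\W]*?)\">?[\w|\W]*?<\/a>/)
    # The block form closes the file even if a write raises.
    File.open("Crawled Sites.txt", 'a') do |f|
      sites.each do |site|
        url = site[0]
        next if links.include?(url) # skip links that are already queued
        f.write("link #{c}: #{url}\n")
        links << url
        c += 1
      end
    end
    i += 1
  end
rescue Interrupt
  print "exiting...\n"
  exit
rescue StandardError
  # Skip the link that failed and resume the crawl with the next one.
  print "o noez an error D: Dun't worry, it's been fixed!\n"
  i += 1
  retry
end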