Skip to content

Instantly share code, notes, and snippets.

@Achillefs
Created June 2, 2011 13:21
Show Gist options
  • Save Achillefs/1004409 to your computer and use it in GitHub Desktop.
Save Achillefs/1004409 to your computer and use it in GitHub Desktop.
crawl_website.rb
%W[rubygems anemone].each {|r| require r}
site_root = "http://www.whatever.org/"
# Create the root folder
folder = URI.parse(site_root).host
FileUtils.mkdir_p(File.join(".",folder))
Anemone.crawl(site_root) do |anemone|
anemone.on_every_page do |page|
filename = page.url.request_uri.to_s
filename = "/index.html" if filename == "/" # Make sure the file name is valid
folders = filename.split("/")
filename = folders.pop
FileUtils.mkdir_p(File.join(".",folder,folders)) # Create the current subfolder
print "Downloading '#{page.url}'..."
File.open(File.join(".",folder,folders,filename),"w") {|f| f.write(page.body)}
puts "done."
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment