@massdonati
Created May 9, 2013 10:19
Small web crawler script using Anemone and MongoDB
require 'anemone'
require 'mongo'

# MongoDB setup (Mongo::Connection is the legacy 1.x Ruby driver API)
db = Mongo::Connection.new.db("demo")
urls_collection = db["page_urls"]

# Anemone web crawler setup and main operation
Anemone.crawl("http://www.fondazionecollegiopiox.org") do |anemone|
  # Keep Anemone's own page store in MongoDB instead of in memory
  anemone.storage = Anemone::Storage.MongoDB
  anemone.on_every_page do |page|
    puts page.code.to_s + ": " + page.url.to_s
    # Record only pages that have both a response code and a URL
    if page.code && page.url
      url = {code: page.code, url: page.url.to_s}
      puts "Inserting #{url.inspect}"
      urls_collection.insert url
    end
  end
end
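
The record-building step inside `on_every_page` can be tried out without a live crawl or a running MongoDB. The following is only a minimal sketch under stated assumptions: `FakePage` and `page_record` are hypothetical names introduced here for illustration, not part of Anemone or the mongo gem, and `FakePage` mimics only the two `page` fields the script reads.

```ruby
# Hypothetical stand-in for the page object yielded by on_every_page;
# it exposes only the two fields the script above actually reads.
FakePage = Struct.new(:code, :url)

# Hypothetical helper: builds the hash the script would insert,
# or returns nil when the page lacks a response code or a URL.
def page_record(page)
  return nil unless page.code && page.url
  { code: page.code, url: page.url.to_s }
end

# A fetched page yields a record; a page with no response code yields nil.
p page_record(FakePage.new(200, "http://www.fondazionecollegiopiox.org/"))
p page_record(FakePage.new(nil, "http://www.fondazionecollegiopiox.org/"))
```

With a helper along these lines, the callback body would reduce to inserting the record whenever it is non-nil.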